
Introducing Phi-3 WebGPU

Phi-3 WebGPU: Private AI Chatbot Running Locally in Your Browser

Curiosity: How can we run powerful AI models entirely in the browser? What happens when we combine WebGPU acceleration with on-device inference?

Phi-3 WebGPU is a private and powerful AI chatbot that runs 100% locally in your browser, powered by 🤗 Transformers.js and onnxruntime-web. No data is sent to servers; everything runs on your device.

Try it: https://huggingface.co/spaces/Xenova/experimental-phi3-webgpu

Created by: Xenova

Key Features

Retrieve: Phi-3 WebGPU's impressive capabilities.

| Feature | Description | Benefit |
| --- | --- | --- |
| 🔒 Privacy | On-device inference | No data sent to servers |
| ⚡️ Performance | WebGPU-accelerated (>20 t/s) | Fast inference |
| 📥 Efficiency | Model cached after download | One-time download |
| 🚀 Speed | Up to 42 tokens/second | Real-time responses |

Performance: Phi-3 running at 42 tokens per second 100% locally in your browser!
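
To make this concrete, here is a minimal sketch of loading Phi-3 with Transformers.js on WebGPU and generating a reply entirely on-device. The model id is an assumption for illustration (use whichever ONNX Phi-3 checkpoint the demo ships with); the pipeline, device, and dtype options come from the Transformers.js API, though the exact quantization may differ from the demo.

```js
// Minimal sketch: load Phi-3 on WebGPU and generate a reply, all on-device.
// NOTE: the model id below is a hypothetical placeholder; use the ONNX
// checkpoint the demo actually ships with.
import { pipeline } from '@huggingface/transformers';

// The first call downloads the weights and caches them in the browser;
// subsequent page loads reuse the cache (the "one-time download" above).
const generator = await pipeline(
  'text-generation',
  'onnx-community/Phi-3-mini-4k-instruct-web',   // hypothetical model id
  { device: 'webgpu', dtype: 'q4' }
);

const messages = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Explain WebGPU in one sentence.' },
];

// Inference happens entirely in the browser; nothing is sent to a server.
const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);
```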

Architecture

Innovate: How browser-based AI works.

graph TB
    A[Browser] --> B[WebGPU]
    B --> C[Transformers.js]
    C --> D[ONNX Runtime]
    D --> E[Phi-3 Model]
    E --> F[Local Inference]
    F --> G[Response]
    
    H[Model Cache] --> E
    
    style A fill:#e1f5ff
    style B fill:#fff3cd
    style G fill:#d4edda
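
The first arrow in the diagram (Browser → WebGPU) boils down to a feature check before the model is loaded. Below is a sketch of how an app might pick a backend; the fall-back to WASM is my assumption about a deployment choice, not necessarily what the demo does.

```js
// Sketch of the Browser → WebGPU step: check for GPU support before loading the model.
// Falling back to 'wasm' is an assumption, not part of the demo.
import { pipeline } from '@huggingface/transformers';

async function pickDevice() {
  if (!navigator.gpu) return 'wasm';                  // WebGPU not exposed by this browser
  const adapter = await navigator.gpu.requestAdapter();
  return adapter ? 'webgpu' : 'wasm';                 // no adapter means no usable GPU
}

const device = await pickDevice();
console.log(`Running Phi-3 on: ${device}`);
const generator = await pipeline(
  'text-generation',
  'onnx-community/Phi-3-mini-4k-instruct-web',        // hypothetical model id, as above
  { device }
);
```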

Technology Stack

Retrieve: Components enabling browser-based AI.

Technologies:

  • 🤗 Transformers.js: JavaScript port of Transformers
  • onnxruntime-web: ONNX runtime for web
  • WebGPU: GPU acceleration in browser
  • Phi-3: Microsoft's efficient language model

Benefits:

  • ✅ No server required
  • ✅ Complete privacy
  • ✅ Fast inference
  • ✅ Works offline
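
The "works offline" point comes down to caching: Transformers.js stores downloaded model files in the browser cache, so once the weights are present the app no longer needs the network for inference. A small sketch of the relevant configuration follows; the env flag names and the shape of the progress event reflect my reading of the library, so treat the details as approximate.

```js
// Sketch of cache-related settings; env flag names and the progress event shape
// are approximate, based on the Transformers.js documentation.
import { env, pipeline } from '@huggingface/transformers';

env.useBrowserCache = true;      // keep downloaded model files in the browser's cache storage
// env.allowLocalModels = true;  // optionally serve the weights from your own origin instead

const generator = await pipeline(
  'text-generation',
  'onnx-community/Phi-3-mini-4k-instruct-web',   // hypothetical model id, as above
  {
    device: 'webgpu',
    dtype: 'q4',
    // Report download progress on first load; later loads hit the cache and skip the network.
    progress_callback: (p) => console.log(p.status, p.file ?? '', p.progress ?? ''),
  }
);
```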

Use Cases

Innovate: Applications of browser-based AI.

Ideal For:

  • Privacy-sensitive applications
  • Offline AI capabilities
  • Client-side processing
  • Educational demos
  • Personal AI assistants

Advantages:

  • ✅ No API costs
  • ✅ No data transmission
  • ✅ Works offline
  • ✅ Low latency
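
For low-latency, real-time responses, the usual pattern is to stream tokens to the UI as they are produced rather than waiting for the full reply. Here is a sketch using TextStreamer, with a rough tokens-per-second estimate; the model id is again hypothetical, and counting one token per streamer callback is an approximation.

```js
// Sketch of streaming generation for a chat-style assistant, with a rough
// tokens-per-second estimate. Actual throughput depends on your GPU.
import { pipeline, TextStreamer } from '@huggingface/transformers';

const generator = await pipeline(
  'text-generation',
  'onnx-community/Phi-3-mini-4k-instruct-web',   // hypothetical model id, as above
  { device: 'webgpu', dtype: 'q4' }
);

let numTokens = 0;
const start = performance.now();

const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,                              // only stream the newly generated text
  callback_function: (text) => {
    numTokens += 1;                               // roughly one callback per generated token
    document.body.append(text);                   // render tokens as they arrive
  },
});

await generator(
  [{ role: 'user', content: 'Write a haiku about running AI in the browser.' }],
  { max_new_tokens: 256, streamer }
);

const seconds = (performance.now() - start) / 1000;
console.log(`~${(numTokens / seconds).toFixed(1)} tokens/second`);
```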

Key Takeaways

Retrieve: Phi-3 WebGPU demonstrates that powerful AI models can run entirely in the browser using WebGPU acceleration, providing privacy and performance without server dependencies.

Innovate: By leveraging WebGPU, Transformers.js, and efficient models like Phi-3, you can build private, fast AI applications that run locally in browsers, enabling new use cases for client-side AI.

Curiosity → Retrieve → Innovation: Start with curiosity about browser-based AI, retrieve insights from Phi-3 WebGPU's approach, and innovate by building private, on-device AI applications that respect user privacy.

Next Steps:

  • Try the demo
  • Explore Transformers.js
  • Learn WebGPU
  • Build browser-based AI

This post is licensed under CC BY 4.0 by the author.