Run an LLM in your browser

Fully client-side inference via WebGPU (transformers.js). Pick a model from the list, or pull any Hugging Face repo that ships an onnx/ folder. Weights download to your machine and run locally: no server, no upload. The first load downloads the model; later loads reuse the cached copy.

Model

Custom Hugging Face model

Must be a repo with an onnx/ folder (transformers.js format). Browse transformers.js models. Models larger than roughly 2 GB may exhaust browser memory.
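
As a rough illustration of the onnx/ requirement, the sketch below checks a repo's file listing before attempting a load. It assumes the public Hugging Face Hub model-info endpoint returns a `siblings` array whose entries carry an `rfilename` field; the helper names `hasOnnxFolder` and `fetchSiblings` are hypothetical and not part of this page's code.

```javascript
// Hypothetical helper: true if any file in the repo listing lives under onnx/.
// `siblings` is assumed to look like [{ rfilename: "onnx/model.onnx" }, ...].
function hasOnnxFolder(siblings) {
  return siblings.some((file) => file.rfilename.startsWith("onnx/"));
}

// Hypothetical helper: fetch the file listing for a repo id such as "owner/name".
// Assumes the Hub model-info endpoint and its `siblings` field.
async function fetchSiblings(repoId) {
  const res = await fetch(`https://huggingface.co/api/models/${repoId}`);
  if (!res.ok) {
    throw new Error(`Hub lookup failed with status ${res.status}`);
  }
  const info = await res.json();
  return info.siblings ?? [];
}
```

A caller could run `hasOnnxFolder(await fetchSiblings(repoId))` and show a warning instead of starting a download that cannot succeed. This check says nothing about model size, so the memory caveat above still applies.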

Prompt

Temperature

Max new tokens

How it works: transformers.js loads ONNX weights from the Hugging Face Hub and runs them through the WebGPU backend of ONNX Runtime Web, entirely in this tab. The repo's own hand-written WebGPU kernels (the playground) run the tiny from-scratch models; this page runs real pretrained LLMs.
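
The flow described above might look roughly like the following. This is a minimal sketch, not this page's actual source: it assumes the `@huggingface/transformers` package is resolvable (through a bundler or import map), and the function names `pickDevice`, `buildGenerationOptions`, and `runPrompt` are hypothetical. The fallback to the WASM backend and the choice to disable sampling at temperature 0 are also assumptions.

```javascript
// Use WebGPU when the browser exposes navigator.gpu; otherwise fall back to WASM.
function pickDevice(nav) {
  return nav && "gpu" in nav ? "webgpu" : "wasm";
}

// Map the page's Temperature and Max new tokens controls to generation options.
// Assumption: a temperature of 0 means greedy decoding (no sampling).
function buildGenerationOptions(temperature, maxNewTokens) {
  const sample = temperature > 0;
  return {
    max_new_tokens: maxNewTokens,
    do_sample: sample,
    ...(sample ? { temperature } : {}),
  };
}

// Load a text-generation pipeline and run one prompt.
// The first call downloads the ONNX weights from the Hub.
async function runPrompt(modelId, prompt, temperature, maxNewTokens) {
  const { pipeline } = await import("@huggingface/transformers");
  const generator = await pipeline("text-generation", modelId, {
    device: pickDevice(globalThis.navigator),
  });
  const output = await generator(
    prompt,
    buildGenerationOptions(temperature, maxNewTokens)
  );
  return output[0].generated_text;
}
```

Keeping device selection and option building as small pure functions makes them easy to check without downloading a model; only `runPrompt` touches the network and the GPU.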