Fully client-side inference via WebGPU (transformers.js).
Pick a model from the list, or load any Hugging Face repo that ships an onnx/
folder. Weights download to your machine and run locally: no server, no
upload. The first load downloads the model; later loads reuse the cached copy.
Must be a repo with an onnx/ folder (transformers.js format).
Browse transformers.js models.
Large models (over roughly 2 GB) may exhaust browser memory.
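The onnx/ folder requirement and the roughly 2 GB warning above can be checked before any download starts. The sketch below is only an illustration, not this page's code: RepoFile, onnxModelFiles, mayExhaustMemory, and MEMORY_BUDGET_BYTES are hypothetical names, and the file listing is assumed to come from somewhere like the Hub's model-info response, with sizes included. It only looks at each .onnx file's own size, so weights stored in separate external-data files are not counted.

```typescript
// Hypothetical shape of one repo file entry (an assumption, not a documented type).
interface RepoFile {
  rfilename: string; // path inside the repo, e.g. "onnx/model_q4.onnx"
  size?: number;     // bytes, when the listing includes sizes
}

// Rough budget taken from the "roughly 2 GB" warning; not a hard browser limit.
const MEMORY_BUDGET_BYTES = 2 * 1024 ** 3;

// Return the .onnx files that live under the repo's onnx/ folder.
// An empty result means the repo is not in the layout this page expects.
function onnxModelFiles(files: RepoFile[]): RepoFile[] {
  return files.filter(
    (f) => f.rfilename.startsWith("onnx/") && f.rfilename.endsWith(".onnx"),
  );
}

// True when a file's reported size is above the budget.
// Files with no reported size are treated as size 0 (unknown, not flagged).
function mayExhaustMemory(
  file: RepoFile,
  budget: number = MEMORY_BUDGET_BYTES,
): boolean {
  return (file.size ?? 0) > budget;
}
```

A caller could reject a repo when onnxModelFiles returns an empty array, and show the memory warning when mayExhaustMemory is true for the variant the user picked.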
How it works: transformers.js loads ONNX weights from the Hugging Face Hub and runs them through the WebGPU backend of ONNX Runtime Web, entirely in this tab. The playground, by contrast, uses this repo's own hand-written WebGPU kernels to run its tiny from-scratch models; this page runs real pretrained LLMs.
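The flow described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not this page's actual code: it assumes the @huggingface/transformers package (transformers.js v3) is available, the dtype "q4" and the max_new_tokens value are illustrative choices, and webgpuAvailable and generate are hypothetical helper names.

```typescript
type GlobalLike = { navigator?: object };

// Hypothetical helper: true when the given scope exposes navigator.gpu,
// which is how browsers surface WebGPU.
function webgpuAvailable(
  scope: GlobalLike = globalThis as unknown as GlobalLike,
): boolean {
  const nav = scope.navigator;
  return nav !== undefined && "gpu" in nav;
}

// Hypothetical helper: load a text-generation pipeline on WebGPU and run one prompt.
// modelId must point at a repo with an onnx/ folder.
async function generate(modelId: string, prompt: string): Promise<unknown> {
  if (!webgpuAvailable()) {
    throw new Error("WebGPU is not available in this browser");
  }
  // Resolved at runtime so this file still loads where the package is absent.
  const spec = "@huggingface/transformers";
  const { pipeline } = await import(spec);

  // The first call downloads the weights; later calls reuse the cached copy.
  const generator = await pipeline("text-generation", modelId, {
    device: "webgpu", // run through the ONNX Runtime Web WebGPU backend
    dtype: "q4",      // assumed quantization; the repo must ship this variant
  });
  return generator(prompt, { max_new_tokens: 64 });
}
```

Checking for WebGPU first keeps the failure early and readable, before any weights are downloaded.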