WebGPU speedup
12.1x vs WASM SIMD at d_model=256
Browser WebGPU Training Speedup
This page is the public proof that the browser playground is more than a demo. The same GPT-2-shaped training path runs through a benchmark harness, which reports measured end-to-end speedups.
Headline Numbers
Small-width speedup
2.6x at d_model=96
Browser track
shipped WASM, SIMD, OPFS, WebGPU fast path
Competitive Context
| System | Metric | Score | Size / Class | Comparable? | Readout |
|---|---|---|---|---|---|
| TinyGPT WebGPU | training step speedup | 12.1x | d_model=256 browser run | Direct | Directly measured against the repo's WASM SIMD path. |
| TinyGPT WASM SIMD | training step speedup | 1.0x | same browser model/config | Direct | Portable CPU baseline and fallback path. |
| Native Mac runtimes | browser training benchmark | not measured | MLX/Metal class | Not comparable | Native runtimes are the right competition for production throughput, but not for the browser-learning artifact. |
Direct rows share this artifact's eval setup. Rows not marked Direct are useful market context but should not be read as leaderboard claims.
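The table describes WASM SIMD as the fallback path behind the WebGPU fast path. A minimal sketch of how such a selection could look is below. The names `pickBackend`, `Backend`, `GpuLike`, and `NavigatorLike` are hypothetical and not taken from the repo; the only real API assumed is `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter is available.

```typescript
// Hypothetical backend labels; the repo's actual names are not known.
type Backend = "webgpu" | "wasm-simd";

// Minimal structural view of the part of `navigator` this sketch needs,
// so the example compiles without WebGPU type packages.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Prefer WebGPU when an adapter is available, otherwise fall back
// to the portable WASM SIMD path.
async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (!nav.gpu) return "wasm-simd"; // browser does not expose WebGPU
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm-simd"; // null means no usable adapter
  } catch {
    return "wasm-simd"; // treat any adapter error as "no GPU"
  }
}
```

In a browser this would be called as `pickBackend(navigator as NavigatorLike)`. Taking the navigator as a parameter keeps the function testable outside a browser.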
Performance readout
| Variant | Result | Interpretation |
|---|---|---|
| WASM SIMD | baseline | Portable CPU path |
| WebGPU d_model=96 | 2.6x | GPU overhead still visible |
| WebGPU d_model=256 | 12.1x | GPU dominates as width grows |
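The speedups above are ratios of training step time on the WASM SIMD baseline to step time on the WebGPU path. The sketch below shows one way such a ratio could be computed from per-step timings. Using the median is an assumption; the source does not say how the harness aggregates steps. The function names and the sample timings are placeholders for illustration, not benchmark data.

```typescript
// Median of a list of per-step wall-clock times in milliseconds.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Speedup of a candidate path over a baseline path:
// baseline step time divided by candidate step time.
function speedup(baselineMs: number[], candidateMs: number[]): number {
  return median(baselineMs) / median(candidateMs);
}

// Placeholder timings, NOT measurements from the benchmark.
const wasmSimdMs = [100, 110, 90];
const webgpuMs = [50, 55, 45];
console.log(`${speedup(wasmSimdMs, webgpuMs).toFixed(1)}x`); // prints "2.0x"
```

By this definition the baseline compared against itself is 1.0x, which matches the WASM SIMD row in the competitive context table.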
Release Blockers
Not active factory center
The browser track is valuable, but current active work is the Mac-local specialist factory.
Unblock: Use browser pages to present factory reports instead of expanding playground scope.