
Project status — 2026 update

A review-oriented snapshot of where TinyGPT stands. The detailed docs are linked at the bottom; this page is the map.

TinyGPT is finished as a teaching project and continues as a performance project. The original ten milestones (PyTorch reference, training, LoRA, WASM backend, browser app, WebGPU matmul, checkpointing, metrics dashboard, write-up, public repo) are all complete and on main. The work since then is the performance round-trip: kernels, parity tests, the speedup curve, and the lessons taught by each lever that failed.

What’s measured and shipped beyond the original milestones

| Area | What | State |
| --- | --- | --- |
| Perf | WASM SIMD (-msimd128), measured 1.6× | shipped |
| Perf | Multi-threaded WASM (pthreads + SAB), measured ~2× | shipped |
| Perf | WebGPU full stack (blocked4 + vec4 + subgroups + FA2 fwd+bwd) | shipped |
| Perf | End-to-end curve vs multi-thread WASM SIMD: Small 2.6×, Medium 6.8×, Large 9.3×, XL 12.1× | measured |
| Capacity | Memory64 module (tinygpt64.{js,wasm}): 473M params in Node, browser blocked at d_model ≥ 256 (task #66) | partial |
| Data | Default corpus switched from inline 863-byte paragraph to TinyShakespeare (1.1 MB, /shakespeare.txt) | shipped |
| Data | Hugging Face dataset loading via public datasets-server API | shipped |
| Config | Default LR fixed: was 3e-3 (10× the Python ref), now 3e-4; see lessons.md | shipped |
| UX | Banner reworked to make “load pretrained / train from scratch (~15 min)” explicit | shipped |
| UX | Pretrained Shakespeare demo model (Huge preset, 5000 steps) replaces browser/public/demo.tinygpt | shipped |
| Site | Astro migration: 5 static routes built into dist/ | shipped |
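
For readers unfamiliar with the datasets-server API mentioned in the Data rows, the following is a minimal Python sketch of paging text out of its public `/rows` endpoint. It is not TinyGPT's loader, which runs in the browser and is not shown on this page. The helper names (`rows_url`, `extract_text`, `fetch_text`), the `text` column, and the dataset name in the usage note are all hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public datasets-server endpoint; /rows returns a slice of one split as JSON.
API_ROOT = "https://datasets-server.huggingface.co"


def rows_url(dataset, config, split, offset=0, length=100):
    """Build a /rows URL. The API serves at most 100 rows per request."""
    if not 1 <= length <= 100:
        raise ValueError("length must be between 1 and 100")
    query = urllib.parse.urlencode({
        "dataset": dataset,
        "config": config,
        "split": split,
        "offset": offset,
        "length": length,
    })
    return f"{API_ROOT}/rows?{query}"


def extract_text(payload, column="text"):
    """Join one string column of a /rows response into a single corpus string.

    Assumes the dataset has a string column with this name (hypothetical default).
    """
    return "\n".join(item["row"][column] for item in payload["rows"])


def fetch_text(dataset, config, split, column="text", offset=0, length=100):
    """Fetch one page of rows over HTTP (requires network access)."""
    url = rows_url(dataset, config, split, offset, length)
    with urllib.request.urlopen(url) as resp:
        return extract_text(json.load(resp), column)
```

A call such as `fetch_text("example-user/example-dataset", "default", "train")` would return up to 100 rows of text. A larger corpus would be assembled by advancing `offset` in steps of `length`.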

What’s verified, and how

| Suite | Covers | Result |
| --- | --- | --- |
| tests/test_phase1.py | model, training, sampling | 8/8 |
| tests/test_lora.py | LoRA | 6/6 |
| wasm/build_native.sh | C++ kernels (finite-diff) + C++ model overfit | pass |
| tests/smoke_wasm_node.mjs | compiled WASM module trains | pass |
| browser/npm run webgpu-test | 24 WebGPU kernel parity checks + GPU overfit | pass |
| tests/test_webgpu_train.mjs | 50-step WASM vs WebGPU end-to-end parity | pass (drift 1.1–2.5%) |
| tests/test_fa2_parity.mjs + test_fa2_backward_parity.mjs | FA2 fwd + bwd vs naive ref | pass (≤ 1 ULP) |
| tests/test_wasm64_xl_node.mjs | reproduces the Memory64 ABI bug | reproduces (task #66) |
| browser/npm run e2e | full app in headless browser | pass |
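
The "finite-diff" entry for the C++ kernels refers to checking hand-written gradients against numerical ones. The project's actual check lives in C++ and is not shown here, so the following is only a minimal Python sketch of the general technique. The toy loss, the step size `eps`, and the `1e-6` tolerance are assumptions chosen for illustration.

```python
import math


def numeric_grad(f, x, eps=1e-5):
    """Central-difference estimate of df/dx_i for every coordinate of x."""
    grad = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def max_rel_error(analytic, numeric, floor=1e-8):
    """Largest relative gap between two gradients (floor avoids dividing by 0)."""
    return max(
        abs(a - n) / max(abs(a), abs(n), floor)
        for a, n in zip(analytic, numeric)
    )


# Toy stand-in for a kernel: loss(x) = sum(tanh(x_i)^2).
def loss(x):
    return sum(math.tanh(v) ** 2 for v in x)


# Hand-derived gradient: d/dx tanh(x)^2 = 2 * tanh(x) * (1 - tanh(x)^2).
def loss_grad(x):
    return [2 * math.tanh(v) * (1 - math.tanh(v) ** 2) for v in x]


x = [0.3, -1.2, 0.7, 2.0]
err = max_rel_error(loss_grad(x), numeric_grad(loss, x))
assert err < 1e-6, f"analytic and numeric gradients disagree: {err:.2e}"
```

The central difference is used instead of a one-sided difference because its truncation error shrinks with `eps` squared, which leaves more room between a correct gradient and a buggy one.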

Everything that can be checked by a machine is checked by one; that was the method throughout.
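
Two of the results in the table above are stated as tolerances: "≤ 1 ULP" for the FA2 kernels and a percentage drift for the 50-step WASM vs WebGPU run. The Python sketch below shows one common way to compute such numbers. It assumes float32 values, and it assumes that "drift" means the largest per-step relative gap between two loss curves. This page does not define the metric, so the test suite may measure it differently.

```python
import struct


def f32_ordered(x):
    """Map x (rounded to float32) to an int that sorts like the float does."""
    (bits,) = struct.unpack("<i", struct.pack("<f", x))
    # Negative floats have the sign bit set; mirror them below zero so that
    # adjacent floats map to adjacent integers and -0.0 equals +0.0.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFF)


def ulp_distance(a, b):
    """Number of representable float32 values between a and b."""
    return abs(f32_ordered(a) - f32_ordered(b))


def max_ulp(ref, test):
    """Worst-case ULP distance over two equally long sequences."""
    return max(ulp_distance(r, t) for r, t in zip(ref, test))


def relative_drift(ref_losses, test_losses):
    """Largest per-step relative gap between two loss curves (assumed metric)."""
    return max(abs(t - r) / abs(r) for r, t in zip(ref_losses, test_losses))
```

A kernel parity test in this style would assert `max_ulp(reference, candidate) <= 1`, while an end-to-end test would compare `relative_drift` against a looser, percentage-level threshold.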

Open — worth your attention

Where the docs are