Train a small GPT from scratch, right here in this tab. It is the kind of model behind ChatGPT, only with about 0.8 million parameters instead of a trillion. No server, no install. Watch the loss curve fall, then ask it to write you a sentence.
The trick is that you can sample while training is still running. The model is a single instance in a Web Worker, and Generate runs a forward pass between training steps, against the same weights that training is updating. Sample twice during a run to see what the model has picked up in between.
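A minimal sketch of how that interleaving could be arranged, assuming a callback-based request queue. The names `createInterleavedTrainer`, `requestSample`, and `tick` are hypothetical and not taken from the demo's source; `trainStep` and `forward` stand in for the real training step and forward pass.

```typescript
type Weights = Float32Array;
type TrainStep = (w: Weights) => void; // assumed to update the weights in place
type Forward = (w: Weights, prompt: string) => string; // assumed to only read the weights
type OnSample = (sample: string, step: number) => void;

interface PendingRequest {
  prompt: string;
  onSample: OnSample;
}

// Hypothetical helper: one set of weights shared by training and sampling.
function createInterleavedTrainer(weights: Weights, trainStep: TrainStep, forward: Forward) {
  const pending: PendingRequest[] = [];
  let step = 0;

  // Queue a Generate request; it never touches the weights directly.
  function requestSample(prompt: string, onSample: OnSample): void {
    pending.push({ prompt, onSample });
  }

  // Run one training step, then answer every queued request.
  // Between steps the weights are in a consistent state, so the
  // forward pass sees exactly what training has produced so far.
  function tick(): void {
    trainStep(weights);
    step++;
    for (const req of pending.splice(0)) {
      req.onSample(forward(weights, req.prompt), step);
    }
  }

  return { requestSample, tick, currentStep: () => step };
}
```

In a worker, `tick` would be rescheduled after each call (for example with `setTimeout`), so that incoming messages can queue new requests between steps.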
Click any byte below to see what the model considered at that position: the top-10 next-byte probabilities from the last forward pass, and the per-head attention weights from the final transformer block (which earlier bytes each head was attending to).
Nothing here is prettified. You see the actual distribution the byte was sampled from and the actual softmax weights from the last attention layer. A · marks a byte that is whitespace or a control character.
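As an illustration of where a top-10 list like this can come from, here is a short sketch that turns one position's logits into probabilities and keeps the most likely candidates. It assumes the logits array has one entry per byte value; `softmax` and `topKNextBytes` are hypothetical names, not the demo's own functions.

```typescript
interface Candidate {
  byte: number; // index into the logits array, assumed to be the byte value
  prob: number; // softmax probability of that byte
}

// Numerically stable softmax: subtract the max logit before exponentiating.
function softmax(logits: ArrayLike<number>): number[] {
  let max = -Infinity;
  for (let i = 0; i < logits.length; i++) max = Math.max(max, logits[i]);
  const exps: number[] = [];
  let sum = 0;
  for (let i = 0; i < logits.length; i++) {
    const e = Math.exp(logits[i] - max);
    exps.push(e);
    sum += e;
  }
  return exps.map((e) => e / sum);
}

// Return the k most probable next bytes, highest probability first.
function topKNextBytes(logits: ArrayLike<number>, k: number = 10): Candidate[] {
  return softmax(logits)
    .map((prob, byte) => ({ byte, prob }))
    .sort((a, b) => b.prob - a.prob)
    .slice(0, k);
}
```

Sorting the full array is fine at this vocabulary size; a partial selection would only matter for much larger vocabularies.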
Runs the same matmul through the WebGPU compute kernel and the WASM kernel, checks that the two results agree, and reports the per-shape speedup on your hardware. Requires Chrome or Edge 113+. For the full end-to-end speedup curve across the preset table, see /speedup.
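A parity check and speedup figure of this kind can be sketched as follows. This is an assumption-laden illustration, not the demo's benchmark code: `matmulReference` is a plain CPU stand-in for either kernel's output, and the `1e-3` tolerance and the names `maxAbsDiff` and `parityAndSpeedup` are hypothetical.

```typescript
// Row-major reference matmul: C (m x n) = A (m x k) * B (k x n).
function matmulReference(a: Float32Array, b: Float32Array, m: number, k: number, n: number): Float32Array {
  const c = new Float32Array(m * n);
  for (let i = 0; i < m; i++) {
    for (let p = 0; p < k; p++) {
      const aip = a[i * k + p];
      for (let j = 0; j < n; j++) c[i * n + j] += aip * b[p * n + j];
    }
  }
  return c;
}

// Largest element-wise absolute difference between two outputs.
function maxAbsDiff(x: Float32Array, y: Float32Array): number {
  if (x.length !== y.length) throw new Error("shape mismatch");
  let worst = 0;
  for (let i = 0; i < x.length; i++) worst = Math.max(worst, Math.abs(x[i] - y[i]));
  return worst;
}

// Compare two kernel outputs and report how much faster the GPU run was.
// The tolerance is an assumed value; float32 kernels rarely match bit for bit.
function parityAndSpeedup(
  gpuOut: Float32Array,
  wasmOut: Float32Array,
  gpuMs: number,
  wasmMs: number,
  tolerance: number = 1e-3,
) {
  const diff = maxAbsDiff(gpuOut, wasmOut);
  return { agrees: diff <= tolerance, maxAbsDiff: diff, speedup: wasmMs / gpuMs };
}
```

A speedup above 1 means the WebGPU kernel finished faster than the WASM kernel for that shape.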