Browser performance artifact · Report artifact · 2026-05-31

Browser WebGPU Training Speedup

This report is the public evidence that the browser playground is more than a demo. The same GPT-2-shaped training path runs through a benchmark harness, which reports measured end-to-end speedups.

WebGPU
WASM
browser
performance

Headline Numbers

WebGPU speedup

12.1x vs WASM SIMD at d_model=256

Small-width speedup

2.6x vs WASM SIMD at d_model=96

Browser track

Shipped: WASM, WASM SIMD, OPFS, and a WebGPU fast path
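One plausible way to choose among the shipped compute paths at startup is to prefer the WebGPU fast path, fall back to WASM SIMD, and use plain WASM last. The TypeScript below is a minimal sketch of that idea, not the repo's code: `Backend`, `hasWasmSimd`, `pickBackend`, and `detectBackend` are hypothetical names, and it assumes the backend is chosen once at startup. `navigator.gpu.requestAdapter()` and `WebAssembly.validate()` are standard browser APIs; the SIMD check validates a tiny hand-assembled module that uses a single SIMD instruction.

```typescript
// Hypothetical backend names for illustration; the repo's identifiers may differ.
type Backend = "webgpu" | "wasm-simd" | "wasm";

// Minimal structural type for the part of navigator.gpu used here,
// so the sketch compiles without @webgpu/types installed.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

// Smallest module that uses a SIMD instruction: one function of type
// () -> v128 built with i8x16.splat. Engines without WASM SIMD reject it.
const SIMD_PROBE = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7b,       // type section: () -> v128
  0x03, 0x02, 0x01, 0x00,                         // function section: func 0 uses type 0
  0x0a, 0x08, 0x01, 0x06, 0x00,                   // code section: 1 body, 6 bytes, 0 locals
  0x41, 0x00, 0xfd, 0x0f, 0x0b,                   // i32.const 0; i8x16.splat; end
]);

function hasWasmSimd(): boolean {
  return typeof WebAssembly === "object" && WebAssembly.validate(SIMD_PROBE);
}

// Pure preference order: WebGPU fast path, then WASM SIMD, then plain WASM.
function pickBackend(hasGpuAdapter: boolean, simd: boolean): Backend {
  if (hasGpuAdapter) return "webgpu";
  return simd ? "wasm-simd" : "wasm";
}

async function detectBackend(): Promise<Backend> {
  const nav = (globalThis as unknown as { navigator?: { gpu?: GpuLike } }).navigator;
  let adapter: object | null = null;
  try {
    // requestAdapter() resolves to null when no suitable GPU is available;
    // a missing navigator.gpu short-circuits to undefined, mapped to null.
    adapter = (await nav?.gpu?.requestAdapter()) ?? null;
  } catch {
    adapter = null; // treat any WebGPU initialisation error as "no GPU"
  }
  return pickBackend(adapter !== null, hasWasmSimd());
}
```

Keeping `pickBackend` pure makes the preference order easy to test without a GPU, while `detectBackend` isolates the browser-specific probing.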

Competitive Context

System	Metric	Score	Size / Class	Comparable?	Readout
TinyGPT WebGPU	training step speedup	12.1x	d_model=256 browser run	Direct	Directly measured against the repo's WASM SIMD path.
TinyGPT WASM SIMD	training step speedup	1.0x	same browser model/config	Direct	Portable CPU baseline and fallback path.
Native Mac runtimes	browser training benchmark	not measured	MLX/Metal class	Not comparable	Native runtimes are the right competition for production throughput, but not for this browser-training artifact.

Direct rows share this artifact's benchmark setup. Rows marked Directional or Not comparable are useful market context, but should not be read as leaderboard claims.
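The speedups above are ratios against the WASM SIMD path. The sketch below shows one way to compute such a ratio from per-step timings. It assumes the harness records wall-clock milliseconds per full training step and summarises them with a median after discarding warm-up steps; both choices are assumptions, since the report does not say how it aggregates. `median`, `steadyState`, and `speedup` are hypothetical helper names.

```typescript
// Hypothetical helpers; the repo's harness may aggregate timings differently.
// Inputs are per-step wall-clock times in milliseconds, e.g. measured with
// performance.now() around each full training step.

function median(samples: readonly number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Drop warm-up steps (shader compilation, buffer allocation, JIT tier-up)
// so one-time costs do not distort the steady-state ratio.
function steadyState(samples: readonly number[], warmup: number): number[] {
  return samples.slice(warmup);
}

// A result above 1 means the candidate path steps faster than the baseline.
function speedup(baselineMs: readonly number[], candidateMs: readonly number[], warmup = 0): number {
  return median(steadyState(baselineMs, warmup)) / median(steadyState(candidateMs, warmup));
}

// Invented timings for illustration only (not benchmark data); step 0 is warm-up.
const wasmSimdMs = [50, 30, 31, 29];
const webgpuMs = [200, 10, 10, 10];
console.log(speedup(wasmSimdMs, webgpuMs, 1)); // prints 3
```

A median is used in this sketch because a few slow steps (garbage collection, background-tab throttling) would pull a mean upward.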

Performance readout

Variant	Speedup vs WASM SIMD	Interpretation
WASM SIMD	1.0x (baseline)	Portable CPU path
WebGPU d_model=96	2.6x	GPU overhead still visible at small width
WebGPU d_model=256	12.1x	GPU compute advantage dominates as width grows
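The width trend fits a simple cost model, stated here as an assumption rather than a measured fit: the WASM path is compute-bound with throughput r_cpu, while each WebGPU step pays a roughly constant overhead c (command submission, synchronisation, readback) plus the same per-step work W(d) at throughput r_gpu, where d is d_model.

```latex
S(d) \;=\; \frac{T_{\mathrm{WASM}}(d)}{T_{\mathrm{WebGPU}}(d)}
     \;\approx\; \frac{W(d)/r_{\mathrm{cpu}}}{c + W(d)/r_{\mathrm{gpu}}},
\qquad
\lim_{W(d)\to\infty} S(d) \;=\; \frac{r_{\mathrm{gpu}}}{r_{\mathrm{cpu}}}
```

For the dense layers W(d) grows roughly quadratically in d_model, so the fixed term c weighs heavily at d_model=96 and matters less at d_model=256. The two measured points are consistent with this shape, but they are not enough to estimate c or the throughput ratio.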

Release Blockers

Not the active factory focus

The browser track is valuable, but active work is currently centered on the Mac-local specialist factory.

Unblock: Use browser pages to present factory reports rather than expanding the playground's scope.

Next Release Action

Keep this page as a public performance artifact, and cross-link it from factory reports whenever browser-local training is relevant.