Single-machine roadmap — index

A complete inventory of techniques that run on one Mac (or in a browser tab), ROI-ranked for TinyGPT. The original 1,400-line master plan is split across the files below — each is short enough to read in one sitting.

How to read this

Filter — single-machine only. Anything requiring a GPU cluster (ZeRO/FSDP, tensor parallelism, large RLHF runs) is in tier4_skip.md.

Three views of the same landscape:

Tiers 1-4 are the ROI ranking of training-or-product-shaping techniques (what to build next from the 2017-2024 toolbox). Higher tier = better ROI for us.
Tier 5 (tier5_frontier_2026.md) is the 2026 research frontier — speculative items where the outcome is a research artifact + reproducible scaling curve, not a polished feature. The project pauses training-at-2024-fundamentals until these land.
The category sections (categories.md) are an exhaustive taxonomy of everything else — optimizers, data, interpretability, browser perf, etc. — orthogonal to the main pipeline.

Status legend (used throughout):

🟢 shipped · 🟡 partial · ⬜ not yet built · 🟣 considered + parked

Markers last verified against the codebase: 2026-05-30 (post-CPU-bundle merge). Ship count: Tier 1: 8/9 · Tier 2: 13/16 · Tier 3: 17/21 · Tier 5: 0/5 (frontier research).

For current external benchmark landscape, see docs/research/inference_benchmarks_may_2026.md and docs/research/quality_benchmarks_may_2026.md.

The files

File	What it covers
`tier1.md`	High ROI — build next. Distillation, sequence packing, QLoRA, ORPO, SimPO, NEFTune, gradient checkpointing, speculative decoding, browser benchmark runner.
`tier2.md`	Medium ROI. KTO, IPO, DoRA, GaLore, VeRA, LoftQ, AWQ/GPTQ readers, HQQ, sliding window, ALiBi, KIVI, MTP, MQA, attention sink, prefix caching, prefix tuning.
`tier3.md`	Niche / specialized. RLAIF, GPTQ from-scratch, pruning, LASER, RLHF, Medusa, MoE, differential attention, MoD, LoRA variants (LoRA+, AdaLoRA, ReLoRA, PISSA, etc.).
`tier4_skip.md`	Skip — not worth it for us. fp16, ZeRO at single device, SSMs, etc.
`tier5_frontier_2026.md`	2026 research frontier. Reasoning training, test-time compute scaling, vision-language toy, diffusion LM micro-impl, real sparse MoE kernels. Plus cloud pipeline + pausable training as enabler infrastructure.
`categories.md`	Orthogonal categories — optimizers, training stability, data, tokenization, interpretability, inference, browser perf, architecture, PEFT taxonomy, infra.
`recommended_order.md`	The top-10 order to build in.
`datasets.md`	Open-source datasets we’d actually use (pretrain / SFT / DPO / code / math / eval).
`recent_research.md`	2024-2026 highlights with arxiv links (alignment, PEFT, quantization, inference, optimizers, distillation, reasoning RL).
`phased_plan.md`	The executable 10-phase plan, 7 weeks of sequenced work.
`blockers.md`	What we can’t build right now and why (hardware, library, budget, integration). Includes the Phase 9/10 detailed status appendix.
`honest_summary.md`	The “what we can / can’t / shouldn’t build” summary, plus cross-references.

One-line answer to “what do I build first?”

Phase 1 of phased_plan.md: NEFTune + gradient clipping

LoRA+ + persistent tokenized cache + the browser-side benchmark runner. ~3 days, all small wins.