SQL factory POC | Current-best candidate | 2026-07-02

Qwen3-0.6B Routed SQL Specialist

This is the cleanest current proof of the factory thesis: a 0.6B model can beat a small public SQL baseline on a frozen exact-match slice, but only the routed artifact passes both the public exact-match gate and the synthetic execution gate.

Tags: SQL, routing, LoRA, evals

Headline Numbers

Metric	Score	Scope
Public exact	0.531	64-row b-mc2/sql-create-context slice
T5-small baseline	0.484	same 64 public rows
Synthetic execution	0.860	50 heldout SQLite rows
Synthetic exact	0.840	same 50 heldout rows

Competitive Context

System	Metric	Score	Size / Class	Comparable?	Readout
TinyGPT routed SQL v1	b-mc2 exact / synthetic exec	0.531 / 0.860	0.6B base + 2 routed LoRAs	Direct	Current local candidate; public exact and synthetic execution gates are both frozen.
T5-small local baseline	b-mc2 exact	0.484	~60M	Direct	Same 64-row public slice; TinyGPT is +4.7 points exact on this narrow gate.
Defog SQLCoder-7B-2	Defog SQL-Eval category scores	77.1-96%	7B	Directional	Strong public SQL specialist, but reported as category-level Defog SQL-Eval scores rather than this b-mc2 slice.
Snowflake Arctic-Text2SQL-R1-7B	BIRD execution accuracy	68.47%	7B	Not comparable	Useful target class for public SQL execution; TinyGPT must add a BIRD/Spider execution gate before claiming this lane.
Snowflake Arctic-Text2SQL-R1-14B / 32B	BIRD execution accuracy	70.04% / 71.83%	14B / 32B	Not comparable	Shows the current public high bar: execution accuracy, not exact string match.

Direct rows share this artifact's eval setup. Directional and Not comparable rows report other benchmarks and metrics; they are useful market context but should not be read as leaderboard claims.
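
To put the +4.7 point gap in terms of rows, the scores can be converted back to whole-row counts. This is a minimal sketch that assumes each score is simply correct rows divided by slice size, rounded to three decimals; the report does not state the rounding rule, and `implied_correct` is a hypothetical helper.

```python
def implied_correct(score: float, n_rows: int) -> int:
    """Whole number of correct rows implied by a rounded score (assumed: correct / n_rows)."""
    return round(score * n_rows)


PUBLIC_ROWS = 64      # size of the frozen b-mc2 slice
SYNTHETIC_ROWS = 50   # size of the heldout SQLite slice

tinygpt_public = implied_correct(0.531, PUBLIC_ROWS)
t5_public = implied_correct(0.484, PUBLIC_ROWS)
synthetic_exec = implied_correct(0.860, SYNTHETIC_ROWS)
synthetic_exact = implied_correct(0.840, SYNTHETIC_ROWS)

gap_rows = tinygpt_public - t5_public
gap_points = round((0.531 - 0.484) * 100, 1)

print(f"public exact: {tinygpt_public}/{PUBLIC_ROWS} vs baseline {t5_public}/{PUBLIC_ROWS}")
print(f"gap: {gap_rows} rows, {gap_points} points")
print(f"synthetic: exec {synthetic_exec}/{SYNTHETIC_ROWS}, exact {synthetic_exact}/{SYNTHETIC_ROWS}")
```

Under that assumption the public gap is 3 rows out of 64, which is why the comparison is described as a narrow gate.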

Adapter comparison

Setup	Public exact	Synthetic exec	Decision
Public v4 only	0.531	0.240	Route required
Blend v1	0.297	0.560	Reject
Best static composition	0.516	0.460	Reject
BIRD + b-mc2 v5	0.438	0.280	Reject
Classifier-routed v1	0.531	0.860	Current best
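
One way to read the table above is as a dominance check across the two gates. The sketch below is an illustration, not necessarily the decision rule behind the Decision column: it treats a setup as dominated if another setup is at least as good on both gates and strictly better on one.

```python
# (public exact, synthetic exec) scores copied from the adapter comparison table.
setups = {
    "Public v4 only": (0.531, 0.240),
    "Blend v1": (0.297, 0.560),
    "Best static composition": (0.516, 0.460),
    "BIRD + b-mc2 v5": (0.438, 0.280),
    "Classifier-routed v1": (0.531, 0.860),
}


def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if a is at least as good as b on every gate and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


non_dominated = [
    name
    for name, scores in setups.items()
    if not any(dominates(other, scores) for other in setups.values())
]
print(non_dominated)
```

With these numbers only the classifier-routed setup is left: it ties the best public exact score and has the highest synthetic execution score.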

Router verification

Check	Result	Evidence
Unlabeled mixed rows	114	64 public / 50 synthetic
Public route reason	64	known_public_source
Synthetic route reason	50	sqlite_db_field
Route confidence	>= 0.99	all smoke rows
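
The report does not describe the classifier's features, so the following is only a rule-based stand-in that shows how route reasons like the ones above could be assigned and tallied. The field names `source` and `db_path`, the `Route` type, the `fallback` reason, and the placeholder rows are all assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: public rows carry a dataset identifier in a "source" field.
KNOWN_PUBLIC_SOURCES = {"b-mc2/sql-create-context"}


@dataclass(frozen=True)
class Route:
    adapter: str  # which LoRA adapter to use (hypothetical labels)
    reason: str   # why the row was routed there


def route(row: dict) -> Route:
    """Rule-based stand-in for the router; not the classifier used in the report."""
    if row.get("db_path"):  # assumption: synthetic rows reference a SQLite file
        return Route("synthetic", "sqlite_db_field")
    if row.get("source") in KNOWN_PUBLIC_SOURCES:
        return Route("public", "known_public_source")
    return Route("public", "fallback")


# Placeholder rows shaped like the 64 public / 50 synthetic smoke mix.
rows = [{"source": "b-mc2/sql-create-context"}] * 64 + [{"db_path": "heldout.sqlite"}] * 50
reasons = Counter(route(r).reason for r in rows)
print(dict(reasons))
```

A tally like this makes it easy to spot rows that hit an unexpected reason, which is the kind of evidence the table above reports.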

Release Blockers

Public execution benchmark missing

b-mc2 exact match is useful, but serious SQL claims need execution accuracy on public DBs.

Unblock: Add BIRD Mini-Dev SQLite or Spider SQLite execution fixtures once the DB bundle is local.
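
For context on why execution accuracy differs from exact match, here is a minimal sketch of an execution-style check using Python's standard `sqlite3` module. The `singer` table, the queries, and the order-insensitive comparison are assumptions for illustration; BIRD and Spider define their own evaluation scripts, which this does not reproduce.

```python
import sqlite3
from collections import Counter


def run_query(conn: sqlite3.Connection, sql: str):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except (sqlite3.Error, sqlite3.Warning):
        return None


def execution_match(conn: sqlite3.Connection, predicted: str, gold: str) -> bool:
    """True if both queries run and return the same rows, ignoring row order."""
    gold_rows = run_query(conn, gold)
    pred_rows = run_query(conn, predicted)
    return gold_rows is not None and pred_rows == gold_rows


# Toy in-memory fixture standing in for a real benchmark database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ana', 31), (2, 'Bo', 24), (3, 'Cy', 45);
    """
)

gold = "SELECT name FROM singer WHERE age > 30"
pred = "SELECT s.name FROM singer AS s WHERE s.age >= 31"

exact = pred.strip().lower() == gold.strip().lower()
executed = execution_match(conn, pred, gold)
print(f"exact match: {exact}, execution match: {executed}")
```

The two queries differ as strings but return the same rows on this fixture, so an exact-match gate rejects a prediction that an execution gate accepts.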

Output hygiene

The scorer extracts the first SELECT; completions can still include prose after the query.

Unblock: Add clean-SQL metric plus stopping/format preference data.
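
A clean-SQL metric could be sketched as follows. This is an assumption about what such a metric might look like, not the existing scorer: `extract_first_select` and `is_clean_sql` are hypothetical names, and the regex takes everything from the first SELECT up to the first semicolon or the end of the text.

```python
import re

# Lazy match from the first SELECT keyword to the first semicolon or end of text.
SELECT_RE = re.compile(r"\bselect\b.*?(?:;|\Z)", re.IGNORECASE | re.DOTALL)


def extract_first_select(completion: str):
    """Return the first SELECT statement without its trailing semicolon, or None."""
    match = SELECT_RE.search(completion)
    if match is None:
        return None
    return match.group(0).rstrip(";").strip()


def is_clean_sql(completion: str) -> bool:
    """True if the completion is nothing but the extracted query (plus an optional ';')."""
    sql = extract_first_select(completion)
    if sql is None:
        return False
    return completion.strip().rstrip(";").strip() == sql


completions = [
    "SELECT name FROM singer WHERE age > 30;",
    "SELECT name FROM singer WHERE age > 30; This returns singers older than 30.",
    "Sure, here is the query:\nSELECT name FROM singer;",
]
flags = [is_clean_sql(c) for c in completions]
clean_rate = sum(flags) / len(flags)
print(flags, clean_rate)
```

This check only catches prose before the query or after a semicolon. Prose appended without a semicolon would need a real SQL parser or a validation pass against the target database.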

Not a specialist package yet

The adapters currently live under gitignored run folders, not package metadata.

Unblock: Package under specialists/ only after a ship decision on a public execution gate.

Next Release Action

Publish this as a report artifact first. Do not present it as a shipped SQL model until public execution eval and clean-output gates pass.