Qwen3-0.6B Routed SQL Specialist
This is the cleanest current proof of the factory thesis: a tiny 0.6B model can beat a small public SQL baseline on a frozen exact-match slice, but only the routed artifact survives both public and execution-style gates.
Headline Numbers
| Metric | Score | Eval slice |
|---|---|---|
| Public exact | 0.531 | 64-row b-mc2/sql-create-context slice |
| T5-small baseline | 0.484 | same 64 public rows |
| Synthetic execution | 0.860 | 50 heldout SQLite rows |
| Synthetic exact | 0.840 | same 50 heldout rows |
Competitive Context
| System | Metric | Score | Size / Class | Comparable? | Readout |
|---|---|---|---|---|---|
| TinyGPT routed SQL v1 | b-mc2 exact / synthetic exec | 0.531 / 0.860 | 0.6B base + 2 routed LoRAs | Direct | Current local candidate; public exact and synthetic execution gates are both frozen. |
| T5-small local baseline | b-mc2 exact | 0.484 | ~60M | Direct | Same 64-row public slice; TinyGPT is +4.7 points exact on this narrow gate. |
| Defog SQLCoder-7B-2 | Defog SQL-Eval category scores | 77.1-96% | 7B | Directional | Strong public SQL specialist, but reported as category-level Defog SQL-Eval scores rather than this b-mc2 slice. |
| Snowflake Arctic-Text2SQL-R1-7B | BIRD execution accuracy | 68.47% | 7B | Not comparable | Useful target class for public SQL execution; TinyGPT must add a BIRD/Spider execution gate before claiming this lane. |
| Snowflake Arctic-Text2SQL-R1-14B / 32B | BIRD execution accuracy | 70.04% / 71.83% | 14B / 32B | Not comparable | Shows the current public high bar: execution accuracy, not exact string match. |
Direct rows share this artifact's eval setup. Directional and Not comparable rows report different benchmarks or metrics; they are useful market context but should not be read as leaderboard claims.
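On slices this small, each reported score maps to a whole number of rows. The sketch below recovers those implied counts. It assumes every score is simply rows correct divided by slice size; the report does not list per-row counts, so `implied_correct` and the `SLICES` labels are illustrative names, not part of the project.

```python
def implied_correct(score: float, n_rows: int) -> int:
    """Return the row count whose ratio to n_rows reproduces the reported score."""
    count = round(score * n_rows)
    # Guard: the reconstructed ratio must agree with the 3-decimal reported score.
    if abs(count / n_rows - score) > 0.0005:
        raise ValueError(f"{score} is not a 3-decimal fraction of {n_rows} rows")
    return count


# (score, slice size) pairs copied from the tables above.
SLICES = {
    "TinyGPT public exact": (0.531, 64),
    "T5-small public exact": (0.484, 64),
    "Synthetic execution": (0.860, 50),
    "Synthetic exact": (0.840, 50),
}

COUNTS = {name: implied_correct(s, n) for name, (s, n) in SLICES.items()}
ROW_GAP = COUNTS["TinyGPT public exact"] - COUNTS["T5-small public exact"]
```

Under that assumption the +4.7-point public gap is a difference of a few rows on a 64-row slice, which is consistent with the table describing the gate as narrow.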
Adapter comparison
| Setup | Public exact | Synthetic exec | Decision |
|---|---|---|---|
| Public v4 only | 0.531 | 0.240 | Route required |
| Blend v1 | 0.297 | 0.560 | Reject |
| Best static composition | 0.516 | 0.460 | Reject |
| BIRD + b-mc2 v5 | 0.438 | 0.280 | Reject |
| Classifier-routed v1 | 0.531 | 0.860 | Current best |
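One way to read the table is as a two-gate dominance check: a setup is kept only if no other setup is at least as good on both gates and strictly better on one. This is a sketch of that reading, not the project's documented decision rule (the report states no thresholds); `Setup`, `dominates`, and `pareto_front` are hypothetical names.

```python
from typing import NamedTuple


class Setup(NamedTuple):
    name: str
    public_exact: float
    synthetic_exec: float


# Scores copied from the adapter comparison table.
SETUPS = [
    Setup("Public v4 only", 0.531, 0.240),
    Setup("Blend v1", 0.297, 0.560),
    Setup("Best static composition", 0.516, 0.460),
    Setup("BIRD + b-mc2 v5", 0.438, 0.280),
    Setup("Classifier-routed v1", 0.531, 0.860),
]


def dominates(a: Setup, b: Setup) -> bool:
    """True if a is at least as good as b on both gates and strictly better on one."""
    no_worse = a.public_exact >= b.public_exact and a.synthetic_exec >= b.synthetic_exec
    better = a.public_exact > b.public_exact or a.synthetic_exec > b.synthetic_exec
    return no_worse and better


def pareto_front(setups: list[Setup]) -> list[Setup]:
    """Keep only setups that no other setup dominates."""
    return [s for s in setups if not any(dominates(o, s) for o in setups)]


FRONT = [s.name for s in pareto_front(SETUPS)]
```

With the numbers above, the routed setup ties the best public exact score and leads on synthetic execution, so it is the only entry left on the front.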
Router verification
| Check | Result | Evidence |
|---|---|---|
| Unlabeled mixed rows | 114 | 64 public / 50 synthetic |
| Public route reason | 64 | known_public_source |
| Synthetic route reason | 50 | sqlite_db_field |
| Route confidence | >= 0.99 | all smoke rows |
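The checks in this table can be expressed as a small tally over routing records. The sketch below assumes each record is a dict with `reason` and `confidence` keys; that schema and the `summarize_routes` helper are hypothetical, since the report does not show the router's output format.

```python
from collections import Counter


def summarize_routes(records, min_confidence=0.99):
    """Count route reasons and flag rows below the confidence floor."""
    by_reason = Counter(r["reason"] for r in records)
    low = [r for r in records if r["confidence"] < min_confidence]
    return {
        "total": len(records),
        "by_reason": dict(by_reason),
        "low_confidence": len(low),
    }


# Placeholder smoke records shaped like the table: the confidence values here
# are illustrative stand-ins, not measured router outputs.
records = (
    [{"reason": "known_public_source", "confidence": 0.99} for _ in range(64)]
    + [{"reason": "sqlite_db_field", "confidence": 0.99} for _ in range(50)]
)
summary = summarize_routes(records)
```

A run passes this check when `total` matches the mixed-row count, each reason count matches its slice size, and `low_confidence` is zero.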
Release Blockers
Public execution benchmark missing
b-mc2 exact match is useful, but serious SQL claims need execution accuracy on public DBs.
Unblock: Add BIRD Mini-Dev SQLite or Spider SQLite execution fixtures once the DB bundle is local.
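To illustrate why execution accuracy differs from exact match, here is a minimal sketch of an execution check using Python's standard `sqlite3` module. It assumes an order-insensitive comparison of result rows and uses a made-up `singer` table; neither the comparison rule nor the fixture comes from the project or from the BIRD/Spider harnesses.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """True if both queries return the same multiset of rows."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        return False  # invalid or failing SQL counts as a miss
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)


# Hypothetical in-memory fixture standing in for a benchmark database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ana", 31), ("Bo", 24)])

gold = "SELECT name FROM singer WHERE age > 30"
same_result = execution_match(conn, "SELECT name FROM singer WHERE NOT age <= 30", gold)
wrong_result = execution_match(conn, "SELECT name FROM singer", gold)
broken_sql = execution_match(conn, "SELECT nme FROM singer", gold)
```

The first prediction differs from the gold string but returns the same rows, so it would fail exact match and pass an execution gate; the other two fail both.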
Output hygiene
The scorer extracts the first SELECT; completions can still include prose after the query.
Unblock: Add clean-SQL metric plus stopping/format preference data.
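A clean-SQL metric could be sketched as follows. This is an assumption-heavy illustration: the project's actual scorer regex is not shown, queries are assumed to fit on one line, and `extract_first_select`, `is_clean_sql`, and `clean_rate` are hypothetical names.

```python
import re

# First SELECT up to the next semicolon or newline (single-line queries assumed).
SELECT_RE = re.compile(r"\bSELECT\b[^;\n]*", re.IGNORECASE)


def extract_first_select(completion):
    """Return the first SELECT statement without its terminator, or None."""
    match = SELECT_RE.search(completion)
    return match.group(0).strip() if match else None


def is_clean_sql(completion):
    """True if the completion is nothing but one SELECT (optional trailing ';')."""
    extracted = extract_first_select(completion)
    if extracted is None:
        return False
    body = completion.strip()
    if body.endswith(";"):
        body = body[:-1].rstrip()
    return body == extracted


def clean_rate(completions):
    """Fraction of completions that contain only the query."""
    if not completions:
        return 0.0
    return sum(is_clean_sql(c) for c in completions) / len(completions)


samples = [
    "SELECT name FROM singer;",
    "SELECT name FROM singer;\nThis returns every singer name.",
    "Sure! SELECT name FROM singer",
]
rate = clean_rate(samples)
```

All three samples yield the same extracted query, so an extract-then-compare scorer would treat them identically, while only the first counts as clean here. Multi-line queries would be flagged as not clean under the single-line assumption, so a real metric would likely need a SQL parser.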
Not a specialist package yet
The adapters currently live under gitignored run folders, not package metadata.
Unblock: Package under specialists/ only after a ship decision on a public execution gate.