ANE decode
17 tok/s, Qwen3 28-block layer-chunked chain
ANE M8 Core ML Chain
This artifact maps the boundary between owning the model and using Apple’s acceleration stack. It is parked because capability comes from our model/eval loop; Core ML is a deployment target, not the product center.
Headline Numbers
FoundationModels context
4096 tokens, too small for real tool catalogs
Action grounding
25%, FoundationModels BFCL agentic full catalog
Competitive Context
| System | Metric | Score | Size / Class | Comparable? | Readout |
|---|---|---|---|---|---|
| TinyGPT Qwen3 Core ML chain | ANE decode | 17 tok/s | 28-block Qwen3 path | Direct | Runtime experiment only; capability still comes from TinyGPT weights and evals. |
| Apple FoundationModels | action grounding / context | 25% / 4096 tokens | Apple on-device model | Directional | Useful as a free local floor, but too weak to be the specialist capability dependency. |
| TinyGPT active MLX path | specialist eval readiness | active | owned weights | Directional | Preferred competition lane: model quality first, runtime optimization second. |
Direct rows share this artifact's eval setup. Directional rows are useful market context but should not be read as leaderboard claims.
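To put the 17 tok/s decode figure in user-facing terms, the arithmetic below converts throughput into approximate generation time. It is a minimal sketch: the response lengths are illustrative assumptions, `decode_seconds` is a hypothetical helper, and prefill time and runtime overhead are ignored.

```python
# Rough decode-time estimate from a measured throughput figure.
# 17 tok/s is the ANE decode number reported in this artifact;
# the response lengths below are illustrative assumptions.

ANE_DECODE_TOK_PER_S = 17.0


def decode_seconds(output_tokens: int, tok_per_s: float = ANE_DECODE_TOK_PER_S) -> float:
    """Seconds spent generating `output_tokens`, ignoring prefill and overhead."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return output_tokens / tok_per_s


if __name__ == "__main__":
    for n in (32, 128, 512):  # hypothetical response lengths
        print(f"{n:4d} tokens -> {decode_seconds(n):5.1f} s")
```

At this rate a response of a few hundred tokens takes tens of seconds, which is consistent with treating the Core ML chain as a runtime experiment and not a shipping path.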
Platform stance
| Path | Decision | Reason |
|---|---|---|
| Apple FoundationModels | Routing floor only | Weak action grounding and short context |
| Our weights -> Core ML | Optional future | Battery/perf optimization if capability is already solved |
| MLX/TinyGPT runtime | Active | Own the model and eval gate |
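The "short context" reason can be made concrete with a rough budget check. The sketch below estimates whether a tool catalog fits in a 4096-token window. Only the 4096 figure comes from this artifact; the characters-per-token ratio, the reserve sizes, and the `make_tool` sample schema are hypothetical assumptions, and a real check would use the target model's tokenizer.

```python
# Rough check of whether a tool catalog fits in a fixed context window.
# 4096 is the FoundationModels context size reported in this artifact.
# Everything else (chars-per-token ratio, reserve sizes, sample tools)
# is a hypothetical assumption for illustration.
import json

CONTEXT_TOKENS = 4096
CHARS_PER_TOKEN = 4.0  # crude heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return int(len(text) / CHARS_PER_TOKEN) + 1


def catalog_fits(tools, reserve_for_prompt=512, reserve_for_output=512):
    """Return (fits, catalog_tokens, budget) for a list of tool schemas."""
    catalog_tokens = sum(estimate_tokens(json.dumps(t)) for t in tools)
    budget = CONTEXT_TOKENS - reserve_for_prompt - reserve_for_output
    return catalog_tokens <= budget, catalog_tokens, budget


def make_tool(i: int) -> dict:
    """Build a hypothetical tool schema, used only to size the catalog."""
    return {
        "name": f"tool_{i}",
        "description": "Hypothetical tool used only to size the catalog. " * 4,
        "parameters": {"type": "object", "properties": {"query": {"type": "string"}}},
    }


if __name__ == "__main__":
    for count in (5, 50, 200):  # hypothetical catalog sizes
        fits, used, budget = catalog_fits([make_tool(i) for i in range(count)])
        print(f"{count:3d} tools: ~{used} tokens of {budget} budget, fits={fits}")
```

Under these assumptions a handful of tools fits comfortably, while a catalog of a few hundred schemas overruns the window several times over before any conversation history is added.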
Release Blockers
Capability dependency rejected
Apple's model cannot be the differentiator; it is a free local floor at best.
Unblock: Only revive Core ML when a shipped TinyGPT specialist needs a battery/runtime optimization.