Our AI agent cost $87 in its first month. The fix wasn’t using it less — it was routing each task to the right model. Three weeks later, same workload, $27/month.
Model routing is the single highest-impact optimization you can make to an AI agent setup. It cut our costs more than caching, context trimming, and QMD memory search combined. Here’s the exact config and the rules that make it work.
After running OpenClaw for a production SEO operation handling 500+ articles, content audits, and daily automation — these are the numbers from our actual OpenRouter dashboard.
What Model Routing Actually Means
Most AI agent setups use one model for everything. OpenClaw’s default config uses openrouter/auto, which picks a model based on availability and capability — not cost.
That means a simple heartbeat check (“are you alive?”) might hit Claude Opus at $15 per million tokens. A CSV file rename might burn Sonnet tokens at $3/1M. Meanwhile, Gemini Flash handles both tasks perfectly at $0.10/1M.
THE COST GAP
150x
Price difference between Opus ($15/1M) and Flash ($0.10/1M) for the same simple task
Model routing fixes this. You define tiers — cheap models for cheap tasks, expensive models only where quality demands it. The AI doesn’t choose. The config does.
Key point: Routing isn’t about using worse models. It’s about not using a $15/1M model for tasks a $0.10/1M model handles identically.
Our 5-Tier Routing Setup
Here’s the exact model hierarchy we run in production:
| Tier | Model | Alias | $/1M Tokens | % of Tasks | Monthly Cost |
|---|---|---|---|---|---|
| Budget | Gemini 2.0 Flash | fast | $0.10 | ~55% | ~$5.50 |
| Worker | Kimi K2.5 | kimi | $0.60 | ~15% | ~$5.00 |
| Writer | Claude Sonnet 4.5 | sonnet | $3.00 | ~25% | ~$14.00 |
| Frontier | Claude Opus 4.6 | opus | $15.00 | ~5% | ~$3.00 |
| Research | Perplexity Sonar Pro | sonar | Variable | As needed | ~$0 (rare) |
The math works because 70% of all tasks run on models costing under $1/1M tokens. The expensive models (Sonnet, Opus) only touch tasks where their quality genuinely matters.
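The blended rate behind that math is easy to check. A quick sketch, using the tier shares and prices from the table above:

```python
# Blended $/1M-token rate implied by the tier mix in the table above.
tiers = {
    "fast":   {"price": 0.10,  "share": 0.55},
    "kimi":   {"price": 0.60,  "share": 0.15},
    "sonnet": {"price": 3.00,  "share": 0.25},
    "opus":   {"price": 15.00, "share": 0.05},
}

# Weighted average price across the mix.
blended = sum(t["price"] * t["share"] for t in tiers.values())
print(f"blended rate: ${blended:.3f}/1M tokens")
```

That works out to roughly $1.65 per million tokens blended, versus $15/1M if everything ran on Opus.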
The Config That Makes It Happen
OPENCLAW.JSON — MODEL ROUTING

```json
{
  "agents": {
    "defaults": {
      "model": "google/gemini-2.0-flash-001",
      "models": {
        "fast": {
          "id": "google/gemini-2.0-flash-001",
          "aliases": ["flash", "budget", "free"],
          "description": "Heartbeats, classification, data extraction"
        },
        "kimi": {
          "id": "moonshot/kimi-k2.5",
          "aliases": ["worker", "deepseek"],
          "description": "SEO analysis, agentic browsing"
        },
        "sonnet": {
          "id": "anthropic/claude-sonnet-4-5-20250514",
          "aliases": ["writer", "claude"],
          "description": "ALL content writing. Non-negotiable."
        },
        "opus": {
          "id": "anthropic/claude-opus-4-6",
          "aliases": ["frontier"],
          "description": "ONLY via explicit /model opus"
        }
      },
      "fallbacks": [
        "google/gemini-2.0-flash-001",
        "moonshot/kimi-k2.5",
        "anthropic/claude-sonnet-4-5-20250514"
      ]
    }
  }
}
```

Three things to notice:
- Default is Flash. Every task starts on the cheapest model. You opt up, not down.
- Aliases make switching easy. Type /model sonnet before a writing task. Type /model fast to go back.
- Fallback chain goes cheap → mid → expensive. If Flash is down, Kimi takes over. Sonnet is the last resort, never Opus.
💡 Pro Tip
The fallback chain matters more than you’d think. Without it, a Flash outage silently routes everything to whatever OpenRouter picks — often Opus. With the chain, you control the escalation path: Flash → Kimi → Sonnet. Never Opus in the fallback.
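That escalation path can be sketched as a simple loop. Here `call_model` and `ModelUnavailable` are hypothetical stand-ins for whatever client function and error your setup uses; the model IDs are the ones from the fallback chain above:

```python
# Sketch of the cheap -> mid -> expensive fallback chain. Opus is
# deliberately absent: it is never reached automatically.
FALLBACKS = [
    "google/gemini-2.0-flash-001",
    "moonshot/kimi-k2.5",
    "anthropic/claude-sonnet-4-5-20250514",
]

class ModelUnavailable(Exception):
    """Raised by the client when a model is down or rate-limited."""

def route(prompt, call_model):
    last_err = None
    for model_id in FALLBACKS:          # try cheapest first
        try:
            return call_model(model_id, prompt)
        except ModelUnavailable as err:  # model down: escalate one tier
            last_err = err
    raise RuntimeError("all fallback models unavailable") from last_err
```

If Flash errors out, the same prompt retries on Kimi, then Sonnet, and the function fails loudly rather than silently routing to a frontier model.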
The 8 Routing Rules We Follow
These rules are baked into our agent’s workspace instructions. They run on every session:
Rule 1: Heartbeats Always Use Flash
The heartbeat runs 24/7, roughly 8M tokens a month in our setup. Routed to Sonnet, that's $24/month for zero productive output; on Opus it would be $120. On Flash, the same traffic costs $0.80. This is pinned in openclaw.json, so the agent can't override it.
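To see the stakes, here is that same ~8M tokens/month of heartbeat traffic priced at each tier. The token volume is from our dashboard; treat it as illustrative for your own setup:

```python
# Pricing ~8M tokens/month of heartbeat traffic at each tier.
heartbeat_tokens_m = 8.0  # millions of tokens per month (our dashboard figure)
prices = {"flash": 0.10, "kimi": 0.60, "sonnet": 3.00, "opus": 15.00}  # $/1M

for model, price in prices.items():
    print(f"{model}: ${heartbeat_tokens_m * price:.2f}/month")
```

Same work, same tokens: $0.80 on Flash, $120.00 on Opus.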
Rule 2: Sub-Agents Default to Flash
When your main agent spawns a sub-agent for a task (web scraping, file processing, data extraction), that sub-agent inherits the default model. Flash. Only escalate if the sub-task genuinely needs reasoning.
Rule 3: Never Use Opus for Automation
Cron jobs, batch processing, scheduled tasks — all Flash or Kimi. Opus is for the 3-5% of work that requires frontier reasoning: architecture decisions, complex debugging, security audits.
Rule 4: ALL Writing Uses Sonnet
This is non-negotiable. Every blog post, article rewrite, and long-form piece runs on Claude Sonnet. The quality difference between Flash and Sonnet on writing tasks is enormous. You save everywhere else so you can afford to spend here.
⚠️ Warning
Always switch to Sonnet before starting a writing task: /model sonnet. If you forget, Flash will write your article — and the quality difference is immediately obvious. Flat tone, repetitive structure, shallow analysis.
Rule 5: SEO Analysis Uses Kimi
Kimi K2.5 excels at agentic browsing — visiting URLs, extracting data, comparing pages. At $0.60/1M, it’s 5x cheaper than Sonnet and handles SEO audits, keyword research, and competitor analysis well.
Rule 6: Batch in Groups of 10
Don’t send 10 separate prompts for 10 URLs. Send one prompt with all 10. Saves ~40% on repeated system prompt tokens.
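A rough token model shows where the saving comes from. The 300/450 token figures below are illustrative assumptions; the exact percentage depends on how large your system prompt is relative to each item:

```python
# Token cost of N items sent separately vs. batched into one prompt.
def tokens(n_items, system_tokens=300, item_tokens=450, batched=False):
    if batched:
        # System prompt is sent once, items are appended to it.
        return system_tokens + n_items * item_tokens
    # System prompt is repeated with every request.
    return n_items * (system_tokens + item_tokens)

sep = tokens(10)                  # 10 separate prompts
bat = tokens(10, batched=True)    # one batched prompt
print(f"separate={sep}, batched={bat}, saved={(sep - bat) / sep:.0%}")
```

With these numbers batching saves about a third; heavier system prompts push the saving toward the ~40% we see in practice.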
Rule 7: “Use the Best Model” Means Sonnet
When the user says “use your best model for this,” route to Sonnet. Not Opus. The only path to Opus is the explicit command /model opus.
Rule 8: Check Before You Write
Run /status before any expensive task. Verify you’re on the right model, your context isn’t bloated, and your budget has headroom.
📝 Quick Reference
/model fast → Switch to Gemini Flash ($0.10/1M)
/model kimi → Switch to Kimi K2.5 ($0.60/1M)
/model sonnet → Switch to Claude Sonnet ($3.00/1M)
/model opus → Switch to Claude Opus ($15.00/1M)
/status → Check current model and context size
Real Cost Breakdown: Before and After
Here’s what our OpenRouter dashboard showed over a 30-day period:
Before (Default Auto-Routing)
| Task Type | Model Used | Tokens | Cost |
|---|---|---|---|
| Heartbeats (24/7) | Mixed (often Sonnet) | ~8M | $24.00 |
| Data extraction | Mixed (often Sonnet) | ~5M | $15.00 |
| SEO analysis | Sonnet | ~4M | $12.00 |
| Content writing | Sonnet | ~4M | $12.00 |
| Ad-hoc queries | Mixed | ~8M | $24.00 |
| Total | | ~29M | $87.00 |
After (5-Tier Routing)
| Task Type | Model Pinned | Tokens | Cost |
|---|---|---|---|
| Heartbeats (24/7) | Flash | ~8M | $0.80 |
| Data extraction | Flash | ~5M | $0.50 |
| SEO analysis | Kimi K2.5 | ~4M | $2.40 |
| Content writing | Sonnet | ~4M | $12.00 |
| Compaction | Flash | ~3M | $0.30 |
| Architecture/debug | Opus | ~0.5M | $7.50 |
| Ad-hoc queries | Flash | ~5M | $0.50 |
| Total | | ~29.5M | $24.00 |
SAME WORKLOAD, DIFFERENT ROUTING
$87 → $24
72% reduction. Content writing quality unchanged (still Sonnet).
The writing cost ($12) stayed exactly the same. That’s the point. You protect quality where it matters and eliminate waste everywhere else.
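The totals are easy to recompute from the two tables above:

```python
# Recomputing the before/after totals from the two dashboard tables.
before_costs = [24.00, 15.00, 12.00, 12.00, 24.00]          # default auto-routing
after_costs = [0.80, 0.50, 2.40, 12.00, 0.30, 7.50, 0.50]   # 5-tier routing

b, a = sum(before_costs), sum(after_costs)
print(f"${b:.0f} -> ${a:.0f} ({(b - a) / b:.0%} reduction)")
```

Both tables sum cleanly: $87 before, $24 after, a 72% reduction.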
“Smart routing can reduce costs by 40-60% while maintaining quality. Model cascading approaches regularly achieve 60% to 87% cost reduction because the expensive models only process what they must.”
— Unified AI Hub, Economics of AI: Token-Based Cost Optimization, 2026
Per-Agent Routing: Go Even Further
Beyond the default routing, we pin specific models to specific agents:
| Agent | Model | Context Limit | Why This Model |
|---|---|---|---|
| content-writer | Sonnet | 80K | Writing quality is non-negotiable |
| seo-analyst | Kimi K2.5 | 50K | Best price/performance for browsing tasks |
| data-worker | Flash | 30K | Data extraction doesn’t need reasoning |
OPENCLAW.JSON — AGENT-SPECIFIC ROUTING

```json
"list": [
  {
    "id": "content-writer",
    "model": "anthropic/claude-sonnet-4-5-20250514",
    "params": { "contextTokens": 80000 }
  },
  {
    "id": "seo-analyst",
    "model": "moonshot/kimi-k2.5",
    "params": { "contextTokens": 50000 }
  },
  {
    "id": "data-worker",
    "model": "google/gemini-2.0-flash-001",
    "params": { "contextTokens": 30000 }
  }
]
```

The context limits are just as important as the model pins. A data worker processing CSVs doesn’t need 80K tokens of context. Capping it at 30K forces earlier compaction and keeps costs tight.
💡 Pro Tip
Compaction (context summarization) should also run on Flash. Don’t spend Sonnet tokens on mechanical text compression. Set "compaction": { "model": "google/gemini-2.0-flash-001" } in your config.
Common Routing Mistakes
Mistake 1: Leaving the Default as Auto
OpenClaw ships with openrouter/auto. That’s fine for testing. In production, it’s a blank check. Pin your default to Flash and escalate intentionally.
Mistake 2: No Fallback Chain
Without fallbacks, a model outage sends your traffic to whatever OpenRouter picks. Often that’s expensive. Define the chain: Flash → Kimi → Sonnet.
Mistake 3: Using Opus “Just to Be Safe”
Opus is incredible. It’s also $15/1M tokens. If you’re reaching for /model opus more than once a week, you’re probably over-routing. Sonnet handles 95% of complex tasks perfectly.
⚠️ Warning
A single batch job accidentally routed to Opus can cost more than your entire month of Flash usage. We’ve seen forum reports of $50+ surprise bills from forgetting to switch back after an Opus session. Always run /status before starting batch operations.
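The arithmetic of a forgotten switch is brutal. A sketch with an illustrative 3M-token batch job:

```python
# One forgotten /model switch: a 3M-token batch job on Opus vs. Flash.
batch_tokens_m = 3.0                  # illustrative batch size, millions of tokens
opus_cost = batch_tokens_m * 15.00    # frontier pricing, $/1M
flash_cost = batch_tokens_m * 0.10    # budget pricing, $/1M
print(f"Opus: ${opus_cost:.2f}, Flash: ${flash_cost:.2f}")
```

That is $45.00 versus $0.30 for identical output, the 150x gap applied to a single job.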
Frequently Asked Questions
Does Gemini Flash produce worse results than Sonnet?
For writing, yes — significantly. Flash output tends toward flat tone and repetitive structure. For classification, data extraction, file operations, and heartbeats? Flash performs identically to Sonnet at 1/30th the cost. Match the model to the task.
How do I know which model to use for a task?
Ask one question: does this task require nuanced language or creative reasoning? If yes, use Sonnet (or Opus for architecture-level decisions). If the task is mechanical — extracting data, moving files, checking status, processing CSVs — Flash handles it fine.
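That one-question rule can be approximated in code. The task categories and keyword lists below are illustrative sketches, not anything OpenClaw ships with:

```python
# Toy router implementing the "nuanced language vs. mechanical" question.
# Keyword sets are illustrative assumptions; extend them for your workload.
WRITING = {"article", "blog", "rewrite", "draft"}
MECHANICAL = {"extract", "rename", "csv", "status", "heartbeat"}

def pick_model(task: str) -> str:
    words = set(task.lower().split())
    if words & WRITING:
        return "sonnet"   # nuanced language: pay for quality
    if words & MECHANICAL:
        return "fast"     # mechanical work: cheapest tier
    return "kimi"         # mid-tier default for analysis tasks

print(pick_model("rewrite this blog intro"))   # sonnet
print(pick_model("extract titles from csv"))   # fast
```

In practice we enforce this by habit and workspace instructions rather than code, but the decision tree is exactly this shallow.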
Can I change models mid-conversation?
Yes. Type /model sonnet to switch. The change takes effect on the next message. Switch to Sonnet before writing, back to Flash when you’re done. The habit takes a day to build and saves hundreds per month.
What about Chinese models like DeepSeek?
Kimi K2.5 and DeepSeek are excellent for mid-tier tasks. At $0.45-0.60/1M tokens, they fill the gap between Flash (too simple for complex reasoning) and Sonnet (too expensive for routine analysis). We use Kimi for SEO browsing tasks specifically because it handles multi-step web interactions well.
Is OpenRouter the only way to do model routing?
No, but it’s the easiest. OpenRouter gives you one API key for 200+ models with built-in fallbacks and budget controls. You could also self-host models via Ollama or vLLM, but that adds infrastructure complexity and the security considerations we discuss in our security guide.
What to Read Next
For the complete optimization playbook — including prompt caching, QMD memory search, heartbeat config, and budget guardrails — read our pillar guide: OpenClaw Token Optimization: The Complete 2026 Guide.
Want to see how these models perform on real SEO tasks? Check out AI Model Showdown for SEO: Gemini Flash vs Sonnet vs Kimi K2.5.
Back to AI Automation & Workflows Hub.
🔎 Key Takeaways
- Model routing cut our bill from $87 to $24/month — a 72% reduction with zero quality loss on writing
- Set your default to the cheapest viable model (Gemini Flash at $0.10/1M) and escalate intentionally
- Define a fallback chain (Flash → Kimi → Sonnet) so outages don’t silently route to expensive models
- Pin agents to specific models — content-writer gets Sonnet, data-worker gets Flash, no exceptions
- The 150x cost gap between Opus and Flash means one wrong routing decision can cost more than a month of correct ones
