Our AI agent cost $87 in its first month. The fix wasn’t using it less — it was routing each task to the right model. Three weeks later, same workload, $27/month.
Model routing is the single highest-impact optimization you can make to an AI agent setup. It cut our costs more than caching, context trimming, and QMD memory search combined. Here’s the exact config and the rules that make it work.
After running OpenClaw for a production SEO operation handling 500+ articles, content audits, and daily automation — these are the numbers from our actual OpenRouter dashboard.
What Model Routing Actually Means
Most AI agent setups use one model for everything. OpenClaw’s default config uses openrouter/auto, which picks a model based on availability and capability — not cost.
That means a simple heartbeat check (“are you alive?”) might hit Claude Opus at $15 per million tokens. A CSV file rename might burn Sonnet tokens at $3/1M. Meanwhile, Gemini Flash handles both tasks perfectly at $0.10/1M.
THE COST GAP
150x
Price difference between Opus ($15/1M) and Flash ($0.10/1M) for the same simple task
Model routing fixes this. You define tiers — cheap models for cheap tasks, expensive models only where quality demands it. The AI doesn’t choose. The config does.
Key point: Routing isn’t about using worse models. It’s about not using a $15/1M model for tasks a $0.10/1M model handles identically.
Our 5-Tier Routing Setup
Here’s the exact model hierarchy we run in production:
| Tier | Model | Alias | $/1M Tokens | % of Tasks | Monthly Cost |
|---|---|---|---|---|---|
| Budget | Gemini 2.0 Flash | fast | $0.10 | ~55% | ~$5.50 |
| Worker | Kimi K2.5 | kimi | $0.60 | ~15% | ~$5.00 |
| Writer | Claude Sonnet 4.5 | sonnet | $3.00 | ~25% | ~$14.00 |
| Frontier | Claude Opus 4.6 | opus | $15.00 | ~5% | ~$3.00 |
| Research | Perplexity Sonar Pro | sonar | Variable | As needed | ~$0 (rare) |
The math works because 70% of all tasks run on models costing under $1/1M tokens. The expensive models (Sonnet, Opus) only touch tasks where their quality genuinely matters.
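The blended rate behind that math is easy to check. A quick sketch, using the tier shares and prices from the table above:

```python
# Blended $/1M-token rate implied by the tier mix in the table above.
tiers = {
    "fast":   {"price": 0.10,  "share": 0.55},
    "kimi":   {"price": 0.60,  "share": 0.15},
    "sonnet": {"price": 3.00,  "share": 0.25},
    "opus":   {"price": 15.00, "share": 0.05},
}

# Weighted average price across the mix.
blended = sum(t["price"] * t["share"] for t in tiers.values())
print(f"blended rate: ${blended:.3f}/1M tokens")
```

That works out to roughly $1.65 per million tokens blended, versus $15/1M if everything ran on Opus.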
The Config That Makes It Happen
OPENCLAW.JSON — MODEL ROUTING

```json
{
  "agents": {
    "defaults": {
      "model": "google/gemini-2.0-flash-001",
      "models": {
        "fast": {
          "id": "google/gemini-2.0-flash-001",
          "aliases": ["flash", "budget", "free"],
          "description": "Heartbeats, classification, data extraction"
        },
        "kimi": {
          "id": "moonshot/kimi-k2.5",
          "aliases": ["worker", "deepseek"],
          "description": "SEO analysis, agentic browsing"
        },
        "sonnet": {
          "id": "anthropic/claude-sonnet-4-5-20250514",
          "aliases": ["writer", "claude"],
          "description": "ALL content writing. Non-negotiable."
        },
        "opus": {
          "id": "anthropic/claude-opus-4-6",
          "aliases": ["frontier"],
          "description": "ONLY via explicit /model opus"
        }
      },
      "fallbacks": [
        "google/gemini-2.0-flash-001",
        "moonshot/kimi-k2.5",
        "anthropic/claude-sonnet-4-5-20250514"
      ]
    }
  }
}
```

Three things to notice:
- Default is Flash. Every task starts on the cheapest model. You opt up, not down.
- Aliases make switching easy. Type /model sonnet before a writing task. Type /model fast to go back.
- Fallback chain goes cheap → mid → expensive. If Flash is down, Kimi takes over. Sonnet is the last resort, never Opus.
💡 Pro Tip
The fallback chain matters more than you’d think. Without it, a Flash outage silently routes everything to whatever OpenRouter picks — often Opus. With the chain, you control the escalation path: Flash → Kimi → Sonnet. Never Opus in the fallback.
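That escalation path can be sketched as a simple loop. Here `call_model` and `ModelUnavailable` are hypothetical stand-ins for whatever client function and error your setup uses; the model IDs are the ones from the fallback chain above:

```python
# Sketch of the cheap -> mid -> expensive fallback chain. Opus is
# deliberately absent: it is never reached automatically.
FALLBACKS = [
    "google/gemini-2.0-flash-001",
    "moonshot/kimi-k2.5",
    "anthropic/claude-sonnet-4-5-20250514",
]

class ModelUnavailable(Exception):
    """Raised by the client when a model is down or rate-limited."""

def route(prompt, call_model):
    last_err = None
    for model_id in FALLBACKS:          # try cheapest first
        try:
            return call_model(model_id, prompt)
        except ModelUnavailable as err:  # model down: escalate one tier
            last_err = err
    raise RuntimeError("all fallback models unavailable") from last_err
```

If Flash errors out, the same prompt retries on Kimi, then Sonnet, and the function fails loudly rather than silently routing to a frontier model.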
The 8 Routing Rules We Follow
These rules are baked into our agent’s workspace instructions. They run on every session:
Rule 1: Heartbeats Always Use Flash
The heartbeat runs 24/7, roughly 8M tokens a month in our setup. Routed to Sonnet, that's $24/month for zero productive output; on Opus it would be $120. On Flash, the same traffic costs $0.80. This is pinned in openclaw.json, so the agent can't override it.
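To see the stakes, here is that same ~8M tokens/month of heartbeat traffic priced at each tier. The token volume is from our dashboard; treat it as illustrative for your own setup:

```python
# Pricing ~8M tokens/month of heartbeat traffic at each tier.
heartbeat_tokens_m = 8.0  # millions of tokens per month (our dashboard figure)
prices = {"flash": 0.10, "kimi": 0.60, "sonnet": 3.00, "opus": 15.00}  # $/1M

for model, price in prices.items():
    print(f"{model}: ${heartbeat_tokens_m * price:.2f}/month")
```

Same work, same tokens: $0.80 on Flash, $120.00 on Opus.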
Rule 2: Sub-Agents Default to Flash
When your main agent spawns a sub-agent for a task (web scraping, file processing, data extraction), that sub-agent inherits the default model. Flash. Only escalate if the sub-task genuinely needs reasoning.
Rule 3: Never Use Opus for Automation
Cron jobs, batch processing, scheduled tasks — all Flash or Kimi. Opus is for the 3-5% of work that requires frontier reasoning: architecture decisions, complex debugging, security audits.
Rule 4: ALL Writing Uses Sonnet
This is non-negotiable. Every blog post, article rewrite, and long-form piece runs on Claude Sonnet. The quality difference between Flash and Sonnet on writing tasks is enormous. You save everywhere else so you can afford to spend here.
⚠️ Warning
Always switch to Sonnet before starting a writing task: /model sonnet. If you forget, Flash will write your article — and the quality difference is immediately obvious. Flat tone, repetitive structure, shallow analysis.
Rule 5: SEO Analysis Uses Kimi
Kimi K2.5 excels at agentic browsing — visiting URLs, extracting data, comparing pages. At $0.60/1M, it’s 5x cheaper than Sonnet and handles SEO audits, keyword research, and competitor analysis well.
Rule 6: Batch in Groups of 10
Don’t send 10 separate prompts for 10 URLs. Send one prompt with all 10. Saves ~40% on repeated system prompt tokens.
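A rough token model shows where the saving comes from. The 300/450 token figures below are illustrative assumptions; the exact percentage depends on how large your system prompt is relative to each item:

```python
# Token cost of N items sent separately vs. batched into one prompt.
def tokens(n_items, system_tokens=300, item_tokens=450, batched=False):
    if batched:
        # System prompt is sent once, items are appended to it.
        return system_tokens + n_items * item_tokens
    # System prompt is repeated with every request.
    return n_items * (system_tokens + item_tokens)

sep = tokens(10)                  # 10 separate prompts
bat = tokens(10, batched=True)    # one batched prompt
print(f"separate={sep}, batched={bat}, saved={(sep - bat) / sep:.0%}")
```

With these numbers batching saves about a third; heavier system prompts push the saving toward the ~40% we see in practice.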
Rule 7: “Use the Best Model” Means Sonnet
When the user says “use your best model for this,” route to Sonnet. Not Opus. The only path to Opus is the explicit command /model opus.
Rule 8: Check Before You Write
Run /status before any expensive task. Verify you’re on the right model, your context isn’t bloated, and your budget has headroom.
📝 Quick Reference
/model fast → Switch to Gemini Flash ($0.10/1M)
/model kimi → Switch to Kimi K2.5 ($0.60/1M)
/model sonnet → Switch to Claude Sonnet ($3.00/1M)
/model opus → Switch to Claude Opus ($15.00/1M)
/status → Check current model and context size
Real Cost Breakdown: Before and After
Here’s what our OpenRouter dashboard showed over a 30-day period:
Before (Default Auto-Routing)
| Task Type | Model Used | Tokens | Cost |
|---|---|---|---|
| Heartbeats (24/7) | Mixed (often Sonnet) | ~8M | $24.00 |
| Data extraction | Mixed (often Sonnet) | ~5M | $15.00 |
| SEO analysis | Sonnet | ~4M | $12.00 |
| Content writing | Sonnet | ~4M | $12.00 |
| Ad-hoc queries | Mixed | ~8M | $24.00 |
| Total | | ~29M | $87.00 |
After (5-Tier Routing)
| Task Type | Model Pinned | Tokens | Cost |
|---|---|---|---|
| Heartbeats (24/7) | Flash | ~8M | $0.80 |
| Data extraction | Flash | ~5M | $0.50 |
| SEO analysis | Kimi K2.5 | ~4M | $2.40 |
| Content writing | Sonnet | ~4M | $12.00 |
| Compaction | Flash | ~3M | $0.30 |
| Architecture/debug | Opus | ~0.5M | $7.50 |
| Ad-hoc queries | Flash | ~5M | $0.50 |
| Total | | ~29.5M | $24.00 |
SAME WORKLOAD, DIFFERENT ROUTING
$87 → $24
72% reduction. Content writing quality unchanged (still Sonnet).
The writing cost ($12) stayed exactly the same. That’s the point. You protect quality where it matters and eliminate waste everywhere else.
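The totals are easy to recompute from the two tables above:

```python
# Recomputing the before/after totals from the two dashboard tables.
before_costs = [24.00, 15.00, 12.00, 12.00, 24.00]          # default auto-routing
after_costs = [0.80, 0.50, 2.40, 12.00, 0.30, 7.50, 0.50]   # 5-tier routing

b, a = sum(before_costs), sum(after_costs)
print(f"${b:.0f} -> ${a:.0f} ({(b - a) / b:.0%} reduction)")
```

Both tables sum cleanly: $87 before, $24 after, a 72% reduction.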
“Smart routing can reduce costs by 40-60% while maintaining quality. Model cascading approaches regularly achieve 60% to 87% cost reduction because the expensive models only process what they must.”
— Unified AI Hub, Economics of AI: Token-Based Cost Optimization, 2026
Per-Agent Routing: Go Even Further
Beyond the default routing, we pin specific models to specific agents:
| Agent | Model | Context Limit | Why This Model |
|---|---|---|---|
| content-writer | Sonnet | 80K | Writing quality is non-negotiable |
| seo-analyst | Kimi K2.5 | 50K | Best price/performance for browsing tasks |
| data-worker | Flash | 30K | Data extraction doesn’t need reasoning |
OPENCLAW.JSON — AGENT-SPECIFIC ROUTING

```json
"list": [
  {
    "id": "content-writer",
    "model": "anthropic/claude-sonnet-4-5-20250514",
    "params": { "contextTokens": 80000 }
  },
  {
    "id": "seo-analyst",
    "model": "moonshot/kimi-k2.5",
    "params": { "contextTokens": 50000 }
  },
  {
    "id": "data-worker",
    "model": "google/gemini-2.0-flash-001",
    "params": { "contextTokens": 30000 }
  }
]
```

The context limits are just as important as the model pins. A data worker processing CSVs doesn’t need 80K tokens of context. Capping it at 30K forces earlier compaction and keeps costs tight.
💡 Pro Tip
Compaction (context summarization) should also run on Flash. Don’t spend Sonnet tokens on mechanical text compression. Set "compaction": { "model": "google/gemini-2.0-flash-001" } in your config.
Common Routing Mistakes
Mistake 1: Leaving the Default as Auto
OpenClaw ships with openrouter/auto. That’s fine for testing. In production, it’s a blank check. Pin your default to Flash and escalate intentionally.
Mistake 2: No Fallback Chain
Without fallbacks, a model outage sends your traffic to whatever OpenRouter picks. Often that’s expensive. Define the chain: Flash → Kimi → Sonnet.
Mistake 3: Using Opus “Just to Be Safe”
Opus is incredible. It’s also $15/1M tokens. If you’re reaching for /model opus more than once a week, you’re probably over-routing. Sonnet handles 95% of complex tasks perfectly.
⚠️ Warning
A single batch job accidentally routed to Opus can cost more than your entire month of Flash usage. We’ve seen forum reports of $50+ surprise bills from forgetting to switch back after an Opus session. Always run /status before starting batch operations.
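The arithmetic of a forgotten switch is brutal. A sketch with an illustrative 3M-token batch job:

```python
# One forgotten /model switch: a 3M-token batch job on Opus vs. Flash.
batch_tokens_m = 3.0                  # illustrative batch size, millions of tokens
opus_cost = batch_tokens_m * 15.00    # frontier pricing, $/1M
flash_cost = batch_tokens_m * 0.10    # budget pricing, $/1M
print(f"Opus: ${opus_cost:.2f}, Flash: ${flash_cost:.2f}")
```

That is $45.00 versus $0.30 for identical output, the 150x gap applied to a single job.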
Frequently Asked Questions
Does Gemini Flash produce worse results than Sonnet?
For writing, yes — significantly. Flash output tends toward flat tone and repetitive structure. For classification, data extraction, file operations, and heartbeats? Flash performs identically to Sonnet at 1/30th the cost. Match the model to the task.
How do I know which model to use for a task?
Ask one question: does this task require nuanced language or creative reasoning? If yes, use Sonnet (or Opus for architecture-level decisions). If the task is mechanical — extracting data, moving files, checking status, processing CSVs — Flash handles it fine.
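That one-question rule can be approximated in code. The task categories and keyword lists below are illustrative sketches, not anything OpenClaw ships with:

```python
# Toy router implementing the "nuanced language vs. mechanical" question.
# Keyword sets are illustrative assumptions; extend them for your workload.
WRITING = {"article", "blog", "rewrite", "draft"}
MECHANICAL = {"extract", "rename", "csv", "status", "heartbeat"}

def pick_model(task: str) -> str:
    words = set(task.lower().split())
    if words & WRITING:
        return "sonnet"   # nuanced language: pay for quality
    if words & MECHANICAL:
        return "fast"     # mechanical work: cheapest tier
    return "kimi"         # mid-tier default for analysis tasks

print(pick_model("rewrite this blog intro"))   # sonnet
print(pick_model("extract titles from csv"))   # fast
```

In practice we enforce this by habit and workspace instructions rather than code, but the decision tree is exactly this shallow.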
Can I change models mid-conversation?
Yes. Type /model sonnet to switch. The change takes effect on the next message. Switch to Sonnet before writing, back to Flash when you’re done. The habit takes a day to build and saves hundreds per month.
What about Chinese models like DeepSeek?
Kimi K2.5 and DeepSeek are excellent for mid-tier tasks. At $0.45-0.60/1M tokens, they fill the gap between Flash (too simple for complex reasoning) and Sonnet (too expensive for routine analysis). We use Kimi for SEO browsing tasks specifically because it handles multi-step web interactions well.
Is OpenRouter the only way to do model routing?
No, but it’s the easiest. OpenRouter gives you one API key for 200+ models with built-in fallbacks and budget controls. You could also self-host models via Ollama or vLLM, but that adds infrastructure complexity and the security considerations we discuss in our security guide.
What to Read Next
For the complete optimization playbook — including prompt caching, QMD memory search, heartbeat config, and budget guardrails — read our pillar guide: OpenClaw Token Optimization: The Complete 2026 Guide.
Want to see how these models perform on real SEO tasks? Check out AI Model Showdown for SEO: Gemini Flash vs Sonnet vs Kimi K2.5.
Back to AI Automation & Workflows Hub.
🔎 Key Takeaways
- Model routing cut our bill from $87 to $24/month — a 72% reduction with zero quality loss on writing
- Set your default to the cheapest viable model (Gemini Flash at $0.10/1M) and escalate intentionally
- Define a fallback chain (Flash → Kimi → Sonnet) so outages don’t silently route to expensive models
- Pin agents to specific models — content-writer gets Sonnet, data-worker gets Flash, no exceptions
- The 150x cost gap between Opus and Flash means one wrong routing decision can cost more than a month of correct ones
