The cheapest API for an AI coding agent depends on what your agent is doing. The model that wins on a price-per-million-tokens chart usually loses when you measure cost-per-completed-task — because cheap models retry more, hallucinate tool arguments, and drag the agent into loops.

This is a coding-agent-focused breakdown. If you’re picking an API for a chat app or one-shot completion, the math is different.

Quick rankings (May 2026)

| Tier | Model | Input / Output ($/M) | Use For |
| --- | --- | --- | --- |
| Free local | Qwen3.6 27B via Ollama | $0 / $0 | Privacy work, hardware you already own |
| Cheapest cloud | DeepSeek V3.2 | ~$0.27 / $1.10 | High-volume agent default |
| Cheapest cloud | MiniMax M2.5 | ~$0.30 / $1.20 | High-volume agent default |
| Cheap + free tier | Gemini 3 Flash | $0.50 / $3 (free tier: 1K req/day) | Long-context reads, free tier abuse |
| Cheap + huge context | Grok 4.1 Fast | $0.20 / $0.50 | 2M-context monorepo work |
| Cheap reasoning | GPT-OSS-120b via Haimaker | ~$0.50 / $2 | When the cheap model isn’t smart enough |
| Frontier fallback | Claude Sonnet 4.6 | $3 / $15 | The 10-20% of tasks that need it |

The pricing column is a moving target. xAI cut Grok 4.3 input by 40% on April 30. DeepSeek revises every few months. Treat the table as directional, not authoritative — check provider docs before you commit a budget.

Why output price matters more than input price

A chat app reads short prompts and writes short answers. A coding agent reads files, writes patches, explains failures, rewrites tests, and dumps diffs. The output side is where the bill actually accrues.

Two examples on a 10M-tokens-per-day agent (typical for an always-on assistant):

  • All-Claude Sonnet 4.6 ($3 in / $15 out, ~70% output): 7M output × $15/M = $105/day, plus 3M input × $3/M = $9/day → $114/day
  • All-MiniMax M2.5 ($0.30 in / $1.20 out, ~70% output): 7M output × $1.20/M = $8.40/day, plus 3M input × $0.30/M = $0.90/day → $9.30/day

That’s a 12x gap, and almost all of it is output cost. A model with cheap input and expensive output looks great on a marketing chart and terrible on a bill.
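The arithmetic above generalizes to any model in the table. A small sketch (the function name is mine; prices are the table's May 2026 numbers, so check provider docs before budgeting from it):

```python
def daily_cost(tokens_per_day, in_price, out_price, output_frac=0.7):
    """Cost in dollars per day; prices are $ per million tokens."""
    out_tok = tokens_per_day * output_frac   # coding agents skew ~70% output
    in_tok = tokens_per_day - out_tok
    return (in_tok * in_price + out_tok * out_price) / 1e6

sonnet = daily_cost(10e6, 3.00, 15.00)   # Claude Sonnet 4.6
minimax = daily_cost(10e6, 0.30, 1.20)   # MiniMax M2.5
print(f"Sonnet: ${sonnet:.2f}/day, MiniMax: ${minimax:.2f}/day "
      f"({sonnet / minimax:.0f}x gap)")
```

Swap in any input/output pair from the table to see where your own traffic mix lands.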

The two cheapest output prices in this lineup: Grok 4.1 Fast at $0.50/M and DeepSeek V3.2 at ~$1.10/M. If your agent is generating a lot of code or explanations, those two get the look first.

The free option: local

Ollama on your own hardware is $0 per token. The catch is the hardware.

For coding-agent work, the useful threshold is Qwen3.6 27B (released April 22, 2026), which scores 77.2% on SWE-bench Verified — better than some cloud models you’d pay per token to use. It needs 18GB+ of VRAM: a single RTX 5090, an M5 Pro with 36GB+ unified memory, or an M5 Max.

Smaller hardware (16GB VRAM, 16GB unified memory) gets you Qwen3.6 9B or the Qwen3.6 35B-A3B MoE model. Both are fast (180+ t/s on an RTX 5090) and good for boilerplate, file reads, and simple edits — but not for hard refactors.

Local makes economic sense in three cases:

  1. You already own the hardware
  2. Your data can’t leave your machine (legal, IP, regulated industry)
  3. You’re doing high-volume bulk work where API costs would dwarf hardware amortization

Otherwise the math usually favors a cheap cloud model. Setup walkthrough: best local models for OpenClaw.
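The third case can be sanity-checked with a quick break-even calculation. A sketch, with an illustrative $2,000 hardware price and a flat electricity estimate (both placeholders, not quotes):

```python
def breakeven_months(hardware_cost, tokens_per_day, in_price, out_price,
                     output_frac=0.7, power_cost_month=15.0):
    """Months until owned hardware beats a cheap cloud API. Prices $/M tokens."""
    out_tok = tokens_per_day * output_frac
    in_tok = tokens_per_day - out_tok
    api_month = 30 * (in_tok * in_price + out_tok * out_price) / 1e6
    saved = api_month - power_cost_month   # local isn't quite free: electricity
    return hardware_cost / saved if saved > 0 else float("inf")

# $2,000 GPU vs. MiniMax M2.5 pricing at 10M tokens/day
print(f"{breakeven_months(2000, 10e6, 0.30, 1.20):.1f} months")
```

At high volume the payback is under a year; at low volume `saved` goes negative and the hardware never pays for itself, which is the "otherwise" above.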

When each cloud model wins

MiniMax M2.5 / DeepSeek V3.2 — the everyday default

These two trade places at the bottom of the price chart depending on the month. Both handle 70-80% of typical agent traffic — file reads, classification, code completion, simple edits — at a price low enough to leave on 24/7. Either one is a reasonable default for a budget-conscious agent.

DeepSeek V3.2 has slightly cheaper output and a coding-tuned variant. M2.5 has a 196K context window and is faster on tool calls in our testing. Pick one, stick with it for a month, and only swap if you hit a specific wall.

Grok 4.1 Fast — when context size matters

$0.20/$0.50 with a 2M token context window. Nothing else at this price loads an entire monorepo. If your agent’s job involves “read this 800-file repo and answer questions,” Grok 4.1 Fast is the only cheap model that can hold all of it in one request.

It’s not the best coding model — Grok 4.3 ($1.25/$2.50) is meaningfully better at SWE-bench tasks but roughly 5-6x the price. Use 4.1 Fast for context-heavy reads, 4.3 for the hard reasoning step.

Gemini 3 Flash — when “free” is on the table

Google’s free tier (60 RPM, 1,000 requests/day) is the cheapest possible cloud setup if your agent fits inside the rate limits. Past the free tier you’re at $0.50/$3, which is competitive but not class-leading.

Gemini Flash also has the longest free-tier context window (1M tokens). Worth wiring up as a fallback even if it’s not your default.
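Wiring it up as a fallback mostly means tracking the free-tier limits client-side. A minimal sketch of such a gate, using the 60 RPM / 1,000 requests/day numbers above (the limit logic is the point; the model names in the comments are just labels):

```python
import time
from collections import deque

class FreeTierGate:
    """Admit a request while under per-minute and per-day caps."""
    def __init__(self, rpm=60, rpd=1000):
        self.rpm, self.rpd = rpm, rpd
        self.minute = deque()  # request timestamps in the last 60s
        self.day = deque()     # request timestamps in the last 24h

    def allow(self, now=None):
        now = time.time() if now is None else now
        while self.minute and now - self.minute[0] >= 60:
            self.minute.popleft()
        while self.day and now - self.day[0] >= 86400:
            self.day.popleft()
        if len(self.minute) < self.rpm and len(self.day) < self.rpd:
            self.minute.append(now)
            self.day.append(now)
            return True   # send to the free tier (e.g. Gemini 3 Flash)
        return False      # fall through to the cheap paid default
```

Usage is one branch: pick the free-tier model when `gate.allow()` returns True, otherwise your paid default.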

GPT-OSS-120b — cheap reasoning

OpenAI’s open-weight reasoning model, available through Haimaker at ~$0.50/$2. When MiniMax/DeepSeek aren’t smart enough but you don’t want to jump to Claude Opus pricing, this sits in the right gap.

The routing strategy that actually saves money

Picking one cheap model and using it for everything leaves savings on the table because some tasks genuinely need the frontier. Picking one frontier model wastes money because most tasks don’t.

The path most users land on is two-tier routing: a cheap default + a smart fallback. The cheap model handles the easy 70-80% of traffic; the fallback catches the hard ones.

You can do this manually:

# in OpenClaw
/model minimax-m2.5    # default
/model claude-sonnet   # when the cheap model gets stuck

Or you can let Haimaker’s auto-router handle it — point your agent at haimaker/auto and the router picks per-request based on task complexity. A typical mix for a coding agent looks like 55% MiniMax M2.5, 25% GPT-OSS-120b, 20% Claude Sonnet. Blended, that works out to roughly $3/M tokens, under a third of the cost of sending everything to Sonnet.
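You can sanity-check a blended number yourself from the table's prices, assuming the 55/25/20 mix and the same ~70% output split as the earlier examples:

```python
MIX = {                      # share, $/M input, $/M output (table prices)
    "minimax-m2.5":  (0.55, 0.30, 1.20),
    "gpt-oss-120b":  (0.25, 0.50, 2.00),
    "claude-sonnet": (0.20, 3.00, 15.00),
}

def blended_per_million(mix, output_frac=0.7):
    """Weighted $/M tokens across the routing mix."""
    return sum(share * ((1 - output_frac) * inp + output_frac * out)
               for share, inp, out in mix.values())

print(f"${blended_per_million(MIX):.2f}/M")                        # routed mix
print(f"${blended_per_million({'s': (1.0, 3.00, 15.00)}):.2f}/M")  # all-Sonnet
```

The mix lands around $3.18/M against $11.40/M for all-Sonnet — most of the savings come from keeping Sonnet's share near 20%.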

Use this

For a new OpenClaw agent today:

  • Cheap default: MiniMax M2.5 or DeepSeek V3.2 ($0.30 / $1.20-ish)
  • Long context: Grok 4.1 Fast ($0.20 / $0.50, 2M window)
  • Smart fallback: Claude Sonnet 4.6 or Grok 4.3
  • Free-tier option: Gemini 3 Flash for workloads that fit inside 1,000 requests/day
  • Privacy floor: Qwen3.6 27B locally if hardware exists

The cheapest API isn’t a single answer. It’s a routing setup. Set up the routing once, then stop thinking about it.

GET $10 FREE CREDITS ON HAIMAKER


Related: Cheapest AI APIs in 2026, Cheapest models for OpenClaw, Best local models for OpenClaw, Haimaker auto-router setup.