“Free” gets thrown around a lot in the AI model space. Let me be specific about what that actually means for OpenClaw users.
There are three categories: models that cost literally nothing (local), models with free tiers that eventually run out, and models so cheap they round to zero on most invoices. I’ll cover all three.
## Actually free: local models
The only models that cost nothing per-token are the ones running on your own hardware. Ollama makes this straightforward.
Install it, pull a model, point OpenClaw at it:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a coding model
ollama pull qwen3:32b
```
Then add Ollama as a provider in `~/.openclaw/openclaw.json`:
```json
{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://localhost:11434/v1",
        "api": "openai-completions",
        "models": [
          { "id": "qwen3:32b", "name": "Qwen3 32B" }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "ollama/qwen3:32b" }
    }
  }
}
```
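Before wiring Ollama into anything else, it's worth confirming the OpenAI-compatible endpoint responds. Here's a minimal sketch using only the standard library; it assumes Ollama is running on the default port and that `qwen3:32b` has been pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat-completions payload for Ollama."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(model: str, prompt: str) -> str:
    """POST the payload to the local Ollama server and return the reply text."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

With Ollama up, `ask("qwen3:32b", "say hi")` should return a short completion; if it errors, check that `ollama serve` is running and the model is pulled.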
### Which local models work well
Qwen3 32B is the current sweet spot. It handles code generation, debugging, and multi-file edits well enough for day-to-day work, and needs ~24GB of VRAM (an RTX 4090, or an M-series Mac with 32GB+ unified memory).
Llama 3.3 70B is better at reasoning but needs serious hardware: two A100s, or a Mac with 64GB+ unified memory. Most people don’t have that sitting around.
Mistral 7B and CodeLlama 7B run on almost anything, but quality drops fast on complex tasks. They’re fine for simple code generation and file reads; I wouldn’t trust them with a multi-file refactor.
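Those hardware numbers follow from a back-of-the-envelope rule: weight size at your quantization level, plus headroom for the KV cache and activations. A rough sketch (the 1.2x overhead factor is an assumption, not a measurement):

```python
def vram_estimate_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM needed to run a model locally.

    params_b: parameter count in billions (e.g. 32 for a 32B model)
    bits: quantization level (4-bit is the common local default)
    overhead: headroom multiplier for KV cache and activations (assumed ~20%)
    """
    weight_gb = params_b * bits / 8  # 1B params at 8 bits is roughly 1 GB
    return weight_gb * overhead
```

At 4-bit quantization this puts Qwen3 32B around 19GB (so a 24GB card fits), Llama 3.3 70B around 42GB, and the 7B models under 5GB.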
### The honest tradeoff
Local models are free but not fast. Response times are 3-10x slower than cloud APIs, depending on your hardware. And even the best open-source model at 32B parameters can’t match Claude Sonnet on tool calling reliability.
If your work involves mostly reading files, generating boilerplate, and simple edits, local works well. If you’re doing complex debugging or architectural work, you’ll want a cloud model for those tasks.
## Free tiers from cloud providers
A few providers offer genuinely free usage up to a limit.
Gemini Flash is the best free option for cloud inference. Google’s free tier gives you 15 requests per minute with up to 1M token context. That’s enough for casual coding sessions.
Add to `~/.openclaw/openclaw.json`:
```json
{
  "models": {
    "providers": {
      "google": {
        "models": [
          { "id": "gemini-3-flash", "name": "Gemini 3 Flash" }
        ]
      }
    }
  }
}
```
The catch: the free tier has stricter rate limits and your data may be used for training. For side projects and learning, that’s probably fine. For proprietary code, use the paid API or run local.
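If you script against the free tier directly, it helps to pace your own requests so you never trip the 15-requests-per-minute cap. A minimal client-side pacer (a sketch; the provider may also enforce daily quotas this code knows nothing about):

```python
import time
from typing import Optional

class RpmPacer:
    """Spaces calls evenly to stay under a requests-per-minute cap."""

    def __init__(self, rpm: int):
        self.min_interval = 60.0 / rpm  # seconds between requests
        self.last_call = 0.0

    def wait(self, now: Optional[float] = None) -> float:
        """Sleep just long enough to honor the spacing; return the delay."""
        if now is None:
            now = time.monotonic()
        delay = max(0.0, self.last_call + self.min_interval - now)
        if delay:
            time.sleep(delay)
        self.last_call = now + delay
        return delay
```

`RpmPacer(15)` spaces calls at least 4 seconds apart; call `pacer.wait()` before each request.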
## Almost free: sub-dollar models
Some models cost so little per token that a full day of heavy OpenClaw usage stays under $1.
| Model | Input ($/M tokens) | Output ($/M tokens) | Est. daily cost |
|---|---|---|---|
| MiniMax M2.5 | $0.30 | $1.20 | ~$0.60 |
| GPT-4o-mini | $0.15 | $0.60 | ~$0.50 |
| GLM-4.7 Flash | $0.07 | $0.28 | ~$0.15 |
| DeepSeek V3 | $0.27 | $1.10 | ~$0.80 |
Daily estimates based on ~500K input + 200K output tokens, which is a busy coding day.
These models handle the boring parts of a coding session well: file reads, simple edits, documentation, test runs. Route the 20% of hard problems to a better model and your total bill stays under $5/day.
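To see where the numbers come from, here's the raw arithmetic at the article's ~500K-input / 200K-output day. Note this is a floor: the table's estimates sit a bit above it, which leaves room for retries and resent context. The Sonnet price in the blended line ($3/M input, $15/M output) is an assumption for illustration:

```python
def raw_daily_cost(input_per_m: float, output_per_m: float,
                   input_tokens: float = 500_000,
                   output_tokens: float = 200_000) -> float:
    """Token cost for one busy day: tokens (in millions) times $/M prices."""
    return (input_tokens / 1e6) * input_per_m + (output_tokens / 1e6) * output_per_m

# Prices from the table above ($ per million tokens)
for model, (cin, cout) in {
    "MiniMax M2.5": (0.30, 1.20),
    "GPT-4o-mini": (0.15, 0.60),
    "GLM-4.7 Flash": (0.07, 0.28),
    "DeepSeek V3": (0.27, 1.10),
}.items():
    print(f"{model}: ${raw_daily_cost(cin, cout):.2f}/day")

# 80/20 split: cheap model for routine work, a premium model for the hard 20%
# ($3/M in, $15/M out is an assumed Sonnet-class price, not a quoted one)
cheap = raw_daily_cost(0.30, 1.20, 400_000, 160_000)
hard = raw_daily_cost(3.00, 15.00, 100_000, 40_000)
print(f"blended: ${cheap + hard:.2f}/day")
```

Even with the premium model taking a fifth of the traffic, the blended day lands around $1.20, comfortably under the $5/day ceiling.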
You can access all of these through Haimaker with a single API key, or set them up individually with each provider.
## The practical setup: hybrid free + cheap
Most people who care about costs end up here:
- Local model for simple tasks (Qwen3 32B via Ollama, free)
- Cheap cloud model for medium tasks (MiniMax M2.5 or GPT-4o-mini, pennies)
- Premium model for hard problems (Claude Sonnet or Opus, pay-per-use)
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/qwen3:32b",
        "thinking": "anthropic/claude-sonnet-4-20250514"
      }
    }
  }
}
```
The local model handles 60-70% of requests (reading files, simple code). Sonnet kicks in for the rest. Your daily API bill drops to $2-5 instead of $30-50.
For more on model routing, see our guide on multi-agent workflows.
For a full model comparison, see best models for OpenClaw. For cost optimization strategies, see cutting token costs by 96%.