OpenAI ships new GPT models faster than anyone writes guides for them. As of April 2026, there are more than a dozen models live in the API, and most OpenClaw users are trying to figure out which one to use.
Short version: pick one of four depending on what you’re doing.
## The quick answer
| Model | Input / Output (per 1M tokens) | Context | Best For |
|---|---|---|---|
| GPT-5.4 Mini | $0.75 / $4.50 | 400K | Default for daily coding |
| GPT-5.4 | $2.50 / $15 | 1.05M | Hard problems, long context |
| GPT-5.1-Codex-Max | $1.25 / $10 | 400K | Agent loops, tool calling |
| GPT-5.4 Nano | $0.20 / $1.25 | 400K | Cheap fast paths |
| GPT-5.4 Pro | $30 / $180 | 1.05M | Research-grade reasoning only |
| GPT-5 Mini | $0.25 / $2 | 400K | Legacy (use 5.4 Mini instead) |
Most people should start with GPT-5.4 Mini and only reach for something else when the task actually needs it.
## GPT-5.4 Mini — the default pick
GPT-5.4 Mini is what I’d put in front of most people. $0.75/M input, $4.50/M output, 400K context. It’s about 3x cheaper than GPT-5.4 on input and close enough in quality that you won’t notice on day-to-day coding.
Tool calling is solid. Function signatures come back clean, arguments are typed correctly, and the model doesn’t hallucinate file paths. For refactors, bug hunting, and running OpenClaw against a real repo, this is the one I keep coming back to.
Where it falls short: really hard reasoning and novel algorithm design. If you’re writing something where correctness is everything and the model needs to hold five competing hypotheses in its head, step up to GPT-5.4.
## GPT-5.4 — the flagship
$2.50/M input, $15/M output, 1.05M token context. GPT-5.4 is the high-end general-purpose model. It landed in March and the delta over 5.3 is real on hard problems: better at multi-file refactors, fewer logic errors on non-trivial code, noticeably better at following long instructions without drifting.
The 1M context window is the other reason to reach for it. You can feed it an entire mid-sized codebase and it will actually use the cross-file information. GPT-5.4 Mini caps at 400K, which is still a lot, but if you’re dumping a monorepo in, you want the full window.
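A quick way to sanity-check whether a repo fits in a window is the rough 4-characters-per-token heuristic (a common rule of thumb for code and English, not an exact tokenizer count). The helper below is illustrative, not part of OpenClaw:

```python
import os

def estimate_repo_tokens(root, exts=(".py", ".ts", ".go", ".rs", ".md")):
    """Rough token estimate: total source characters divided by 4."""
    total_chars = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        total_chars += len(f.read())
                except OSError:
                    continue  # unreadable file; skip it
    return total_chars // 4

# If the estimate clears ~400K, reach for the 1.05M-window model:
# tokens = estimate_repo_tokens("path/to/repo")
# model = "openai/gpt-5.4" if tokens > 400_000 else "openai/gpt-5.4-mini"
```

Treat the result as an order-of-magnitude check; real tokenizers vary by 20–30% either way.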
I use it for architecture reviews, hard debugging, and anything where I’d rather pay more than iterate three times. Cost-wise, it’s in the same tier as Claude Sonnet 4.6 and Gemini 3.1 Pro. Not cheap, but not ridiculous either.
## GPT-5.1-Codex-Max — the agent loop model
If OpenClaw is running in full agent mode (writing code, running it, reading errors, fixing them, committing), this is the model I’d use. $1.25/M input, $10/M output, 400K context.
The Codex-series models are tuned for the inner loop of programming. They’re better at reading stack traces, writing diffs that apply cleanly, and chaining shell commands without losing the plot. GPT-5 Codex is the older sibling and slightly cheaper; Codex-Max handles longer autonomous runs without wandering.
If you’re using OpenClaw interactively (you drive, the model helps), GPT-5.4 Mini is fine and cheaper. If you’re telling OpenClaw “fix this bug” and walking away, give Codex-Max a try.
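For intuition, the shape of that autonomous loop (propose a fix, run it, feed the error back) looks roughly like this sketch. Both `propose_fix` (the model call) and `run_tests` (the harness) are hypothetical stand-ins, not OpenClaw APIs:

```python
def agent_loop(task, propose_fix, run_tests, max_iters=5):
    """Minimal agent loop: propose code, run it, feed errors back in."""
    code, error = None, None
    for attempt in range(max_iters):
        code = propose_fix(task, code, error)  # model call (stubbed here)
        error = run_tests(code)                # None means the run passed
        if error is None:
            return code, attempt + 1
    raise RuntimeError(f"no passing fix after {max_iters} attempts")
```

The agent-tuned models earn their keep inside that loop: each extra iteration costs tokens, so a model that reads the stack trace correctly on pass two instead of pass four is cheaper even at a higher per-token rate.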
## GPT-5.4 Nano — the budget fast path
$0.20/M input, $1.25/M output, 400K context. Nano is what you reach for when the task is too trivial to justify even Mini’s price: quick commit-message generation, file summarization, linter comments. It’s not as smart as Mini, but it’s fast and dirt cheap.
Honestly, for most OpenClaw workflows, it’s a false economy. The difference between Nano and Mini output-token cost is $3.25/M. For a typical coding session using maybe 2M output tokens, that’s $6.50. Not worth the quality drop unless you’re running a high-volume batch job.
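The arithmetic is just token counts times per-million rates. A throwaway sketch with the table’s prices hardcoded, ignoring input tokens as the example above does:

```python
def session_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Dollar cost of a session given per-million-token rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Output side of a 2M-output-token session:
mini = session_cost(0, 2_000_000, 0.75, 4.50)  # GPT-5.4 Mini -> $9.00
nano = session_cost(0, 2_000_000, 0.20, 1.25)  # GPT-5.4 Nano -> $2.50
print(f"saving: ${mini - nano:.2f}")           # prints "saving: $6.50"
```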
## GPT-5.4 Pro — almost never
$30/M input, $180/M output, 1.05M context. GPT-5.4 Pro is positioned as the flagship reasoning model. On paper it’s better than 5.4 at the hardest problems. In practice, I’ve had a hard time finding tasks where the quality gain justifies being 12x more expensive than GPT-5.4.
If you’re doing literal research-grade work (novel proof writing, symbolic math, or multi-hour autonomous reasoning that needs to be right on the first try), fine. For everyone else, GPT-5.4 at $2.50/$15 does the job.
## Legacy models to skip
- GPT-5 ($1.25/$10): Replaced by GPT-5.1 Chat at the same price with better behavior. No reason to pin to the older version.
- GPT-5 Mini ($0.25/$2): Cheaper than 5.4 Mini but noticeably worse at tool calling. The $0.50 you save on input tokens isn’t worth the iteration cost.
- GPT-4.1 / 4o / o1 / o3: All superseded. Don’t start new projects on them.
## Setup in OpenClaw
### Running through haimaker.ai
All OpenAI models are also available through haimaker.ai with a single API key. If you’re already using haimaker for other providers, you can get GPT-5.4 and friends without creating a separate OpenAI account:
```json
{
  "models": {
    "providers": {
      "haimaker": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "your-haimaker-api-key",
        "api": "openai-completions"
      }
    }
  }
}
```
This also gets you access to Claude, Gemini, Grok, and dozens of open-source models through the same provider.
### Direct OpenAI setup
Getting OpenAI running takes about two minutes.
1. Get your OpenAI API key
Sign up at platform.openai.com. You’ll need to add billing before the API will accept requests. There’s no free tier for programmatic use.
2. Add OpenAI as a provider
Open `~/.openclaw/openclaw.json` and add OpenAI to your providers:
```json
{
  "models": {
    "providers": {
      "openai": {
        "baseUrl": "https://api.openai.com/v1",
        "apiKey": "your-openai-api-key",
        "api": "openai-completions"
      }
    }
  }
}
```
3. Add models to the allowlist
In the same file, add the models you want to use:
```json
{
  "agents": {
    "defaults": {
      "models": {
        "openai/gpt-5.4-mini": {},
        "openai/gpt-5.4": {},
        "openai/gpt-5.1-codex-max": {}
      }
    }
  }
}
```
4. Apply the config
Run `openclaw gateway config.apply` and switch models with `/model` during a session.
## What I’d do
Default to GPT-5.4 Mini. Step up to GPT-5.4 when the task is actually hard or you need the full 1M context window. Swap in GPT-5.1-Codex-Max when OpenClaw is running agent-style and you won’t be driving. Ignore Nano and Pro unless you have a specific reason. The middle of the lineup is where the real work gets done.
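That heuristic is simple enough to write down. A sketch using the model IDs from the allowlist step, purely illustrative:

```python
def pick_model(agent_mode=False, hard_task=False, tokens_needed=0):
    """Route per the heuristic above: Mini by default, the full 5.4
    for hard or long-context work, Codex-Max for unattended agent runs."""
    if agent_mode:
        return "openai/gpt-5.1-codex-max"
    if hard_task or tokens_needed > 400_000:
        return "openai/gpt-5.4"
    return "openai/gpt-5.4-mini"
```

One deliberate choice here: agent mode wins over the long-context branch, since Codex-Max tops out at 400K; if an unattended run also needs the 1.05M window, you’d have to pick which constraint matters more.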