xAI has six Grok models in the API right now. That’s too many for most people to evaluate, so here’s the short version: pick one of three depending on what you’re doing.
The quick answer
| Model | Input / Output ($ per M tokens) | Context | Best For |
|---|---|---|---|
| Grok 4.1 Fast | $0.20 / $0.50 | 2M | Default for most tasks |
| Grok Code Fast | $0.20 / $1.50 | 256K | Coding and file editing |
| Grok 4.20 | $2.00 / $6.00 | 2M | Deep reasoning, hard problems |
| Grok 4 | $3.00 / $15.00 | 256K | Real-time web + reasoning |
| Grok 4 Fast | $0.20 / $0.50 | 2M | Legacy (use 4.1 Fast instead) |
| Grok 3 | $3.00 / $15.00 | 131K | Legacy |
Most people should start with Grok 4.1 Fast and only reach for something else when they hit a wall.
Grok 4.1 Fast — the default pick
Grok 4.1 Fast is what I’d point most people to. $0.20/M input, $0.50/M output, 2M token context window. You can load an entire monorepo into context for less than a dollar.
It handles tool calling well — shell commands, file reads, API calls come back syntactically correct. It’s fast enough for real-time chat and stays coherent across long sessions. At this price, you can run it against thousands of files without thinking about the bill.
Where it falls short: complex multi-step reasoning and code generation where correctness really matters. It also lacks real-time web search (that’s Grok 4’s thing). But for refactoring, quick tasks, and day-to-day coding, nothing in the xAI lineup matches it on cost.
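To put the monorepo claim in perspective, here's a back-of-the-envelope sketch at the listed $0.20/$0.50 rates (the 4K response size is a made-up example):

```python
# Back-of-the-envelope cost for Grok 4.1 Fast at its listed rates.
INPUT_RATE = 0.20 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.50 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A full 2M-token context window plus a 4K-token response:
monorepo = request_cost(2_000_000, 4_000)
print(f"${monorepo:.3f}")  # → $0.402
```

Even maxing out the context window costs well under a dollar per request, which is what makes "run it against thousands of files" a realistic workflow rather than a budget decision.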
Grok Code Fast — the coding specialist
Grok Code Fast was trained differently. xAI pre-trained it on a programming-heavy corpus and fine-tuned it on real pull requests, so it’s better at reading stack traces, generating clean diffs, and chaining shell commands than the general-purpose models.
$0.20/M input, $1.50/M output, 256K context. Output costs 3x more than 4.1 Fast, which adds up on code generation tasks where responses run long.
I reach for Code Fast over 4.1 Fast when I’m doing large refactors or debugging sessions where I need the model to actually understand what grep output means. If you’re mostly chatting with some light code mixed in, 4.1 Fast is fine and cheaper.
The real trade-off is context. Code Fast caps at 256K tokens. If your codebase fits, it’s the better coding model. If it doesn’t, 4.1 Fast with its 2M window is your only option.
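To see how the 3x output price plays out, a rough sketch with hypothetical session sizes (100K tokens of code read in, 20K tokens of diffs generated):

```python
def session_cost(in_tok: int, out_tok: int,
                 in_rate: float, out_rate: float) -> float:
    """Dollars for one session, given per-million-token rates."""
    return (in_tok * in_rate + out_tok * out_rate) / 1_000_000

IN_TOK, OUT_TOK = 100_000, 20_000  # hypothetical refactor session

code_fast = session_cost(IN_TOK, OUT_TOK, 0.20, 1.50)  # Grok Code Fast
fast_41 = session_cost(IN_TOK, OUT_TOK, 0.20, 0.50)    # Grok 4.1 Fast
print(f"Code Fast: ${code_fast:.3f}, 4.1 Fast: ${fast_41:.3f}")
# Code Fast: $0.050, 4.1 Fast: $0.030
```

At these sizes the gap is cents, not dollars; the 3x multiplier only starts to matter on sustained, generation-heavy workloads.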
Grok 4.20 — the new reasoning model
Grok 4.20 came out of beta on March 18. It’s the reasoning model in the lineup: $2/M input, $6/M output, 2M context, and around 230 tokens per second output speed. That last number is the interesting one. Reasoning models are usually slow. Grok 4.20 is faster than Gemini 3.1 Flash and GPT 5.4, which are not reasoning models.
It scored 48 on the Artificial Analysis Intelligence Index (median for reasoning models in this price range: 31). xAI claims the lowest hallucination rate on the market, and from what I’ve seen with structured outputs, that’s plausible.
Two API variants: grok-4.20-beta-reasoning for chain-of-thought analysis, and grok-4.20-multi-agent-beta for agentic workflows with tool calling.
This is the model you reach for when the cheaper ones can’t figure it out. Hard debugging with multiple competing hypotheses. Architecture reviews where the model needs to hold a lot of context and actually think through trade-offs. If you’ve been using Claude Opus or GPT-5 for those tasks and want something cheaper, 4.20 at $2/$6 is worth trying before jumping to Grok 4 at $3/$15.
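For reference, a request to the reasoning variant is a plain OpenAI-style chat completions payload. This sketch just builds the request body; the prompt is illustrative, and the model IDs are the two variants named above:

```python
import json

# OpenAI-compatible chat completions body for the reasoning variant.
payload = {
    "model": "grok-4.20-beta-reasoning",
    "messages": [
        {"role": "user",
         "content": "Here are three hypotheses for this deadlock; "
                    "rank them and explain your reasoning."},
    ],
}
body = json.dumps(payload)
# POST this to the provider's chat completions endpoint with your API
# key in the Authorization header; swap in grok-4.20-multi-agent-beta
# for tool-calling agent workflows.
```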
Grok 4 — the flagship (and when to skip it)
Grok 4 is the premium option at $3/$15 per million tokens. Its one unique feature is real-time web search baked into the model — it pulls live data during a conversation without you configuring external tools.
Honestly, most OpenClaw users can skip this. OpenClaw already has tool-calling for web searches, and $3/$15 is hard to justify when 4.1 Fast costs 15x less on input (30x on output) and Grok 4.20 does better reasoning for $2/$6. The 256K context window is also smaller than the 2M you get with 4.1 Fast or 4.20.
The niche case: you need integrated web search and deep reasoning in the same call, and you don’t want to wire up external tools. That’s a narrow use case, but it exists.
Setup in OpenClaw
Running through haimaker.ai
All Grok models are also available through haimaker.ai with a single API key. If you’re already using haimaker for other providers, you can access Grok models without a separate xAI account:
```json
{
  "models": {
    "providers": {
      "haimaker": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "your-haimaker-api-key",
        "api": "openai-completions"
      }
    }
  }
}
```
This gives you Grok alongside Claude, GPT, Gemini, and dozens of open-source models through one provider.
Going direct through xAI instead, getting any Grok model running takes about two minutes.
1. Get your xAI API key
Sign up at console.x.ai. New accounts get $25 in free credits, plus $150/month if you opt into the data sharing program.
2. Add xAI as a provider
Open ~/.openclaw/openclaw.json and add xAI to your providers:
```json
{
  "models": {
    "providers": {
      "xai": {
        "baseUrl": "https://api.x.ai/v1",
        "apiKey": "your-xai-api-key",
        "api": "openai-completions"
      }
    }
  }
}
```
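Before going further, it can be worth a quick smoke test that the key works against the same baseUrl. A minimal sketch using only the Python standard library; the XAI_API_KEY environment variable and the prompt are illustrative choices, not anything OpenClaw requires:

```python
import json
import os
import urllib.request

# Smoke test for the xAI key before wiring it into OpenClaw.
# Assumes the key is exported as XAI_API_KEY.
api_key = os.environ.get("XAI_API_KEY", "")

req = urllib.request.Request(
    "https://api.x.ai/v1/chat/completions",
    data=json.dumps({
        "model": "grok-4-1-fast",
        "messages": [{"role": "user", "content": "Reply with OK."}],
    }).encode(),
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
)

if api_key:  # only hit the network when a key is actually set
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

If this returns a response instead of a 401, the key and endpoint are good and any failure later is on the OpenClaw config side.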
3. Add models to the allowlist
In the same file, add the models you want to use:
```json
{
  "agents": {
    "defaults": {
      "models": {
        "xai/grok-4-1-fast": {},
        "xai/grok-code-fast": {},
        "xai/grok-4.20-beta": {}
      }
    }
  }
}
```
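For reference, the provider block from step 2 and the allowlist from step 3 live in the same openclaw.json. Assuming the two snippets merge at the top level, as they appear to, the combined file looks like:

```json
{
  "models": {
    "providers": {
      "xai": {
        "baseUrl": "https://api.x.ai/v1",
        "apiKey": "your-xai-api-key",
        "api": "openai-completions"
      }
    }
  },
  "agents": {
    "defaults": {
      "models": {
        "xai/grok-4-1-fast": {},
        "xai/grok-code-fast": {},
        "xai/grok-4.20-beta": {}
      }
    }
  }
}
```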
4. Apply the config
Run openclaw gateway config.apply and switch models with /model during a session.
What I’d do
Set Grok 4.1 Fast as your default. Swap to Grok Code Fast when you’re heads-down coding and want the model to actually understand your toolchain. Bring in Grok 4.20 when something is genuinely hard and the cheaper models keep getting it wrong.
Ignore Grok 4 and Grok 3. They cost more and do less than the newer models.