Current as of March 2026. Grok 4 doubles Grok 3’s context to 256K — both input and output. Same $3/$15 price point. It’s essentially xAI’s answer to the question of what happens when you can fit an entire legacy codebase into one prompt and get back a complete rewrite.
Specs
| Spec | Value |
| --- | --- |
| Provider | xAI |
| Input cost | $3.00 / M tokens |
| Output cost | $15.00 / M tokens |
| Context window | 256K tokens |
| Max output | 256K tokens |
| Parameters | N/A |
| Features | function_calling, web_search |
What it’s good at
Token limits
256K on both sides is the story here. Most frontier models either have a large input window or a generous output limit — not both. GPT-4o caps output at 4K. Being able to send and receive 256K tokens in a single call changes what’s possible for code generation and refactoring tasks.
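At the listed rates, even a maxed-out call stays cheap. A quick back-of-envelope, assuming "256K" means 256,000 tokens (the figure the config below uses):

```python
# Cost of one call that fills both the 256K input and the 256K output
# window, at the listed $3/M input and $15/M output rates.
INPUT_RATE = 3.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 15.00 / 1_000_000  # dollars per output token

tokens = 256_000
cost = tokens * INPUT_RATE + tokens * OUTPUT_RATE
print(f"${cost:.2f}")  # prints "$4.61"
```

So a full rewrite of a 256K-token module, returned in one shot, runs under five dollars before any retries.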
Input pricing
$3/M input is cheaper than GPT-4o’s $5/M. For RAG pipelines or any workflow that repeatedly ingests large documents, that difference accumulates quickly.
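The per-document savings compound at scale. A sketch of the math for a hypothetical pipeline (document count and size are illustrative, not from the source):

```python
# Input-cost comparison for a hypothetical RAG pipeline ingesting
# 10,000 documents of ~20K tokens each, at the rates quoted above:
# Grok 4 at $3/M input vs GPT-4o at $5/M input.
docs = 10_000
tokens_per_doc = 20_000
total_tokens = docs * tokens_per_doc  # 200M input tokens

grok_cost = total_tokens / 1_000_000 * 3.00
gpt4o_cost = total_tokens / 1_000_000 * 5.00
print(grok_cost, gpt4o_cost, gpt4o_cost - grok_cost)  # 600.0 1000.0 400.0
```

A $400 difference on a single 200M-token ingestion run is the kind of gap that shows up on a monthly invoice.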
Where it falls short
Reasoning drift
Long context retrieval is not Grok 4’s strength. If you need the model to precisely locate and reason about something buried in a 200K token document, Claude 3.5 Sonnet is more reliable. Grok 4 can miss things that are semantically distant from the end of the prompt.
Instruction adherence
It occasionally ignores negative constraints in system prompts — “do not do X” type instructions. You often need to rephrase constraints positively or repeat them to make them stick, which is more work than it should be at this price point.
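One workaround is to state the constraint as an allowed behavior and repeat it near the end of the system prompt. A hypothetical before/after (the wording is illustrative, not a tested prompt):

```python
# Hypothetical system prompts illustrating the workaround: replace the
# negative constraint ("do not") with a positive statement of the allowed
# behavior, then repeat it at the end of the prompt.
negative = "You are a support bot. Do not mention competitor products."

positive = (
    "You are a support bot. Discuss only our own products. "
    "If asked about a competitor, redirect to the closest equivalent "
    "in our catalog.\n"
    # Repeating the constraint at the end helps it stick.
    "Remember: discuss only our own products."
)
```

The repetition costs a few extra tokens per request, which is worth weighing against the failure mode it prevents.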
Best use cases with OpenClaw
- Large-scale code refactoring — The 256K output buffer means you can pipe in an entire legacy module and get a fully rewritten version back in one shot. This is genuinely useful for the right migration tasks.
- High-volume data summarization — $3/M input makes processing thousands of customer support logs or documents economical at scale.
Not ideal for
- Zero-latency chatbots — Time-to-first-token can lag even with a light context, compared to models tuned for fast interactive responses.
- Formal logic verification — The chain-of-thought reasoning isn’t as stable as models specifically tuned for mathematical work. Don’t use it for proofs or constraint solving.
Run it through Haimaker
Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:
Add Haimaker as a custom provider to my OpenClaw config. Use these details:
- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions
Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)
Create an alias "auto" for easy switching. Apply the config when done.
Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.
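For reference, here is the shape of the request that flows through Haimaker's OpenAI-compatible endpoint, built from the details above. The `/chat/completions` path is assumed from the `openai-completions` API type; it is a sketch, not Haimaker's documented wire format:

```python
import json

# Request payload as it would be sent through Haimaker's
# OpenAI-compatible endpoint, using the provider details above.
BASE_URL = "https://api.haimaker.ai/v1"
url = f"{BASE_URL}/chat/completions"  # assumed path for openai-completions

payload = {
    "model": "haimaker/auto",  # auto-router picks the model per task
    "max_tokens": 32_000,      # matches the config entry above
    "messages": [
        {"role": "user", "content": "Summarize this changelog."},
    ],
}
body = json.dumps(payload)
```

Authentication is a standard `Authorization: Bearer <key>` header with your Haimaker key.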
OpenClaw setup
Configure OpenClaw to use the OpenAI provider but override the base URL to https://api.x.ai/v1. Set the max_tokens parameter to 256000 to take full advantage of the output window.
{
"models": {
"mode": "merge",
"providers": {
"xai": {
"baseUrl": "https://api.x.ai/v1",
"apiKey": "YOUR-XAI-API-KEY",
"api": "openai-completions",
"models": [
{
"id": "grok-4",
"name": "Grok 4",
"cost": {
"input": 3,
"output": 15
},
"contextWindow": 256000,
"maxTokens": 256000
}
]
}
}
}
}
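Outside of OpenClaw, the same endpoint can be hit directly with nothing but the standard library. A minimal sketch, assuming the usual `/chat/completions` path for an OpenAI-compatible API; the request is built but not sent, since sending needs a real key:

```python
import json
import urllib.request

# Build (but do not send) a direct request against the xAI endpoint
# configured above. Uncomment the urlopen call with a real key to run it.
payload = {
    "model": "grok-4",
    "max_tokens": 256_000,  # full output window, per the config above
    "messages": [
        {"role": "user", "content": "Refactor this legacy module: ..."},
    ],
}
req = urllib.request.Request(
    "https://api.x.ai/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR-XAI-API-KEY",
        "Content-Type": "application/json",
    },
)
# resp = urllib.request.urlopen(req)  # requires a valid API key
```

In production you would stream the response rather than block on a potentially 256K-token body.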
How it compares
- vs GPT-4o — Grok 4 is cheaper for inputs ($3 vs $5 per 1M) and offers a 256K output limit compared to GPT-4o’s 4K limit.
- vs Claude 3.5 Sonnet — Sonnet has superior coding logic, but Grok 4 provides a much larger context window (256K vs 200K) and integrated web search.
Bottom line
Grok 4 is the right choice when token volume is your primary constraint — specifically when you need both a large input and a large output in the same call. For precision work or strict instruction following, something else will serve you better.
For setup instructions, see our API key guide. For all available models, see the complete models guide.