Current as of March 2026. Gemini 3.1 Pro is the flagship long-context model — 1M input, 66K output, and strong multimodal reasoning. The jump from 3 Flash to 3.1 Pro is mainly about reasoning quality and instruction fidelity; the context window is the same.
Specs
| Spec | Value |
| --- | --- |
| Provider | Google |
| Input cost | $2.00 / M tokens |
| Output cost | $12.00 / M tokens |
| Context window | 1.0M tokens |
| Max output | 66K tokens |
| Parameters | N/A |
| Features | function_calling, vision, reasoning |
What it’s good at
1M context with serious reasoning
Unlike the Flash variants, this model can actually handle complex architectural analysis within that 1M window without losing the thread. Feed it an entire codebase and ask meaningful questions about global dependencies or cross-file bugs — it holds up.
Multimodal reasoning
Vision and reasoning are genuinely integrated here. It’s reliable on complex diagram parsing, UI screenshot analysis, and identifying edge cases in visual data. Not just “here’s what’s in this image” — it can reason about it.
66K output
Combined with 1M input, you get both the comprehension and the generation capacity. Ask it to understand a full codebase and then write a refactored module — that’s a workflow that actually fits.
Where it falls short
$12/M output cost
This is the constraint you manage the most. For analysis-heavy tasks with short outputs, it’s fine. For agents that generate a lot of tokens per cycle, costs climb fast. Track your output token usage carefully before committing to this model.
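Before committing, it helps to put numbers on the two shapes of workload. A quick sketch using the $2.00/M input and $12.00/M output rates from the table above (the token counts are hypothetical examples):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_rate: float = 2.00, output_rate: float = 12.00) -> float:
    """Estimate the cost of one request at per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Analysis-heavy: 800K tokens in, 2K tokens out. Input dominates.
analysis = request_cost_usd(800_000, 2_000)      # ~$1.62

# Chatty agent cycle: 20K tokens in, 30K tokens out. Output dominates.
agent_cycle = request_cost_usd(20_000, 30_000)   # ~$0.40

print(f"analysis: ${analysis:.2f}, agent cycle: ${agent_cycle:.2f}")
```

Note the asymmetry: the agent cycle moves a fortieth of the tokens but costs a quarter as much, because output is six times the price of input.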
Instruction drift in long sessions
It can slip off complex system instructions during long-context sessions — specific JSON formatting rules, multi-constraint prompts. Claude 3.5 Sonnet handles this more reliably. Budget for validation steps in your agent workflow.
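A cheap mitigation is a validation step in the agent loop: parse the reply, check it against the constraints the prompt demanded, and retry (or escalate to a stricter model) on failure. A minimal sketch, assuming a hypothetical three-key schema:

```python
import json
from typing import Optional

REQUIRED_KEYS = {"summary", "files_changed", "risk"}  # hypothetical schema

def validate_reply(raw: str) -> Optional[dict]:
    """Return the parsed object if it satisfies the schema, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not REQUIRED_KEYS <= obj.keys():
        return None
    return obj

# In the agent loop: a None result triggers a retry with the rules restated.
good = validate_reply('{"summary": "ok", "files_changed": [], "risk": "low"}')
bad = validate_reply("Sure! Here is the JSON you asked for: {...}")
```

The retry prompt can quote the failed output back to the model; restating the format rules at the end of a long context is exactly where drift tends to occur.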
Best use cases with OpenClaw
- Full codebase refactoring — Load every file and let the model understand global context before generating changes. Models with smaller windows miss cross-file dependencies.
- Video content extraction — Ask about specific timestamps or visual details in long video files without having to pre-process and chunk the content.
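For the codebase workflow, it is worth checking that the repo actually fits the 1M window before sending anything. A rough sketch using a 4-characters-per-token heuristic (an approximation, not a real tokenizer):

```python
from pathlib import Path

CONTEXT_BUDGET = 1_000_000   # tokens: Gemini 3.1 Pro's input window
CHARS_PER_TOKEN = 4          # rough heuristic, not a real tokenizer

def pack_repo(root: str, suffixes=(".py", ".ts", ".md")) -> tuple[str, int]:
    """Concatenate source files with path headers; return (prompt, est_tokens)."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            parts.append(f"--- {path} ---\n{path.read_text(errors='ignore')}")
    prompt = "\n\n".join(parts)
    return prompt, len(prompt) // CHARS_PER_TOKEN

# Usage sketch:
# prompt, est = pack_repo("path/to/repo")
# if est > CONTEXT_BUDGET: chunk by package instead of sending everything
```

The path headers matter: they let the model answer cross-file questions with concrete file references instead of vague pointers.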
Not ideal for
- Simple Q&A or chatbots — $12/M output is expensive for basic interactions. Use a Flash model or something like Llama 3.1 8B for lightweight tasks.
- Strict JSON schema enforcement — GPT-4o is more reliable here. If your agent depends on well-formed nested schemas, test this model thoroughly before deploying.
Run it through Haimaker
Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:
Add Haimaker as a custom provider to my OpenClaw config. Use these details:
- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions
Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)
Create an alias "auto" for easy switching. Apply the config when done.
Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.
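Because Haimaker exposes an OpenAI-compatible endpoint (the `openai-completions` API type above), calling the auto-router outside OpenClaw is a standard chat-completions request. A sketch using only the base URL and model id from the config above; it builds the request without sending it, and the API key is a placeholder:

```python
import json
import urllib.request

BASE_URL = "https://api.haimaker.ai/v1"  # from the provider config above

def build_request(api_key: str, user_msg: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request to the auto-router."""
    payload = {
        "model": "haimaker/auto",
        "messages": [{"role": "user", "content": user_msg}],
        "max_tokens": 32000,  # matches the auto model's limit above
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_request("YOUR-HAIMAKER-API-KEY", "Summarize this diff.")
# send with: urllib.request.urlopen(req)
```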
OpenClaw setup
Configure the custom Gemini provider in OpenClaw. Set maxTokens to 65536 explicitly — without it, long-form generation will truncate before the model finishes.
```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "google": {
        "baseUrl": "https://generativelanguage.googleapis.com/v1beta",
        "apiKey": "YOUR-GOOGLE-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "gemini-3.1-pro-preview",
            "name": "Gemini 3.1 Pro",
            "cost": {
              "input": 2,
              "output": 12
            },
            "contextWindow": 1048576,
            "maxTokens": 65536
          }
        ]
      }
    }
  }
}
```
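Even with maxTokens set, it is worth detecting truncation at runtime. In the OpenAI-compatible response format that `openai-completions` uses, a generation cut off by the token limit reports `finish_reason: "length"` instead of `"stop"`. A small guard (the response dicts here are hand-built examples, not real API output):

```python
def is_truncated(response: dict) -> bool:
    """True if the first choice hit the max-token limit ('length')."""
    choices = response.get("choices", [])
    return bool(choices) and choices[0].get("finish_reason") == "length"

# Hand-built examples of the two cases:
complete = {"choices": [{"finish_reason": "stop", "message": {"content": "done"}}]}
cut_off = {"choices": [{"finish_reason": "length", "message": {"content": "..."}}]}
```

On a `"length"` result, an agent can either raise maxTokens (up to the 65536 ceiling) or ask the model to continue from where it stopped.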
How it compares
- vs Claude 3.5 Sonnet — Sonnet follows complex instructions more reliably and is sharper on code logic. Its 200K context window is a fifth the size of Gemini's 1M, which is the tradeoff.
- vs GPT-4o — GPT-4o is faster and more consistent on general reasoning. Gemini 3.1 Pro’s advantage is the context size and the $2/M input cost on document-heavy workloads.
Bottom line
The right choice when you need both 1M context and meaningful reasoning quality — not just ingestion capacity. Watch the output cost; it adds up if your agents are chatty.
TRY GEMINI 3.1 PRO ON HAIMAKER
For setup instructions, see our API key guide. For all available models, see the complete models guide.