Current as of March 2026. Gemini 3 Flash is where the Flash-class models get genuinely interesting. It keeps the 1M context window but bumps the output limit to 66K tokens — a big deal compared to the 8K cap on earlier Flash models. It also adds built-in web search and URL context on top of the standard vision and function calling.
Specs
| Spec | Value |
| --- | --- |
| Provider | Google |
| Input cost | $0.50 / M tokens |
| Output cost | $3.00 / M tokens |
| Context window | 1.0M tokens |
| Max output | 66K tokens |
| Parameters | N/A |
| Features | function_calling, vision, reasoning, web_search, url_context |
What it’s good at
66K output limit
This is the meaningful upgrade over Gemini 2.x Flash models. You can generate substantial amounts of text in a single pass — full reports, long code files, structured documents — without the constant truncation problem that plagues 8K-output models.
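To actually get the full output budget, you usually have to ask for it. A minimal sketch, assuming the OpenAI-compatible Gemini endpoint and the `gemini-3-flash-preview` model id used in the config later in this guide; no request is sent here, we just build the payload:

```python
# Sketch: request payload for a long single-pass generation.
# Assumes an OpenAI-compatible chat-completions endpoint; the model id
# matches the config shown later in this guide.

def build_long_generation_request(prompt: str) -> dict:
    return {
        "model": "gemini-3-flash-preview",
        "messages": [{"role": "user", "content": prompt}],
        # Raise the output cap explicitly; many clients default far lower
        # and will silently truncate long reports or code files.
        "max_tokens": 65535,
    }

payload = build_long_generation_request("Write a complete design document for the billing service.")
```

The point is the explicit `max_tokens`: leaving it at a client default often reintroduces exactly the truncation this model removes.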
1M context plus web search
The combination is useful for research agents: ingest a large document corpus and still have the model pull live URLs or search results into the same context. Not many models offer that together.
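As a sketch, a research-agent request can carry both the pasted corpus and the live-grounding tools in one body. The tool names here (`google_search`, `url_context`) follow the public Gemini API's tool shapes, but treat them as assumptions and verify against the current API reference:

```python
# Sketch: a generateContent-style request body that combines a large
# in-context corpus with live grounding tools. Tool names are assumed
# from the public Gemini API and may change.

def build_research_request(corpus: str, question: str) -> dict:
    return {
        "contents": [{
            "role": "user",
            "parts": [{"text": corpus}, {"text": question}],
        }],
        # Let the model pull live search results and fetched URLs
        # into the same 1M-token context as the corpus.
        "tools": [{"google_search": {}}, {"url_context": {}}],
    }

req = build_research_request("<hundreds of pages of notes>", "Summarize the open questions.")
```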
Native multimodal
Vision, video, and URL context built in. For agents that analyze web pages, screenshots, or video content, this avoids the extra setup required by models that treat multimodal as an add-on.
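Images travel inline as base64 parts. A minimal sketch of the `inline_data` part shape (mime type plus base64 payload) used by the Gemini API; the bytes here are placeholder, in practice you would read a real PNG from disk:

```python
import base64

# Sketch: wrap screenshot bytes as an inline_data part for a
# multimodal request. The bytes below are a placeholder, not a real PNG.

def image_part(png_bytes: bytes) -> dict:
    return {
        "inline_data": {
            "mime_type": "image/png",
            "data": base64.b64encode(png_bytes).decode("ascii"),
        }
    }

part = image_part(b"\x89PNG placeholder")
```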
Where it falls short
Reasoning quality
It’s a Flash model. Complex logical puzzles and deep architectural decisions will surface errors that a Pro-tier model handles more cleanly. For anything where correctness is critical, step up to 3.1 Pro.
Long-context instruction drift
With the full 1M context active, the model can lose track of system instructions or slip on JSON formatting. Expect to be more explicit in your prompts and possibly add output validation.
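That validation can be as simple as a parse-and-retry wrapper. A sketch with a stub standing in for your model client (the `flaky_model` below is a hypothetical test double, not a real API); the retry loop is the part that matters:

```python
import json

# Sketch: defensive JSON validation around a long-context call.
# `generate` is any callable that takes a prompt and returns text.

def parse_with_retry(generate, prompt: str, attempts: int = 3) -> dict:
    last_err = None
    for _ in range(attempts):
        raw = generate(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_err = err
            # Re-ask with the format restated; instruction drift at
            # long context often just needs an explicit reminder.
            prompt = prompt + "\nReturn ONLY valid JSON."
    raise ValueError(f"no valid JSON after {attempts} attempts: {last_err}")

# Stub that fails once, then complies -- simulates formatting drift.
calls = {"n": 0}
def flaky_model(prompt: str) -> str:
    calls["n"] += 1
    return "not json" if calls["n"] == 1 else '{"status": "ok"}'
```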
Best use cases with OpenClaw
- Long document summarization — Ingest a 500-page PDF in one call for $0.50/M input and generate a detailed summary without hitting output limits.
- Video analysis agents — The native video support and 1M context let agents reason over long clips that would require complex preprocessing with other models.
Not ideal for
- High-stakes reasoning — Hallucinations on complex datasets are a real risk. Don’t use this model where a wrong logical conclusion has serious consequences.
- Minimal-latency chat — If you need sub-second responses for simple interactions, there are cheaper and faster options.
Run it through Haimaker
Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:
Add Haimaker as a custom provider to my OpenClaw config. Use these details:
- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions
Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)
Create an alias "auto" for easy switching. Apply the config when done.
Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.
OpenClaw setup
Use the Gemini API provider configuration in OpenClaw. Set maxTokens to 65535 explicitly — the default is often much lower and will silently truncate long generations.
{
  "models": {
    "mode": "merge",
    "providers": {
      "google": {
        "baseUrl": "https://generativelanguage.googleapis.com/v1beta",
        "apiKey": "YOUR-GOOGLE-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "gemini-3-flash-preview",
            "name": "Gemini 3 Flash",
            "cost": {
              "input": 0.5,
              "output": 3
            },
            "contextWindow": 1048576,
            "maxTokens": 65535
          }
        ]
      }
    }
  }
}
How it compares
- vs GPT-4o-mini — GPT-4o-mini costs less on output ($0.60 vs $3.00/M) and handles strict JSON more reliably. Gemini 3 Flash gives you 8x the context window and the 66K output limit.
- vs Claude 3 Haiku — Haiku is more consistent on structured output and nuanced instructions. Gemini 3 Flash wins on context size, output volume, and native multimodal features.
Bottom line
The step up from Gemini 2.x Flash that actually changes what’s possible: 66K output plus 1M context plus web search. Good fit for context-heavy agents where you need meaningful generation volume.
For setup instructions, see our API key guide. For all available models, see the complete models guide.