Current as of March 2026. Gemini 3 Flash is where the Flash-class models get genuinely interesting. It keeps the 1M context window but bumps the output limit to 66K tokens — a big deal compared to the 8K cap on earlier Flash models. It also adds built-in web search and URL context on top of the standard vision and function calling.
Specs
| Spec | Value |
| --- | --- |
| Provider | Google |
| Input cost | $0.50 / M tokens |
| Output cost | $3.00 / M tokens |
| Context window | 1.0M tokens |
| Max output | 66K tokens |
| Parameters | N/A |
| Features | function_calling, vision, reasoning, web_search, url_context |
What it’s good at
66K output limit
This is the meaningful upgrade over Gemini 2.x Flash models. You can generate substantial amounts of text in a single pass — full reports, long code files, structured documents — without the constant truncation problem that plagues 8K-output models.
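To actually get the full output budget, you usually have to ask for it. A minimal sketch, assuming the OpenAI-compatible Gemini endpoint and the `gemini-3-flash-preview` model id used in the config later in this guide; no request is sent here, we just build the payload:

```python
# Sketch: request payload for a long single-pass generation.
# Assumes an OpenAI-compatible chat-completions endpoint; the model id
# matches the config shown later in this guide.

def build_long_generation_request(prompt: str) -> dict:
    return {
        "model": "gemini-3-flash-preview",
        "messages": [{"role": "user", "content": prompt}],
        # Raise the output cap explicitly; many clients default far lower
        # and will silently truncate long reports or code files.
        "max_tokens": 65535,
    }

payload = build_long_generation_request("Write a complete design document for the billing service.")
```

The point is the explicit `max_tokens`: leaving it at a client default often reintroduces exactly the truncation this model removes.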
1M context plus web search
The combination is useful for research agents: ingest a large document corpus and still have the model pull live URLs or search results into the same context. Not many models offer that together.
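As a sketch, a research-agent request can carry both the pasted corpus and the live-grounding tools in one body. The tool names here (`google_search`, `url_context`) follow the public Gemini API's tool shapes, but treat them as assumptions and verify against the current API reference:

```python
# Sketch: a generateContent-style request body that combines a large
# in-context corpus with live grounding tools. Tool names are assumed
# from the public Gemini API and may change.

def build_research_request(corpus: str, question: str) -> dict:
    return {
        "contents": [{
            "role": "user",
            "parts": [{"text": corpus}, {"text": question}],
        }],
        # Let the model pull live search results and fetched URLs
        # into the same 1M-token context as the corpus.
        "tools": [{"google_search": {}}, {"url_context": {}}],
    }

req = build_research_request("<hundreds of pages of notes>", "Summarize the open questions.")
```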
Native multimodal
Vision, video, and URL context built in. For agents that analyze web pages, screenshots, or video content, this avoids the extra setup required by models that treat multimodal as an add-on.
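Images travel inline as base64 parts. A minimal sketch of the `inline_data` part shape (mime type plus base64 payload) used by the Gemini API; the bytes here are placeholder, in practice you would read a real PNG from disk:

```python
import base64

# Sketch: wrap screenshot bytes as an inline_data part for a
# multimodal request. The bytes below are a placeholder, not a real PNG.

def image_part(png_bytes: bytes) -> dict:
    return {
        "inline_data": {
            "mime_type": "image/png",
            "data": base64.b64encode(png_bytes).decode("ascii"),
        }
    }

part = image_part(b"\x89PNG placeholder")
```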
Where it falls short
Reasoning quality
It’s a Flash model. Complex logical puzzles and deep architectural decisions will surface errors that a Pro-tier model handles more cleanly. For anything where correctness is critical, step up to 3.1 Pro.
Long-context instruction drift
With the full 1M context active, the model can lose track of system instructions or slip on JSON formatting. Expect to be more explicit in your prompts and possibly add output validation.
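That validation can be as simple as a parse-and-retry wrapper. A sketch with a stub standing in for your model client (the `flaky_model` below is a hypothetical test double, not a real API); the retry loop is the part that matters:

```python
import json

# Sketch: defensive JSON validation around a long-context call.
# `generate` is any callable that takes a prompt and returns text.

def parse_with_retry(generate, prompt: str, attempts: int = 3) -> dict:
    last_err = None
    for _ in range(attempts):
        raw = generate(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_err = err
            # Re-ask with the format restated; instruction drift at
            # long context often just needs an explicit reminder.
            prompt = prompt + "\nReturn ONLY valid JSON."
    raise ValueError(f"no valid JSON after {attempts} attempts: {last_err}")

# Stub that fails once, then complies -- simulates formatting drift.
calls = {"n": 0}
def flaky_model(prompt: str) -> str:
    calls["n"] += 1
    return "not json" if calls["n"] == 1 else '{"status": "ok"}'
```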
Best use cases with OpenClaw
- Long document summarization — Ingest a 500-page PDF in one call for $0.50/M input and generate a detailed summary without hitting output limits.
- Video analysis agents — The native video support and 1M context let agents reason over long clips that would require complex preprocessing with other models.
Not ideal for
- High-stakes reasoning — Hallucinations on complex datasets are a real risk. Don’t use this model where a wrong logical conclusion has serious consequences.
- Minimal-latency chat — If you need sub-second responses for simple interactions, there are cheaper and faster options.
Run it through Haimaker
Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:
Add Haimaker as a custom provider to my OpenClaw config. Use these details:
- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions
Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)
Create an alias "auto" for easy switching. Apply the config when done.
Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.
OpenClaw setup
Use the Gemini API provider configuration in OpenClaw. Set maxTokens to 65535 explicitly — the default is often much lower and will silently truncate long generations.
{
  "models": {
    "mode": "merge",
    "providers": {
      "google": {
        "baseUrl": "https://generativelanguage.googleapis.com/v1beta",
        "apiKey": "YOUR-GOOGLE-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "gemini-3-flash-preview",
            "name": "Gemini 3 Flash",
            "cost": {
              "input": 0.5,
              "output": 3
            },
            "contextWindow": 1048576,
            "maxTokens": 65535
          }
        ]
      }
    }
  }
}
How it compares
- vs GPT-4o-mini — GPT-4o-mini costs less on output ($0.60 vs $3.00/M) and handles strict JSON more reliably. Gemini 3 Flash gives you 8x the context window and the 66K output limit.
- vs Claude 3 Haiku — Haiku is more consistent on structured output and nuanced instructions. Gemini 3 Flash wins on context size, output volume, and native multimodal features.
Bottom line
The step up from Gemini 2.x Flash that actually changes what’s possible: 66K output plus 1M context plus web search. Good fit for context-heavy agents where you need meaningful generation volume.
For setup instructions, see our API key guide. For all available models, see the complete models guide.