Google has five Gemini models available through the API, and they all share the same killer feature: a 1M token context window. That’s enough to fit an entire monorepo in a single prompt. The question is which one to use.

The quick answer

| Model | Input/Output Cost | Context | Max Output | Best For |
| --- | --- | --- | --- | --- |
| Gemini 3 Flash | $0.50 / $3.00 | 1M | 66K | Default for most tasks |
| Gemini 3.1 Pro | $2.00 / $12.00 | 1M | 66K | Hard coding and reasoning |
| Gemini 2.5 Pro | $1.25 / $10.00 | 1M | 8K | Long-context analysis |
| Gemini 2.5 Flash | $0.30 / $2.50 | 1M | 8K | Cheap batch processing |
| Gemini 2.0 Flash | $0.10 / $0.40 | 1M | 8K | Free-tier / lowest cost |

Start with Gemini 3 Flash. Step up to 3.1 Pro when the Flash model keeps getting things wrong.

Gemini 3 Flash — the default pick

Gemini 3 Flash is the model I’d point most OpenClaw users to. $0.50/M input, $3.00/M output, 1M context window, and a 66K output limit that lets you generate substantial code in one pass.

It scored 78% on SWE-bench Verified, which actually beat Gemini 3 Pro (76.2%) at the time. For a Flash-class model, that’s unusual. It also comes with function calling, reasoning, built-in web search, and URL context — features that normally require Pro-tier pricing.

The 66K output limit is the upgrade that matters. Earlier Flash models capped at 8K, which meant your agent couldn’t generate a full file without truncating. At 66K, refactoring a whole module in one shot becomes practical.

Where it falls short: instruction-following gets unreliable at the edges of that 1M context window. If you’re pushing past 500K tokens, expect the model to occasionally drop constraints from your system prompt. Explicit output validation helps.
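One way to catch dropped constraints is to validate every response before accepting it. A minimal sketch, assuming your agent asks for JSON with specific keys (the required keys and the retry decision are illustrative, not part of Gemini's API):

```python
import json

def validate_response(text, required_keys=("path", "patch")):
    """Check that a model response is valid JSON and kept the
    output-format constraints from the system prompt.

    Returns (ok, reason) so the caller can decide whether to retry.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc}"
    missing = [k for k in required_keys if k not in data]
    if missing:
        return False, f"missing keys: {missing}"
    return True, "ok"
```

If validation fails, re-ask with the failure reason appended, or trim the context before retrying.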

Gemini 3.1 Pro — the hard-problems model

3.1 Pro is where you go when Flash can’t figure it out. $2.00/M input, $12/M output, same 1M context and 66K output, but noticeably better reasoning.

On SWE-bench, 3.1 Pro scores 80.6% — a real jump from both Flash (78%) and the earlier 3 Pro (76.8%). On LiveCodeBench Pro it hit a 2887 Elo rating, which puts it in the same range as the top Claude and GPT models.

The gap shows up most on multi-step debugging and architectural decisions where the model needs to hold a lot of context and reason about trade-offs. Flash will attempt these tasks and sometimes get them right. 3.1 Pro gets them right more consistently.

Watch the output cost. $12/M means a chatty agent can run up a bill quickly. I’d use 3.1 Pro as a step-up model you switch to for specific hard tasks, not as your default for everything.

Note: Google deprecated the gemini-3-pro model ID. Use gemini-3.1-pro-preview in your OpenClaw config.
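The step-up pattern can be wired directly into an agent loop: try Flash first, escalate to Pro only when Flash's answers keep failing a check. A sketch, where call_model and looks_correct stand in for your actual API call and validation logic (the preview model IDs match the ones used in this guide's config):

```python
FLASH = "google/gemini-3-flash-preview"
PRO = "google/gemini-3.1-pro-preview"

def solve(task, call_model, looks_correct, max_flash_tries=2):
    """Try the cheap model first; escalate to Pro only when
    Flash's answers keep failing the check."""
    for _ in range(max_flash_tries):
        answer = call_model(FLASH, task)
        if looks_correct(answer):
            return FLASH, answer
    # Flash kept getting it wrong; pay for the better model once.
    return PRO, call_model(PRO, task)
```

This keeps the $12/M output price confined to the tasks that actually need it.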

Gemini 2.5 Pro — when you need analysis, not generation

2.5 Pro sits in an odd spot now that 3 Flash exists. $1.25/M input with 1M context, but the output caps at 8K tokens. You can analyze an entire codebase in one prompt, but you can’t generate much in response.

That makes it a specialized tool. Load a project and ask architectural questions, trace dependencies across files, or audit for security issues. Comprehension is strong at this context size. Just don’t expect it to write the fix for you. The 8K output limit gets in the way.

If your workflow is mostly reading code and producing short summaries, 2.5 Pro at $1.25/M is cheaper than 3.1 Pro at $2/M and the analysis quality is comparable. If you need to generate code, pick 3 Flash or 3.1 Pro.
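The arithmetic is easy to check yourself. A sketch using the per-million-token prices from the table above (illustrative; confirm against Google's current pricing before relying on it):

```python
# (input $/M tokens, output $/M tokens), from the comparison table
PRICES = {
    "gemini-3.1-pro": (2.00, 12.00),
    "gemini-2.5-pro": (1.25, 10.00),
}

def cost(model, input_tokens, output_tokens):
    """Dollar cost of one request at per-million-token pricing."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Reading an 800K-token codebase and writing a 2K-token summary:
#   2.5 Pro: 0.8 * 1.25 + 0.002 * 10.00 = $1.02
#   3.1 Pro: 0.8 * 2.00 + 0.002 * 12.00 = $1.62
```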

Gemini 2.5 Flash — cheap and reliable

2.5 Flash is the workhorse for high-volume tasks where quality isn’t the top priority. $0.30/M input, $2.50/M output, 1M context, 8K output. Same vision support as the Pro variant.

Good for batch processing, document classification, tagging pipelines, and any workflow where you’re running thousands of small tasks and need the total bill to stay low. The 1M context means you can skip building a RAG pipeline and just dump everything into the prompt.

Reasoning quality is lower than 3 Flash. Don’t use it for tasks where a wrong answer has real consequences. But for volume work where you’re optimizing for throughput and cost, it’s solid.

Gemini 2.0 Flash — the free-tier option

2.0 Flash is the cheapest Gemini model: $0.10/M input, $0.40/M output. Same 1M context window as everything else in the lineup. 8K output cap.

The real story here is Google’s free tier. You get 60 requests per minute and 1,000 requests per day at no cost. For a developer testing OpenClaw or running a personal coding agent, that’s enough to avoid paying anything. No other major provider offers a free tier this generous.
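To stay inside those limits client-side, a simple sliding-window throttle is enough. A sketch with the free-tier numbers above as defaults (the clock parameter exists only to make the logic testable):

```python
import time
from collections import deque

class FreeTierLimiter:
    """Client-side throttle for a 60 req/min, 1,000 req/day quota."""

    def __init__(self, per_minute=60, per_day=1000, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock
        self.calls = deque()  # timestamps of requests in the last day

    def wait_time(self):
        """Seconds to wait before the next request is allowed."""
        now = self.clock()
        while self.calls and now - self.calls[0] > 86_400:
            self.calls.popleft()
        if len(self.calls) >= self.per_day:
            return 86_400 - (now - self.calls[0])
        last_minute = [t for t in self.calls if now - t < 60]
        if len(last_minute) >= self.per_minute:
            return 60 - (now - last_minute[0])
        return 0.0

    def record(self):
        self.calls.append(self.clock())
```

Call wait_time() before each request, sleep that long if it is nonzero, then record() after the request is sent.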

Performance is the weakest in the lineup, but it’s still competitive with GPT-4o-mini at a fraction of the price. For simple agent tasks — running shell commands, reading files, answering quick questions — it works fine.

The 1M context advantage

Every Gemini model shares the same 1M context window. That’s 5x Claude’s 200K and 8x GPT-4o’s 128K. In practice, this means:

  • Your agent can see the entire codebase at once, not just the files you remembered to include
  • Cross-file refactoring doesn’t require stitching together multiple context windows
  • Large document analysis (specs, logs, test output) fits in a single prompt
  • You can skip building a retrieval pipeline for most projects

The trade-off is latency. Pushing past 500K tokens slows response times noticeably, especially on Pro models. Flash models handle it better.

Setup in OpenClaw

Running through haimaker.ai

All Gemini models are also available through haimaker.ai with a single API key. If you’re already using haimaker for other providers, you can access Gemini models without a separate Google account:

{
  "models": {
    "providers": {
      "haimaker": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "your-haimaker-api-key",
        "api": "openai-completions"
      }
    }
  }
}

This gives you Gemini alongside Claude, GPT, DeepSeek, Grok, and dozens of other models through one provider.

If you'd rather connect to Google's API directly, getting any Gemini model running takes about two minutes.

1. Get your Google API key

Go to Google AI Studio and create an API key. Free accounts get the generous rate limits mentioned above.

2. Add Google as a provider

Open ~/.openclaw/openclaw.json and add Google to your providers:

{
  "models": {
    "providers": {
      "google": {
        "baseUrl": "https://generativelanguage.googleapis.com/v1beta",
        "apiKey": "your-google-api-key",
        "api": "openai-completions"
      }
    }
  }
}

3. Add models to the allowlist

In the same file, add the models you want:

{
  "agents": {
    "defaults": {
      "models": {
        "google/gemini-3-flash-preview": {},
        "google/gemini-3.1-pro-preview": {},
        "google/gemini-2.0-flash-001": {}
      }
    }
  }
}

4. Apply the config

Run openclaw gateway config.apply and switch models with /model during a session.

What I’d do

Set Gemini 3 Flash as your default Gemini model. The combination of 1M context, 66K output, built-in web search, and $0.50/M input makes it the best all-around pick. Step up to 3.1 Pro when you’re doing complex debugging or architectural reviews and Flash keeps getting things wrong.

If cost is tight, start with 2.0 Flash on the free tier. You can always upgrade later.

For context: Gemini 3 Flash competes well with Claude Sonnet on many coding tasks at lower cost, but Claude is still more reliable on strict instruction-following and structured output. See our complete models guide for cross-provider comparisons, or check the cheapest models roundup if budget is the main constraint.