Current as of March 2026. Gemini 2.0 Flash is the cheapest way to get a 1M context window. $0.10/M input, fast responses, native multimodality — it competes directly with GPT-4o-mini and Claude 3.5 Haiku but brings significantly more context headroom.

Specs

Provider: Google
Input cost: $0.10 / M tokens
Output cost: $0.40 / M tokens
Context window: 1.0M tokens
Max output: 8K tokens
Parameters: N/A
Features: function_calling, vision

What it’s good at

Context window at this price

$0.10/M input for 1M context is hard to argue with. You can drop entire documentation sites or large codebases directly into the prompt — no RAG pipeline required. That’s a real architectural simplification for the right use case.
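To put the economics in concrete terms, here is a back-of-the-envelope cost calculator — a sketch using the $0.10/$0.40 per-million rates from the specs above, with illustrative token counts:

```python
# Rough per-request cost for Gemini 2.0 Flash at the listed rates.
INPUT_RATE = 0.10 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.40 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Stuffing the full 1,048,576-token context and maxing out the 8,192-token output:
print(f"${request_cost(1_048_576, 8_192):.4f}")  # → $0.1081
```

Roughly eleven cents for a completely maxed-out request — which is why skipping the RAG pipeline can be a defensible trade at this price point.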

Multimodal out of the box

Vision and audio processing are baked in, not bolted on. For OpenClaw agents that need to handle screenshots, UI states, or media files regularly, this removes an integration layer.
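As a sketch of what that looks like in practice — assuming you reach the model through an OpenAI-compatible endpoint, as the config below does — an image can be passed inline as a base64 data URL in a standard chat message. The file name and prompt here are placeholders:

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build an OpenAI-style chat message mixing text and an inline image."""
    data_url = f"data:{mime};base64,{base64.b64encode(image_bytes).decode()}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }

# Hypothetical usage -- "screenshot.png" is a placeholder file:
# msg = image_message("What UI state is this?", open("screenshot.png", "rb").read())
# client.chat.completions.create(model="gemini-2.0-flash-001", messages=[msg])
```

No separate vision pipeline, no OCR preprocessing step — the screenshot rides along in the same request as the text.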

Speed

It’s fast. For agents running many sequential tool-calling steps, the low latency adds up across the full workflow.

Where it falls short

8K output cap

Same bottleneck as other Flash models. Fine for tool responses and summaries, but you’ll hit it quickly on any generation task that produces real volume.
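One common workaround is a continuation loop: generate until the model signals it is finished, feeding the tail of each chunk back as context. A minimal sketch — the `generate` callable stands in for a real API call, and the sentinel string is a prompt convention of this sketch, not a model feature:

```python
from typing import Callable

def generate_long(generate: Callable[[str], str], task: str,
                  sentinel: str = "<<DONE>>", max_rounds: int = 10) -> str:
    """Stitch a long output together from multiple output-capped generations.

    `generate` takes a prompt and returns up to one output window of text;
    the prompt asks the model to emit `sentinel` when it has nothing left.
    """
    pieces: list[str] = []
    prompt = f"{task}\nWrite `{sentinel}` when you are completely finished."
    for _ in range(max_rounds):
        chunk = generate(prompt)
        if sentinel in chunk:
            pieces.append(chunk.split(sentinel)[0])
            break
        pieces.append(chunk)
        # Re-send the tail of the last chunk so the model can resume mid-thought.
        prompt = f"{task}\nContinue exactly where this left off:\n{chunk[-2000:]}"
    return "".join(pieces)
```

Each round re-pays input cost for the re-sent context, so this trades dollars for output length — worth it only when a single pass genuinely cannot fit.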

Instruction adherence

Under complex, nested system instructions it can drift. Not as reliable as Claude 3.5 Sonnet or GPT-4o for strict JSON schemas or multi-constraint prompts.

Best use cases with OpenClaw

  • Large-scale data ingestion — Process 1M token payloads cheaply while maintaining reasonable reasoning quality. The cost difference versus Pro-tier models is substantial.
  • Fast agentic iteration — Low latency makes it a good fit for agents running many short, sequential tool calls where responsiveness matters.

Not ideal for

  • Long-form generation — The 8K output limit is real. Don’t try to generate extensive reports or large code files in a single pass.
  • Complex reasoning — When the task demands architectural-level logic, Gemini 2.5 Pro or GPT-4o are more consistent and less likely to make subtle errors.

Run it through Haimaker

Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:

Add Haimaker as a custom provider to my OpenClaw config. Use these details:

- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions

Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)

Create an alias "auto" for easy switching. Apply the config when done.
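If you prefer to edit the config file by hand, the instructions above should produce a provider block shaped like the Gemini example later on this page. This is a sketch — the field names follow that example, and the cost entries are omitted because the auto-router's effective price depends on which model it routes to:

```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "haimaker": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "YOUR-HAIMAKER-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "auto",
            "name": "Haimaker Auto Router",
            "contextWindow": 128000,
            "maxTokens": 32000
          }
        ]
      }
    }
  }
}
```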

Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.

OpenClaw setup

Use the custom Gemini provider configuration and target the gemini-2.0-flash-001 endpoint specifically — the generic gemini-2.0-flash alias may point to a different revision.

{
  "models": {
    "mode": "merge",
    "providers": {
      "google": {
        "baseUrl": "https://generativelanguage.googleapis.com/v1beta",
        "apiKey": "YOUR-GOOGLE-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "gemini-2.0-flash-001",
            "name": "Gemini 2.0 Flash",
            "cost": {
              "input": 0.1,
              "output": 0.4
            },
            "contextWindow": 1048576,
            "maxTokens": 8192
          }
        ]
      }
    }
  }
}

How it compares

  • vs GPT-4o-mini — GPT-4o-mini is comparably fast and more reliable on complex instructions. Gemini 2.0 Flash gives you 8x the context window at a slightly lower input price.
  • vs Claude 3.5 Haiku — Haiku is more precise on strict logic tasks. Gemini 2.0 Flash is cheaper and handles multimodal inputs natively.

Bottom line

The cheapest 1M context model available. Best for high-volume data ingestion and fast agentic loops — just stay aware of the 8K output ceiling.

TRY GEMINI 2.0 FLASH ON HAIMAKER


For setup instructions, see our API key guide. For all available models, see the complete models guide.