Current as of March 2026. Gemini 2.5 Flash is the Flash-class version of the 2.5 Pro model — faster and cheaper, with the same 1M context window. Compared to GPT-4o-mini, you’re paying similar input costs but getting 8x the context headroom.
Specs
| Spec | Value |
| --- | --- |
| Provider | Google |
| Input cost | $0.30 / M tokens |
| Output cost | $2.50 / M tokens |
| Context window | 1.0M tokens |
| Max output | 8K tokens |
| Parameters | N/A |
| Features | function_calling, vision |
What it’s good at
1M context at a reasonable price
$0.30/M input for a 1M context window is a solid deal. For document analysis, large archive summarization, or any task where you’d otherwise be building retrieval infrastructure, this model removes that complexity entirely.
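Before sending a large archive, it's worth a sanity check that it actually fits. A minimal sketch, assuming the common rough heuristic of ~4 characters per token for English text (not the model's real tokenizer, so leave margin):

```python
# Rough capacity check for Gemini 2.5 Flash's context window.
# ~4 chars/token is a heuristic estimate, not an exact tokenizer count.
CONTEXT_WINDOW = 1_048_576
MAX_OUTPUT = 8_192

def fits_in_context(text: str, reserved_output: int = MAX_OUTPUT) -> bool:
    estimated_tokens = len(text) // 4
    return estimated_tokens + reserved_output <= CONTEXT_WINDOW

# A ~3 MB text archive is roughly 750K tokens: one request, no chunking.
archive = "x" * 3_000_000
print(fits_in_context(archive))  # True
```

If the estimate lands close to the limit, chunk anyway; the heuristic undercounts for code and non-English text.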
Multimodal throughput
Native vision support is fast and cheap enough for high-frequency image classification or video frame description at scale. Good fit for OpenClaw agents that need to process visual inputs regularly.
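Since the config below talks to the model through an OpenAI-compatible endpoint, image inputs ride along as standard chat-completions content parts. A sketch of the message shape (the helper name is ours, not an OpenClaw API):

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Build an OpenAI-style chat message carrying an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

msg = image_message("Describe this frame in one sentence.", b"\xff\xd8...")
```

In a high-frequency loop, build one of these per frame and batch your requests; the per-image cost is what makes this model viable at scale.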
Where it falls short
Reasoning depth
This is where you feel the difference from the Pro variant. On multi-step logical puzzles and complex architectural decisions, the model will try, but you'll see more errors than you would from a Pro-tier model.
Instruction following in long context
It occasionally drops negative constraints from system prompts when the context gets long. You’ll likely need more explicit prompt engineering to keep it on track compared to GPT-4o-mini.
Best use cases with OpenClaw
- Large document analysis — Process a 1M token archive in one shot. No retrieval pipeline, no chunking errors, just the whole thing in context.
- Multimodal tagging at scale — Fast, cheap, and capable enough for image classification and frame description in high-frequency loops.
Not ideal for
- Complex logic — The hallucination rate on deep reasoning tasks is higher than that of flagship models. Don’t use it for anything where a wrong answer has real consequences.
- Strict JSON schemas — Nested structured output can be unreliable. If your agent depends on well-formed JSON, GPT-4o-mini is more consistent.
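If you do route structured output through this model, a validate-and-retry guard is cheap insurance. A minimal sketch (the required-keys check and sample payloads are illustrative, not part of OpenClaw):

```python
import json

def parse_or_none(raw: str, required_keys: set):
    """Parse model output as JSON; return None so the caller can retry."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not required_keys.issubset(data):
        return None
    return data

good = parse_or_none('{"label": "cat", "score": 0.97}', {"label", "score"})
bad = parse_or_none('{"label": "cat"', {"label", "score"})  # truncated JSON -> None
```

On a `None`, re-prompt with the parse failure quoted back; one retry catches most malformed outputs.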
Run it through Haimaker
Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:
Add Haimaker as a custom provider to my OpenClaw config. Use these details:
- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions
Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)
Create an alias "auto" for easy switching. Apply the config when done.
Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.
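Because the provider is declared as `openai-completions`, any OpenAI-compatible client can hit it directly. A sketch of the request body such a client would POST (model name and limits taken from the provider details above; the message content is a placeholder):

```python
# Standard chat-completions payload for the Haimaker auto-router.
payload = {
    "model": "haimaker/auto",   # auto-router picks the underlying model
    "max_tokens": 32000,
    "messages": [
        {"role": "user", "content": "Summarize this changelog."},
    ],
}
# POSTed to https://api.haimaker.ai/v1/chat/completions
# with "Authorization: Bearer <your Haimaker API key>".
```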
OpenClaw setup
Configure the custom Gemini provider in OpenClaw with your API key. Set maxTokens to 8192 explicitly — the default can cause truncated responses on longer outputs.
```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "google": {
        "baseUrl": "https://generativelanguage.googleapis.com/v1beta",
        "apiKey": "YOUR-GOOGLE-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "gemini-2.5-flash",
            "name": "Gemini 2.5 Flash",
            "cost": {
              "input": 0.3,
              "output": 2.5
            },
            "contextWindow": 1048576,
            "maxTokens": 8192
          }
        ]
      }
    }
  }
}
```
How it compares
- vs GPT-4o-mini — GPT-4o-mini is more reliable on structured output and stricter instruction-following. Gemini 2.5 Flash gives you 8x the context window at comparable input cost.
- vs Claude 3.5 Haiku — Haiku is better at coding and following nuanced instructions. Gemini 2.5 Flash is cheaper and handles native multimodal inputs without extra setup.
Bottom line
Use it when context size is the bottleneck and you don’t need frontier-level reasoning. It’s a strong fit for large-scale data ingestion and multimodal processing where volume matters more than precision.
For setup instructions, see our API key guide. For all available models, see the complete models guide.