Current as of March 2026. Gemini 3.1 Pro is the flagship long-context model — 1M input, 66K output, and strong multimodal reasoning. The jump from 3 Flash to 3.1 Pro is mainly about reasoning quality and instruction fidelity; the context window is the same.
Specs
| Spec | Value |
| --- | --- |
| Provider | Google |
| Input cost | $2.00 / M tokens |
| Output cost | $12.00 / M tokens |
| Context window | 1.0M tokens |
| Max output | 66K tokens |
| Parameters | N/A |
| Features | function_calling, vision, reasoning |
What it’s good at
1M context with serious reasoning
Unlike the Flash variants, this model can actually handle complex architectural analysis within that 1M window without losing the thread. Feed it an entire codebase and ask meaningful questions about global dependencies or cross-file bugs — it holds up.
Multimodal reasoning
Vision and reasoning are genuinely integrated here. It’s reliable on complex diagram parsing, UI screenshot analysis, and identifying edge cases in visual data. Not just “here’s what’s in this image” — it can reason about it.
66K output
Combined with 1M input, you get both the comprehension and the generation capacity. Ask it to understand a full codebase and then write a refactored module — that’s a workflow that actually fits.
Where it falls short
$12/M output cost
This is the constraint you manage the most. For analysis-heavy tasks with short outputs, it’s fine. For agents that generate a lot of tokens per cycle, costs climb fast. Track your output token usage carefully before committing to this model.
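Before committing, it helps to put numbers on the two shapes of workload. A quick sketch using the $2.00/M input and $12.00/M output rates from the table above (the token counts are hypothetical examples):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_rate: float = 2.00, output_rate: float = 12.00) -> float:
    """Estimate the cost of one request at per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Analysis-heavy: 800K tokens in, 2K tokens out. Input dominates.
analysis = request_cost_usd(800_000, 2_000)      # ~$1.62

# Chatty agent cycle: 20K tokens in, 30K tokens out. Output dominates.
agent_cycle = request_cost_usd(20_000, 30_000)   # ~$0.40

print(f"analysis: ${analysis:.2f}, agent cycle: ${agent_cycle:.2f}")
```

Note the asymmetry: the agent cycle moves a fortieth of the tokens but costs a quarter as much, because output is six times the price of input.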
Instruction drift in long sessions
It can slip off complex system instructions during long-context sessions — specific JSON formatting rules, multi-constraint prompts. Claude 3.5 Sonnet handles this more reliably. Budget for validation steps in your agent workflow.
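A cheap mitigation is a validation step in the agent loop: parse the reply, check it against the constraints the prompt demanded, and retry (or escalate to a stricter model) on failure. A minimal sketch, assuming a hypothetical three-key schema:

```python
import json
from typing import Optional

REQUIRED_KEYS = {"summary", "files_changed", "risk"}  # hypothetical schema

def validate_reply(raw: str) -> Optional[dict]:
    """Return the parsed object if it satisfies the schema, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not REQUIRED_KEYS <= obj.keys():
        return None
    return obj

# In the agent loop: a None result triggers a retry with the rules restated.
good = validate_reply('{"summary": "ok", "files_changed": [], "risk": "low"}')
bad = validate_reply("Sure! Here is the JSON you asked for: {...}")
```

The retry prompt can quote the failed output back to the model; restating the format rules at the end of a long context is exactly where drift tends to occur.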
Best use cases with OpenClaw
- Full codebase refactoring — Load every file and let the model understand global context before generating changes. Models with smaller windows miss cross-file dependencies.
- Video content extraction — Ask about specific timestamps or visual details in long video files without having to pre-process and chunk the content.
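For the codebase workflow, it is worth checking that the repo actually fits the 1M window before sending anything. A rough sketch using a 4-characters-per-token heuristic (an approximation, not a real tokenizer):

```python
from pathlib import Path

CONTEXT_BUDGET = 1_000_000   # tokens: Gemini 3.1 Pro's input window
CHARS_PER_TOKEN = 4          # rough heuristic, not a real tokenizer

def pack_repo(root: str, suffixes=(".py", ".ts", ".md")) -> tuple[str, int]:
    """Concatenate source files with path headers; return (prompt, est_tokens)."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            parts.append(f"--- {path} ---\n{path.read_text(errors='ignore')}")
    prompt = "\n\n".join(parts)
    return prompt, len(prompt) // CHARS_PER_TOKEN

# Usage sketch:
# prompt, est = pack_repo("path/to/repo")
# if est > CONTEXT_BUDGET: chunk by package instead of sending everything
```

The path headers matter: they let the model answer cross-file questions with concrete file references instead of vague pointers.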
Not ideal for
- Simple Q&A or chatbots — $12/M output is expensive for basic interactions. Use a Flash model or something like Llama 3.1 8B for lightweight tasks.
- Strict JSON schema enforcement — GPT-4o is more reliable here. If your agent depends on well-formed nested schemas, test this model thoroughly before deploying.
Run it through Haimaker
Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:
Add Haimaker as a custom provider to my OpenClaw config. Use these details:
- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions
Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)
Create an alias "auto" for easy switching. Apply the config when done.
Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.
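Because Haimaker exposes an OpenAI-compatible endpoint (the `openai-completions` API type above), calling the auto-router outside OpenClaw is a standard chat-completions request. A sketch using only the base URL and model id from the config above; it builds the request without sending it, and the API key is a placeholder:

```python
import json
import urllib.request

BASE_URL = "https://api.haimaker.ai/v1"  # from the provider config above

def build_request(api_key: str, user_msg: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request to the auto-router."""
    payload = {
        "model": "haimaker/auto",
        "messages": [{"role": "user", "content": user_msg}],
        "max_tokens": 32000,  # matches the auto model's limit above
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_request("YOUR-HAIMAKER-API-KEY", "Summarize this diff.")
# send with: urllib.request.urlopen(req)
```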
OpenClaw setup
Configure the custom Gemini provider in OpenClaw. Set maxTokens to 65536 explicitly — without it, long-form generation will truncate before the model finishes.
```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "google": {
        "baseUrl": "https://generativelanguage.googleapis.com/v1beta",
        "apiKey": "YOUR-GOOGLE-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "gemini-3.1-pro-preview",
            "name": "Gemini 3.1 Pro",
            "cost": {
              "input": 2,
              "output": 12
            },
            "contextWindow": 1048576,
            "maxTokens": 65536
          }
        ]
      }
    }
  }
}
```
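Even with maxTokens set, it is worth detecting truncation at runtime. In the OpenAI-compatible response format that `openai-completions` uses, a generation cut off by the token limit reports `finish_reason: "length"` instead of `"stop"`. A small guard (the response dicts here are hand-built examples, not real API output):

```python
def is_truncated(response: dict) -> bool:
    """True if the first choice hit the max-token limit ('length')."""
    choices = response.get("choices", [])
    return bool(choices) and choices[0].get("finish_reason") == "length"

# Hand-built examples of the two cases:
complete = {"choices": [{"finish_reason": "stop", "message": {"content": "done"}}]}
cut_off = {"choices": [{"finish_reason": "length", "message": {"content": "..."}}]}
```

On a `"length"` result, an agent can either raise maxTokens (up to the 65536 ceiling) or ask the model to continue from where it stopped.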
How it compares
- vs Claude 3.5 Sonnet — Sonnet follows complex instructions more reliably and is sharper on code logic. Its 200K context window is a fifth the size of Gemini's 1M, which is the tradeoff.
- vs GPT-4o — GPT-4o is faster and more consistent on general reasoning. Gemini 3.1 Pro’s advantage is the context size and the $2/M input cost on document-heavy workloads.
Bottom line
The right choice when you need both 1M context and meaningful reasoning quality — not just ingestion capacity. Watch the output cost; it adds up if your agents are chatty.
TRY GEMINI 3.1 PRO ON HAIMAKER
For setup instructions, see our API key guide. For all available models, see the complete models guide.