Current as of March 2026. M2.5 is the latest in MiniMax's budget line: same $0.30/$1.20 pricing as M2, with a 197K-token context window shared between input and output. The big difference from M2 is the output cap: M2.5 can generate up to 197K tokens of output, not just 8K. That changes what you can use it for.

Specs

Provider: MiniMax
Input cost: $0.30 / M tokens
Output cost: $1.20 / M tokens
Context window: 197K tokens
Max output: 197K tokens
Parameters: N/A
Features: function_calling

What it’s good at

Pricing at Scale

$0.30/$1.20 is genuinely cheap. If you’re running high-volume background tasks, the cost delta vs GPT-4o adds up fast.
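
The gap is easiest to see with a back-of-envelope calculation. The GPT-4o figures below ($2.50/$10 per M tokens) are assumed list prices; verify current pricing before budgeting on them.

```python
# Rough monthly cost comparison at a given token volume.
def monthly_cost(in_tokens_m: float, out_tokens_m: float,
                 in_price: float, out_price: float) -> float:
    """Cost in dollars for token counts given in millions."""
    return in_tokens_m * in_price + out_tokens_m * out_price

# 1B input / 200M output tokens per month:
m25   = monthly_cost(1_000, 200, 0.30, 1.20)   # MiniMax M2.5
gpt4o = monthly_cost(1_000, 200, 2.50, 10.00)  # GPT-4o (assumed prices)
# m25 comes out to $540/month vs roughly $4,500 for GPT-4o at those rates.
```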

Context and Output Symmetry

197K in, 197K out. That’s a useful property for transformation tasks — ingesting a large document and producing a similarly large artifact.
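
A minimal sketch of what that looks like as a request, assuming the OpenAI-compatible chat-completions schema and the `minimax-m2.5` model id used in the config further down. `build_transform_payload` is a hypothetical helper for illustration, not part of any SDK.

```python
# Build a chat-completions payload for a large document transform,
# setting max_tokens high enough to use M2.5's full output ceiling.
def build_transform_payload(document: str, instruction: str,
                            model: str = "minimax-m2.5",
                            max_tokens: int = 196608) -> dict:
    return {
        "model": model,
        "max_tokens": max_tokens,  # M2.5 allows output up to the 197K window
        "messages": [
            {"role": "system", "content": instruction},
            {"role": "user", "content": document},
        ],
    }
```

With most models you would cap `max_tokens` at 8K or 16K; here the same payload shape can request an artifact nearly as large as the input.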

Tool Use

Function calling handles complex API schemas without falling apart. Reliable enough for production agentic loops.
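
A sketch of the tool definitions it consumes, using the standard OpenAI-style function schema; `make_tool` and `lookup_order` are hypothetical names for illustration.

```python
# Wrap a name, description, and JSON Schema parameters in the
# OpenAI-style tool envelope that M2.5's function calling accepts.
def make_tool(name: str, description: str, parameters: dict) -> dict:
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }

lookup_order = make_tool(
    "lookup_order",
    "Fetch an order record by its ID.",
    {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
)
# Pass [lookup_order] as the "tools" field of a chat-completions request.
```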

Where it falls short

Geographic Latency

Servers are in China. If you’re in the US or Europe, TTFT will be higher than from a local-region provider. Not catastrophic, but noticeable.

Reasoning Ceiling

It won’t catch the edge cases that Claude or GPT-4o catch. On complex multi-step logic, expect occasional misses.

Best use cases with OpenClaw

  • Large-scale Data Extraction — 197K context plus function calling at low prices. Good for converting long documents into structured data.
  • Log Analysis — Ingest big log batches, use tools to surface specific errors. Cheap enough to run continuously.
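
The batching side of that loop can be sketched as follows, assuming a rough chars-per-token estimate (not MiniMax's real tokenizer) to keep each request under the context limit:

```python
# Group log lines into batches that fit a token budget, estimating
# tokens as characters / chars_per_token (a heuristic, not exact).
def batch_logs(lines, max_tokens=150_000, chars_per_token=4):
    budget = max_tokens * chars_per_token  # budget in characters
    batches, current, size = [], [], 0
    for line in lines:
        if current and size + len(line) > budget:
            batches.append(current)   # flush the full batch
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        batches.append(current)
    return batches
```

Each batch then becomes one cheap M2.5 request, with tools available to flag the errors you care about.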

Not ideal for

  • Latency-Sensitive Apps — Variable TTFT makes it a poor fit for anything user-facing.
  • Complex Code Generation — It misses edge cases. Use GPT-4o or a dedicated coding model for anything subtle.

Run it through Haimaker

Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:

Add Haimaker as a custom provider to my OpenClaw config. Use these details:

- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions

Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)

Create an alias "auto" for easy switching. Apply the config when done.

Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.

OpenClaw setup

Point OpenClaw to api.haimaker.ai/v1 and use your Haimaker API key. The model follows the standard OpenAI chat completion schema for easy integration.

{
  "models": {
    "mode": "merge",
    "providers": {
      "minimax": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "YOUR-HAIMAKER-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "minimax-m2.5",
            "name": "Minimax M2.5",
            "cost": {
              "input": 0.3,
              "output": 1.2
            },
            "contextWindow": 196608,
            "maxTokens": 196608
          }
        ]
      }
    }
  }
}

How it compares

  • vs Llama 3.1 70B — M2.5 wins on context window (197K vs 128K). Llama generally beats it on reasoning quality.
  • vs GPT-4o-mini — 4o-mini is cheaper on input ($0.15/M) but you hit the 128K context ceiling earlier. For tasks that need the extra room, M2.5 is worth the small premium.

Bottom line

Use M2.5 when you need both a large context window and a large output limit in the same request, and you’re willing to trade some reasoning quality for a lower bill.

TRY MINIMAX M2.5 ON HAIMAKER


For setup instructions, see our API key guide. For all available models, see the complete models guide.