Current as of March 2026. MiniMax M2.1 Lightning has one job: take in a lot of tokens cheaply. The 1M context window at $0.30/M input is the selling point. The catch is the 8K output ceiling — you can stuff a million tokens in, but you’re getting a short response back. Plan your use cases accordingly.
Specs
| Spec | Value |
| --- | --- |
| Provider | MiniMax |
| Input cost | $0.30 / M tokens |
| Output cost | $2.40 / M tokens |
| Context window | 1M tokens |
| Max output | 8K tokens |
| Parameters | N/A |
| Features | function_calling, reasoning |
What it’s good at
1M Context at a Reasonable Price
Most models that go this wide on context charge accordingly. At $0.30/M input, you can actually afford to use the full window without running up the bill.
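Back-of-envelope, straight from the pricing table above — the math is just rate times volume:

```python
# Worked cost example using the prices from the specs table.
input_cost_per_m = 0.30   # $ per 1M input tokens
output_cost_per_m = 2.40  # $ per 1M output tokens

input_tokens = 1_000_000  # a full context window
output_tokens = 8_000     # the output ceiling

cost = (input_tokens / 1e6) * input_cost_per_m + (output_tokens / 1e6) * output_cost_per_m
print(f"${cost:.4f}")  # $0.3192 -- a maxed-out request costs about 32 cents
```

Even a request that saturates the window and the output cap stays under a third of a dollar.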
Tool Use
Function calling is more stable than I expected from a “lightning” tier model. It follows tool schemas without hallucinating arguments on straightforward payloads.
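For reference, a minimal function-calling sketch with the OpenAI Python SDK pointed at the Haimaker endpoint described below; the `get_weather` tool is a placeholder for illustration, not part of MiniMax's docs:

```python
import json
from openai import OpenAI

# Assumes the Haimaker setup from the config section below.
client = OpenAI(base_url="https://api.haimaker.ai/v1", api_key="YOUR-HAIMAKER-API-KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="minimax/MiniMax-M2.1-lightning",
    messages=[{"role": "user", "content": "What's the weather in Osaka?"}],
    tools=tools,
)

call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```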
Reasoning
It holds coherent logic even with dense, multi-part instructions in the prompt. Not o1-level, but better than pure flash models.
Where it falls short
8K Output Cap
This is the real constraint. If you’re feeding 800K tokens of context in and expecting a 50K token report out, this isn’t your model. You get 8K max back.
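In practice that means checking for truncation on every call. A minimal sketch, assuming standard OpenAI-compatible `finish_reason` semantics (the file path is a placeholder for your large context):

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.haimaker.ai/v1", api_key="YOUR-HAIMAKER-API-KEY")

huge_prompt = open("repo_dump.txt").read()  # placeholder: your large input

resp = client.chat.completions.create(
    model="minimax/MiniMax-M2.1-lightning",
    messages=[{"role": "user", "content": huge_prompt}],
    max_tokens=8192,  # the ceiling; asking for more won't help
)

choice = resp.choices[0]
if choice.finish_reason == "length":
    # Output hit the 8K wall mid-answer. Narrow the question or split
    # the task; there is no "continue" past the cap in a single call.
    print("truncated:", choice.message.content[-200:])
else:
    print(choice.message.content)
```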
Regional Latency
Expect some variance in time to first token (TTFT) depending on where your requests originate and what time it is in APAC.
Safety Filters
Tuned for Chinese regulatory requirements. Technical content occasionally trips the filters. Plan for retry logic if you’re processing anything that could look security-adjacent.
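One way to build that retry logic, sketched with plain exponential backoff — the refusal check is a heuristic string match, not an official error code, so tune the markers against your own traffic:

```python
import time
from openai import OpenAI

client = OpenAI(base_url="https://api.haimaker.ai/v1", api_key="YOUR-HAIMAKER-API-KEY")

REFUSAL_MARKERS = ("I cannot", "unable to assist")  # heuristic, adjust as needed

def ask_with_retry(messages, attempts=3):
    for attempt in range(attempts):
        resp = client.chat.completions.create(
            model="minimax/MiniMax-M2.1-lightning", messages=messages
        )
        text = resp.choices[0].message.content or ""
        if not any(m in text for m in REFUSAL_MARKERS):
            return text
        time.sleep(2 ** attempt)  # back off before retrying
    return None  # caller decides whether to fall back to another model
```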
Best use cases with OpenClaw
- Large Codebase Q&A — Drop an entire repo in and ask focused questions. You only need a short answer, which fits the 8K output cap perfectly.
- High-Volume Classification — Cheap input means you can run this against a lot of content. The reasoning step improves accuracy over pure flash models. (A minimal sketch follows this list.)
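As promised above, a minimal classification loop; the label taxonomy and the corpus are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.haimaker.ai/v1", api_key="YOUR-HAIMAKER-API-KEY")

LABELS = ["bug_report", "feature_request", "spam"]  # placeholder taxonomy

def classify(doc: str) -> str:
    resp = client.chat.completions.create(
        model="minimax/MiniMax-M2.1-lightning",
        messages=[
            {"role": "system", "content": f"Reply with exactly one of: {', '.join(LABELS)}."},
            {"role": "user", "content": doc},
        ],
        max_tokens=16,  # labels are short; keep output spend near zero
    )
    return resp.choices[0].message.content.strip()

docs = ["Sample ticket: app crashes on login"]  # placeholder corpus
for doc in docs:
    print(classify(doc))
```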
Not ideal for
- Long-Form Generation — 8K out is a hard wall. For long reports, look at M2.1 (non-lightning) or another model with a higher output limit.
- Political or Sensitive Content — Safety filters are tuned for Chinese regulatory compliance. Controversial queries get refused.
Run it through Haimaker
Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:
Add Haimaker as a custom provider to my OpenClaw config. Use these details:
- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions
Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)
Create an alias "auto" for easy switching. Apply the config when done.
Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.
OpenClaw setup
Configure your provider to use api.haimaker.ai/v1 with the OpenAI-compatible SDK. Set your model ID to minimax/MiniMax-M2.1-lightning and ensure your timeout is high enough for large context processing.
{
  "models": {
    "mode": "merge",
    "providers": {
      "minimax": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "YOUR-HAIMAKER-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "MiniMax-M2.1-lightning",
            "name": "MiniMax M2.1 Lightning",
            "cost": {
              "input": 0.3,
              "output": 2.4
            },
            "contextWindow": 1000000,
            "maxTokens": 8192
          }
        ]
      }
    }
  }
}
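To sanity-check the endpoint outside OpenClaw, the same credentials work with any OpenAI-compatible client. A minimal smoke test, assuming Haimaker accepts the same `minimax/MiniMax-M2.1-lightning` ID used in the config above:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.haimaker.ai/v1", api_key="YOUR-HAIMAKER-API-KEY")

resp = client.chat.completions.create(
    model="minimax/MiniMax-M2.1-lightning",
    messages=[{"role": "user", "content": "Reply with OK."}],
    max_tokens=8,
    timeout=120,  # large-context calls can be slow; raise this for real workloads
)
print(resp.choices[0].message.content, resp.usage)
```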
How it compares
- vs GPT-4o-mini — 4o-mini has better reasoning but a 128K context cap. If you need 1M tokens and can live with short outputs, Lightning wins on context.
- vs Gemini 1.5 Flash — Gemini is cheaper on small tasks, but Lightning’s function calling tends to be more reliable for complex tool schemas.
Bottom line
Good for large-context read tasks where your answer can fit in 8K tokens. If you need a long output from a large input, this is the wrong model.
TRY MINIMAX M2.1 LIGHTNING ON HAIMAKER
For setup instructions, see our API key guide. For all available models, see the complete models guide.