Current as of March 2026. Kimi K2.5 is a 1.1T parameter MoE model from Moonshot AI. The stat that stands out is 262K tokens for both input and output — you can feed it a massive document set and get a similarly long response back. At $0.60/M input, that’s accessible. The $3.00/M output is where you need to budget carefully.
Specs
| Spec | Value |
| --- | --- |
| Provider | Moonshot AI |
| Input cost | $0.60 / M tokens |
| Output cost | $3.00 / M tokens |
| Context window | 262K tokens |
| Max output | 262K tokens |
| Parameters | 1.1T |
| Features | function_calling, vision |
What it’s good at
262K Context + 262K Output
This combination is genuinely rare. Most models with a large context window cap output at 8K or 16K. K2.5 lets you transform or generate long artifacts from long inputs.
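As a rough sketch of how to budget a long-form job before sending it (this assumes input tokens and requested output must together fit the 262K window; check Moonshot's API docs for the exact accounting):

```python
CONTEXT_WINDOW = 262_144  # K2.5 context window, in tokens
MAX_OUTPUT = 262_144      # K2.5 max output, in tokens

def fits(input_tokens: int, requested_output: int) -> bool:
    """Check a planned request against K2.5's limits.

    Assumption: input and output share the 262K window; verify
    against Moonshot's API docs before relying on this.
    """
    if requested_output > MAX_OUTPUT:
        return False
    return input_tokens + requested_output <= CONTEXT_WINDOW

print(fits(200_000, 50_000))   # True: 250K fits in the window
print(fits(200_000, 100_000))  # False: 300K exceeds 262K
```

Either way, the point stands: with most models the second call would fail on the output cap alone, not the shared window.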
Input Pricing for the Parameter Count
$0.60/M is cheap for a 1.1T parameter model. You’re getting a lot of model for the input cost — the output side is where the price reflects the scale.
Multimodal
Vision and function calling are both native. Useful for OpenClaw agents that need to process screenshots alongside text or hit external APIs.
Where it falls short
Output Cost
$3.00/M output is 5x the input rate. If you’re using the full 262K output window regularly, the bill climbs fast. Budget the output side carefully.
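To see how the asymmetry plays out, a quick back-of-the-envelope estimate using the rates from the spec table above:

```python
INPUT_RATE = 0.60 / 1_000_000   # dollars per input token
OUTPUT_RATE = 3.00 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for one K2.5 request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A 200K-token document set in, 100K tokens out:
# $0.12 of input plus $0.30 of output.
print(round(request_cost(200_000, 100_000), 2))  # 0.42
```

Even with twice as many input tokens as output tokens, output accounts for over 70% of the bill.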
Latency
1.1T parameters means slow inference. TTFT is high, and it doesn’t improve much under load.
API Location
The endpoint is at api.moonshot.cn. Users outside Asia will see higher latency and occasional jitter. Not ideal for time-sensitive workflows.
Best use cases with OpenClaw
- Large Document Transformation — Big input, big output, reasonable input cost. This is the core use case.
- Visual Reasoning Tasks — The parameter scale handles complex vision tasks that trip up smaller models.
Not ideal for
- Real-time Chatbots — TTFT is too high for anything interactive.
- High-Volume Simple Tasks — You’re paying for 1.1T parameters. Use a smaller model for classification or basic summarization.
Run it through Haimaker
Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:
Add Haimaker as a custom provider to my OpenClaw config. Use these details:
- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions
Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)
Create an alias "auto" for easy switching. Apply the config when done.
Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.
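In config form, the prompt above corresponds to a provider block roughly like this (a sketch assuming the same provider schema shown in the OpenClaw setup section below; field names are not verified against Haimaker's docs):

```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "haimaker": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "YOUR-HAIMAKER-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "auto",
            "name": "Haimaker Auto-Router",
            "reasoning": false,
            "contextWindow": 128000,
            "maxTokens": 32000
          }
        ]
      }
    }
  }
}
```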
OpenClaw setup
Configure a custom provider in OpenClaw pointing to https://api.moonshot.cn/v1, and raise your timeout settings to account for the model's processing time on large-context inputs.
{
  "models": {
    "mode": "merge",
    "providers": {
      "moonshotai": {
        "baseUrl": "https://api.moonshot.cn/v1",
        "apiKey": "YOUR-MOONSHOTAI-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "kimi-k2.5",
            "name": "Kimi K2.5",
            "cost": {
              "input": 0.6,
              "output": 3
            },
            "contextWindow": 262144,
            "maxTokens": 262144
          }
        ]
      }
    }
  }
}
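For testing directly against the endpoint, a minimal chat payload looks like this (a sketch: the API is OpenAI-compatible per the "api" field above, and the 600-second timeout shown in the comment is an illustrative value, not an official recommendation):

```python
import json

def build_chat_request(prompt: str, max_tokens: int = 8_000) -> dict:
    """Build an OpenAI-compatible chat payload for kimi-k2.5."""
    return {
        "model": "kimi-k2.5",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Summarize this 200K-token report set.")
print(json.dumps(payload)[:30])

# Sending with a generous timeout (illustrative; any OpenAI-compatible
# HTTP client works):
#   requests.post("https://api.moonshot.cn/v1/chat/completions",
#                 json=payload,
#                 headers={"Authorization": "Bearer YOUR-MOONSHOTAI-API-KEY"},
#                 timeout=600)  # seconds; large contexts take a while
```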
How it compares
- vs GPT-4o-mini — 4o-mini is cheaper on output but caps at 128K context. K2.5 wins when you need 262K of either.
- vs Claude 3.5 Sonnet — Claude is better at coding and costs $3/M input. K2.5 is cheaper to read from, worse to generate with.
- vs DeepSeek-V3 — Both are strong. K2.5’s 262K output limit is the specific differentiator for long-form generation tasks.
Bottom line
Use it when you need both long input and long output in the same request. Watch the $3.00/M output cost — that’s where this model gets expensive if you’re not careful.
For setup instructions, see our API key guide. For all available models, see the complete models guide.