Current as of March 2026. Kimi K2.5 is a 1.1T parameter MoE model from Moonshot AI. The stat that stands out is 262K tokens for both input and output — you can feed it a massive document set and get a similarly long response back. At $0.60/M input, that’s accessible. The $3.00/M output is where you need to budget carefully.
Specs
| Spec | Value |
| --- | --- |
| Provider | Moonshot AI |
| Input cost | $0.60 / M tokens |
| Output cost | $3.00 / M tokens |
| Context window | 262K tokens |
| Max output | 262K tokens |
| Parameters | 1.1T |
| Features | function_calling, vision |
What it’s good at
262K Context + 262K Output
This combination is genuinely rare. Most models with a large context window cap output at 8K or 16K. K2.5 lets you transform or generate long artifacts from long inputs.
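As a rough sketch of how to budget a long-form job before sending it (this assumes input tokens and requested output must together fit the 262K window; check Moonshot's API docs for the exact accounting):

```python
CONTEXT_WINDOW = 262_144  # K2.5 context window, in tokens
MAX_OUTPUT = 262_144      # K2.5 max output, in tokens

def fits(input_tokens: int, requested_output: int) -> bool:
    """Check a planned request against K2.5's limits.

    Assumption: input and output share the 262K window; verify
    against Moonshot's API docs before relying on this.
    """
    if requested_output > MAX_OUTPUT:
        return False
    return input_tokens + requested_output <= CONTEXT_WINDOW

print(fits(200_000, 50_000))   # True: 250K fits in the window
print(fits(200_000, 100_000))  # False: 300K exceeds 262K
```

Either way, the point stands: with most models the second call would fail on the output cap alone, not the shared window.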
Input Pricing for the Parameter Count
$0.60/M is cheap for a 1.1T parameter model. You’re getting a lot of model for the input cost — the output side is where the price reflects the scale.
Multimodal
Vision and function calling are both native. Useful for OpenClaw agents that need to process screenshots alongside text or hit external APIs.
Where it falls short
Output Cost
$3.00/M output is 5x the input rate. If you’re using the full 262K output window regularly, the bill climbs fast. Budget the output side carefully.
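To see how the asymmetry plays out, a quick back-of-the-envelope estimate using the rates from the spec table above:

```python
INPUT_RATE = 0.60 / 1_000_000   # dollars per input token
OUTPUT_RATE = 3.00 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for one K2.5 request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A 200K-token document set in, 100K tokens out:
# $0.12 of input plus $0.30 of output.
print(round(request_cost(200_000, 100_000), 2))  # 0.42
```

Even with twice as many input tokens as output tokens, output accounts for over 70% of the bill.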
Latency
1.1T parameters means slow inference. TTFT is high, and it doesn’t improve much under load.
API Location
The endpoint is at api.moonshot.cn. Users outside Asia will see higher latency and occasional jitter. Not ideal for time-sensitive workflows.
Best use cases with OpenClaw
- Large Document Transformation — Big input, big output, reasonable input cost. This is the core use case.
- Visual Reasoning Tasks — The parameter scale handles complex vision tasks that trip up smaller models.
Not ideal for
- Real-time Chatbots — TTFT is too high for anything interactive.
- High-Volume Simple Tasks — You’re paying for 1.1T parameters. Use a smaller model for classification or basic summarization.
Run it through Haimaker
Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:
Add Haimaker as a custom provider to my OpenClaw config. Use these details:
- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions
Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)
Create an alias "auto" for easy switching. Apply the config when done.
Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.
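In config form, the prompt above corresponds to a provider block roughly like this (a sketch assuming the same provider schema shown in the OpenClaw setup section below; field names are not verified against Haimaker's docs):

```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "haimaker": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "YOUR-HAIMAKER-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "auto",
            "name": "Haimaker Auto-Router",
            "reasoning": false,
            "contextWindow": 128000,
            "maxTokens": 32000
          }
        ]
      }
    }
  }
}
```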
OpenClaw setup
Configure a custom provider in OpenClaw pointing to https://api.moonshot.cn/v1, and raise your timeout settings to account for the model's processing time on large-context inputs.
{
  "models": {
    "mode": "merge",
    "providers": {
      "moonshotai": {
        "baseUrl": "https://api.moonshot.cn/v1",
        "apiKey": "YOUR-MOONSHOTAI-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "kimi-k2.5",
            "name": "Kimi K2.5",
            "cost": {
              "input": 0.6,
              "output": 3
            },
            "contextWindow": 262144,
            "maxTokens": 262144
          }
        ]
      }
    }
  }
}
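For testing directly against the endpoint, a minimal chat payload looks like this (a sketch: the API is OpenAI-compatible per the "api" field above, and the 600-second timeout shown in the comment is an illustrative value, not an official recommendation):

```python
import json

def build_chat_request(prompt: str, max_tokens: int = 8_000) -> dict:
    """Build an OpenAI-compatible chat payload for kimi-k2.5."""
    return {
        "model": "kimi-k2.5",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Summarize this 200K-token report set.")
print(json.dumps(payload)[:30])

# Sending with a generous timeout (illustrative; any OpenAI-compatible
# HTTP client works):
#   requests.post("https://api.moonshot.cn/v1/chat/completions",
#                 json=payload,
#                 headers={"Authorization": "Bearer YOUR-MOONSHOTAI-API-KEY"},
#                 timeout=600)  # seconds; large contexts take a while
```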
How it compares
- vs GPT-4o-mini — 4o-mini is cheaper on output but caps at 128K context. K2.5 wins when you need 262K of either.
- vs Claude 3.5 Sonnet — Claude is better at coding and costs $3/M input. K2.5 is cheaper to read from, worse to generate with.
- vs DeepSeek-V3 — Both are strong. K2.5’s 262K output limit is the specific differentiator for long-form generation tasks.
Bottom line
Use it when you need both long input and long output in the same request. Watch the $3.00/M output cost — that’s where this model gets expensive if you’re not careful.
For setup instructions, see our API key guide. For all available models, see the complete models guide.