Current as of April 2026. Kimi K2.5 is a 1.1T parameter model from Moonshot AI that offers a massive 262K context window for both input and output. It provides a high-capacity reasoning engine for Hermes Agent at a fraction of the cost of Western frontier models, priced at $0.38 per million input tokens.
Specs
| Provider | Moonshot AI |
| Input cost | $0.38 / M tokens |
| Output cost | $1.72 / M tokens |
| Context window | 262K tokens |
| Max output | 262K tokens |
| Parameters | 1.1T |
| Features | function_calling, vision |
What it’s good at
Massive Symmetric Context
The 262K token output limit is a rarity, allowing Hermes to generate massive post-action reports or long-form documentation without truncation.
Aggressive Pricing
At $0.38 input and $1.72 output per million tokens, it is significantly cheaper than GPT-4o while maintaining high-end reasoning capabilities.
Reliable Tool Integration
Native function calling support ensures Hermes can navigate its 47 built-in tools and MCP servers with minimal syntax errors.
Where it falls short
Inference Latency
The 1.1T parameter architecture can result in slower time-to-first-token compared to smaller models like Claude 3.5 Sonnet.
Cultural Bias
Reasoning patterns sometimes lean toward Chinese linguistic structures, which can occasionally affect the tone of English-based Slack or Discord responses.
Best use cases with Hermes Agent
- Long-Term Memory Management — The 262K window allows Hermes to ingest months of messaging history to maintain a consistent identity and cross-session memory.
- Complex Multi-Platform Automation — It handles deep reasoning across disparate platforms like SSH, Docker, and Telegram without losing track of the execution state.
Not ideal for
- Low-Latency Chatbots — If your Hermes instance needs to respond to WhatsApp messages in under a second, the overhead of this 1.1T model will be too high.
- Simple Shell Scripting — Using a model this large for basic terminal commands is overkill and unnecessarily increases your token spend compared to GPT-4o-mini.
Hermes Agent setup
Configure the OpenClaw provider to use the Moonshot API base URL and set the context limit to 262144 to prevent premature truncation of the agent’s memory log.
Hermes makes custom endpoints easy. Run:
hermes model
Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:
- Base URL:
https://api.moonshot.cn/v1 - Model:
moonshotai/kimi-k2.5
Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.
How it compares
- vs GPT-4o — Kimi K2.5 is nearly 90% cheaper on input tokens while offering double the effective context window for long-running autonomous tasks.
- vs Claude 3.5 Sonnet — Sonnet is faster and follows MCP instructions with slightly higher precision, but lacks Kimi’s massive 262K output token capacity.
- vs DeepSeek-V3 — DeepSeek offers similar pricing but Kimi K2.5 generally handles long-context retrieval in persistent memory loops with fewer hallucinations.
Bottom line
Kimi K2.5 is the best value for Hermes users who need massive persistent memory and long-form output without paying the premium prices of GPT-4o.
For more, see our Hermes local-LLM setup guide.