Current as of April 2026. Kimi K2.5 is a 1.1T parameter model from Moonshot AI that offers a massive 262K context window for both input and output. It provides a high-capacity reasoning engine for Hermes Agent at a fraction of the cost of Western frontier models, priced at $0.38 per million input tokens.

Specs

ProviderMoonshot AI
Input cost$0.38 / M tokens
Output cost$1.72 / M tokens
Context window262K tokens
Max output262K tokens
Parameters1.1T
Featuresfunction_calling, vision

What it’s good at

Massive Symmetric Context

The 262K token output limit is a rarity, allowing Hermes to generate massive post-action reports or long-form documentation without truncation.

Aggressive Pricing

At $0.38 input and $1.72 output per million tokens, it is significantly cheaper than GPT-4o while maintaining high-end reasoning capabilities.

Reliable Tool Integration

Native function calling support ensures Hermes can navigate its 47 built-in tools and MCP servers with minimal syntax errors.

Where it falls short

Inference Latency

The 1.1T parameter architecture can result in slower time-to-first-token compared to smaller models like Claude 3.5 Sonnet.

Cultural Bias

Reasoning patterns sometimes lean toward Chinese linguistic structures, which can occasionally affect the tone of English-based Slack or Discord responses.

Best use cases with Hermes Agent

  • Long-Term Memory Management — The 262K window allows Hermes to ingest months of messaging history to maintain a consistent identity and cross-session memory.
  • Complex Multi-Platform Automation — It handles deep reasoning across disparate platforms like SSH, Docker, and Telegram without losing track of the execution state.

Not ideal for

  • Low-Latency Chatbots — If your Hermes instance needs to respond to WhatsApp messages in under a second, the overhead of this 1.1T model will be too high.
  • Simple Shell Scripting — Using a model this large for basic terminal commands is overkill and unnecessarily increases your token spend compared to GPT-4o-mini.

Hermes Agent setup

Configure the OpenClaw provider to use the Moonshot API base URL and set the context limit to 262144 to prevent premature truncation of the agent’s memory log.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

  • Base URL: https://api.moonshot.cn/v1
  • Model: moonshotai/kimi-k2.5

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

  • vs GPT-4o — Kimi K2.5 is nearly 90% cheaper on input tokens while offering double the effective context window for long-running autonomous tasks.
  • vs Claude 3.5 Sonnet — Sonnet is faster and follows MCP instructions with slightly higher precision, but lacks Kimi’s massive 262K output token capacity.
  • vs DeepSeek-V3 — DeepSeek offers similar pricing but Kimi K2.5 generally handles long-context retrieval in persistent memory loops with fewer hallucinations.

Bottom line

Kimi K2.5 is the best value for Hermes users who need massive persistent memory and long-form output without paying the premium prices of GPT-4o.

TRY KIMI K2.5 IN HERMES


For more, see our Hermes local-LLM setup guide.