What is the exact pricing for Kimi K2.5?

It costs $0.38 per 1 million input tokens and $1.72 per 1 million output tokens.

How large is the context window?

Kimi K2.5 supports up to 262,144 tokens for both input and output.

Does it support Hermes Agent's tool-use?

Yes, it has native function calling and vision support, making it compatible with all 47 built-in Hermes tools.

Kimi K2.5 for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. Kimi K2.5 is a 1.1T parameter model from Moonshot AI that offers a massive 262K context window for both input and output. It provides a high-capacity reasoning engine for Hermes Agent at a fraction of the cost of Western frontier models, priced at $0.38 per million input tokens.

Specs


Provider	Moonshot AI
Input cost	$0.38 / M tokens
Output cost	$1.72 / M tokens
Context window	262K tokens
Max output	262K tokens
Parameters	1.1T
Features	function_calling, vision

What it’s good at

Massive Symmetric Context

The 262K token output limit is a rarity, allowing Hermes to generate massive post-action reports or long-form documentation without truncation.

Aggressive Pricing

At $0.38 input and $1.72 output per million tokens, it is significantly cheaper than GPT-4o while maintaining high-end reasoning capabilities.

Reliable Tool Integration

Native function calling support ensures Hermes can navigate its 47 built-in tools and MCP servers with minimal syntax errors.

Where it falls short

Inference Latency

The 1.1T parameter architecture can result in slower time-to-first-token compared to smaller models like Claude 3.5 Sonnet.

Cultural Bias

Reasoning patterns sometimes lean toward Chinese linguistic structures, which can occasionally affect the tone of English-based Slack or Discord responses.

Best use cases with Hermes Agent

Long-Term Memory Management — The 262K window allows Hermes to ingest months of messaging history to maintain a consistent identity and cross-session memory.
Complex Multi-Platform Automation — It handles deep reasoning across disparate platforms like SSH, Docker, and Telegram without losing track of the execution state.

Not ideal for

Low-Latency Chatbots — If your Hermes instance needs to respond to WhatsApp messages in under a second, the overhead of this 1.1T model will be too high.
Simple Shell Scripting — Using a model this large for basic terminal commands is overkill and unnecessarily increases your token spend compared to GPT-4o-mini.

Hermes Agent setup

Configure the OpenClaw provider to use the Moonshot API base URL and set the context limit to 262144 to prevent premature truncation of the agent’s memory log.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.moonshot.cn/v1
Model: moonshotai/kimi-k2.5

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs GPT-4o — Kimi K2.5 is nearly 90% cheaper on input tokens while offering double the effective context window for long-running autonomous tasks.
vs Claude 3.5 Sonnet — Sonnet is faster and follows MCP instructions with slightly higher precision, but lacks Kimi’s massive 262K output token capacity.
vs DeepSeek-V3 — DeepSeek offers similar pricing but Kimi K2.5 generally handles long-context retrieval in persistent memory loops with fewer hallucinations.

Bottom line

Kimi K2.5 is the best value for Hermes users who need massive persistent memory and long-form output without paying the premium prices of GPT-4o.

TRY KIMI K2.5 IN HERMES

For more, see our Hermes local-LLM setup guide.