What are the token costs for Minimax M2.5?

Input tokens are priced at $0.12 per million, and output tokens are $1 per million.

What is the context window size?

The model supports a 197K token context window and a 197K token maximum output limit.

Minimax M2.5 for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. Minimax M2.5 is a high-value alternative for Hermes Agent deployments that require massive context handling on a budget. At $0.12 per million input tokens, it provides a 197K context window that is essential for agents maintaining long-term memory across 15+ messaging platforms.

Specs


Provider	MiniMax
Input cost	$0.12 / M tokens
Output cost	$1.00 / M tokens
Context window	197K tokens
Max output	197K tokens
Parameters	N/A
Features	function_calling

What it’s good at

Robust Tool Execution

The model handles Hermes’s 47 built-in tools with high precision, showing fewer hallucinations during complex MCP protocol sequences than other models in this price tier.

Deep Output Buffer

A 197K max output limit allows the agent to generate exhaustive multi-step autonomous plans and detailed logs without hitting truncation limits mid-run.

Where it falls short

Geographic Latency

Users running Hermes on Western-based Modal or Docker instances will experience higher Time To First Token (TTFT) due to the provider’s infrastructure location.

Persona Consistency

During long autonomous sessions spanning multiple platforms like Discord and Slack, the model can occasionally lose its specific agent identity compared to top-tier frontier models.

Best use cases with Hermes Agent

Cross-Platform Monitoring — The low $0.12/$1 pricing makes it cost-effective to keep the agent active 24/7, ingesting streams from Telegram and Slack to trigger shell commands.
Large-Scale Memory Retrieval — With 197K tokens, Hermes can maintain a massive persistent memory of past user interactions and tool results, enabling a more effective closed learning loop.

Not ideal for

Instant Messaging Reply Speed — The network latency makes it less ideal for users who need the agent to respond to WhatsApp or Discord messages in under a second.
High-Stakes SSH Automation — While tool-use is reliable, it lacks the extreme reasoning precision of models like Claude 3.5 Sonnet when executing destructive shell commands.

Hermes Agent setup

Configure the provider as MiniMax and use the full model ID minimax/minimax-m2.5. Ensure your environment variables for the API key are set correctly, as the model will fail silently if the authentication header is malformed.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: minimax/minimax-m2.5

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs GPT-4o-mini — M2.5 is cheaper on input ($0.12 vs $0.15) and offers a vastly larger output window for complex autonomous planning.
vs Claude 3 Haiku — Haiku has lower latency for chat, but M2.5 provides much more context for agents that need to remember weeks of conversation history.

Bottom line

If you are building a high-volume Hermes Agent that needs to monitor multiple platforms and manage a large memory bank without a massive bill, M2.5 is the most logical choice.

TRY MINIMAX M2.5 IN HERMES

For more, see our Hermes local-LLM setup guide.