What is the context window size?

It supports up to 1,000,000 tokens, allowing for massive memory retention in Hermes.

How much does it cost?

Input tokens are $0.30 per million and output tokens are $2.40 per million.

Does it support function calling?

Yes, it has native function calling support for Hermes' 47+ built-in tools and MCP servers.

MiniMax M2.5 Lightning for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. MiniMax M2.5 Lightning is a high-capacity, low-cost model that provides a massive 1M token context window. It is built for developers using Hermes Agent who need to process huge amounts of platform data without paying GPT-4o prices.

Specs


Provider	MiniMax
Input cost	$0.30 / M tokens
Output cost	$2.40 / M tokens
Context window	1M tokens
Max output	8K tokens
Parameters	N/A
Features	function_calling, reasoning

What it’s good at

Massive 1M Context Window

This allows Hermes to maintain an enormous cross-session memory, effectively never needing to truncate long Slack or Discord conversations.

Aggressive Pricing

At $0.30 per million input tokens, it is significantly cheaper than Claude 3.5 Sonnet for running persistent autonomous loops.

Where it falls short

Tool Call Precision

It occasionally fails on complex MCP tool definitions compared to top-tier models, requiring very explicit system prompts for Hermes to function reliably.

Inconsistent Latency

Response times can fluctuate, which might cause timeouts in real-time messaging platforms like WhatsApp or Telegram.

Best use cases with Hermes Agent

High-Volume Platform Monitoring — The 1M context window excels at ingesting thousands of messages from multiple channels to identify patterns or trigger shell commands.
Persistent Memory Agents — Low input costs make it viable to keep the entire agent history in-context, ensuring Hermes maintains a consistent identity over weeks of operation.

Not ideal for

Zero-Latency Chatbots — The ‘Lightning’ name is relative; it often takes longer to first-token than GPT-4o-mini, making it feel sluggish in fast-paced Discord threads.
Complex Nested Tool Use — If your Hermes setup relies on deeply nested MCP tool calls, this model may hallucinate arguments more frequently than specialized reasoning models.

Hermes Agent setup

Configure the MiniMax provider in your Hermes config and set the context limit to 1,000,000. Use the international endpoint to ensure the best routing for your local or Docker-based agent instance.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: minimax/MiniMax-M2.5-lightning

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs GPT-4o-mini — M2.5 Lightning offers a much larger context (1M vs 128k) for similar pricing, though GPT-4o-mini has more reliable tool calling.
vs Claude 3.5 Haiku — Haiku is faster and better at following complex instructions, but costs significantly more and lacks the 1M token depth for long-term memory.

Bottom line

MiniMax M2.5 Lightning is the best choice for Hermes users who prioritize a massive 1M token memory and low operating costs over absolute reasoning speed.

TRY MINIMAX M2.5 LIGHTNING IN HERMES

For more, see our Hermes local-LLM setup guide.