Current as of April 2026. MiniMax M2-her is a specialized budget model designed for high-frequency automation within Hermes Agent. It offers a low-latency alternative for developers who need to bridge messaging platforms like Slack and Telegram without the high overhead of flagship models.
Specs
| Spec | Value |
| --- | --- |
| Provider | MiniMax |
| Input cost | $0.30 / M tokens |
| Output cost | $1.20 / M tokens |
| Context window | 66K tokens |
| Max output | 2K tokens |
| Parameters | N/A |
| Features | Standard chat |
What it’s good at
Aggressive Pricing
At $0.30 per million input tokens, this model is specifically optimized for high-volume polling and message monitoring tasks.
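To put the pricing in concrete terms, here is a back-of-the-envelope estimate at the listed rates. The traffic numbers (calls per day, tokens per call) are hypothetical assumptions for illustration, not measured figures:

```python
# Back-of-the-envelope cost estimate at M2-her's listed rates.
# The traffic volumes below are illustrative assumptions.
INPUT_COST_PER_M = 0.30   # USD per million input tokens
OUTPUT_COST_PER_M = 1.20  # USD per million output tokens

def daily_cost(calls_per_day, in_tokens_per_call, out_tokens_per_call):
    """Estimated USD spend for one day of polling traffic."""
    in_total = calls_per_day * in_tokens_per_call
    out_total = calls_per_day * out_tokens_per_call
    return (in_total * INPUT_COST_PER_M + out_total * OUTPUT_COST_PER_M) / 1_000_000

# e.g. a bot polling every 30 seconds (~2,880 calls/day),
# ~1.5K tokens of context in, ~150 tokens out per call
print(round(daily_cost(2_880, 1_500, 150), 2))  # → 1.81
```

Even an aggressive 30-second polling cadence lands under two dollars a day at these rates, which is the whole appeal of the model for monitoring workloads.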
Reliable Tool Formatting
It maintains consistent JSON structure when invoking the 47 built-in Hermes tools, particularly for shell commands and file system operations.
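If you are routing the model's tool calls yourself, a cheap well-formedness check catches malformed output before it reaches a shell. The schema below (a `name` plus JSON-encoded `arguments`, in the common OpenAI style) is an assumption for illustration, not Hermes' documented internal format:

```python
import json

# Minimal sanity check for an OpenAI-style tool call. The field names
# ("name", "arguments") are an assumed generic schema, not Hermes internals.
def is_well_formed_tool_call(raw: str) -> bool:
    try:
        call = json.loads(raw)
        args = json.loads(call["arguments"])  # arguments arrive JSON-encoded
        return isinstance(call["name"], str) and isinstance(args, dict)
    except (ValueError, KeyError, TypeError):
        return False

good = '{"name": "run_shell", "arguments": "{\\"cmd\\": \\"ls -la\\"}"}'
bad = '{"name": "run_shell", "arguments": "{\\"cmd\\": '  # truncated JSON
print(is_well_formed_tool_call(good), is_well_formed_tool_call(bad))  # → True False
```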
Where it falls short
Restrictive Context Window
The 66K token limit is tight for Hermes’ closed learning loop, often requiring aggressive memory pruning during long-running autonomous sessions.
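The usual mitigation is to evict the oldest turns until the history fits a token budget. This is a minimal sketch of that idea using a crude 4-characters-per-token estimate and a generic message format; it is not Hermes' actual pruning logic:

```python
# Naive context pruning: keep the system prompt, drop oldest turns until
# the history fits the budget. The 4-chars-per-token heuristic and the
# message shape are illustrative assumptions, not Hermes internals.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def prune_history(messages, budget_tokens=60_000):
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    def total(ms):
        return sum(estimate_tokens(m["content"]) for m in ms)

    while turns and total(system + turns) > budget_tokens:
        turns.pop(0)  # evict the oldest turn first
    return system + turns

history = [{"role": "system", "content": "You are Hermes."}]
history += [{"role": "user", "content": "x" * 40_000} for _ in range(10)]
pruned = prune_history(history, budget_tokens=60_000)
print(len(pruned))  # → 6
```

A real agent would summarize evicted turns rather than discard them outright, but the budget arithmetic is the same.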
Output Truncation
A 2K max output limit prevents the model from generating long system logs or detailed summaries from complex MCP tool outputs.
Best use cases with Hermes Agent
- Cross-Platform Notification Routing — It excels at monitoring Discord or Slack channels and using shell tools to fire system alerts when defined conditions are met.
- Simple Shell Automation — The model is reliable for executing basic bash scripts and file management tasks where the logic is straightforward and context requirements are low.
Not ideal for
- Multi-Session Memory Retention — The 66K context window quickly fills up when Hermes attempts to maintain a persistent identity across hundreds of platform interactions.
- Complex MCP Orchestration — When connecting multiple MCP servers, the model struggles to maintain the reasoning chain across several distinct tool definitions.
Hermes Agent setup
Use the standard OpenAI-compatible endpoint configuration, but watch for the rate limits attached to the M2 tier so that tool calls don't fail partway through an autonomous loop.
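A simple exponential backoff around the completion call keeps a loop alive through rate-limit responses. Everything here is illustrative: the `RateLimitError` class, the retry parameters, and the `flaky` stand-in are hypothetical, not a built-in Hermes mechanism:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the provider's 429 error; the name is illustrative."""

def with_backoff(call, max_retries=4, base_delay=1.0):
    """Retry `call` with exponential backoff on rate-limit errors."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Simulated flaky endpoint: raises twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # → ok
```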
Hermes makes custom endpoints easy. Run:
hermes model
Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:
- Base URL: https://api.haimaker.ai/v1
- Model: minimax/minimax-m2-her
Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.
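If you script around these env vars yourself, the pattern is just an environment read with a fallback. The 120-second default below is an assumed placeholder, since Hermes' actual default for HERMES_STREAM_READ_TIMEOUT isn't documented here:

```python
import os

# Read a stream-timeout override from the environment. The 120-second
# fallback is an assumed placeholder, not Hermes' documented default.
def stream_read_timeout(default_seconds: float = 120.0) -> float:
    raw = os.environ.get("HERMES_STREAM_READ_TIMEOUT")
    try:
        return float(raw) if raw else default_seconds
    except ValueError:
        return default_seconds  # ignore malformed values

os.environ["HERMES_STREAM_READ_TIMEOUT"] = "300"
print(stream_read_timeout())  # → 300.0
```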
How it compares
- vs GPT-4o-mini — GPT-4o-mini provides a 128K context window for a similar price, making it superior for Hermes instances that require deeper historical memory.
- vs Gemini 1.5 Flash — Gemini offers a massive 1M context window for long-term reasoning, though M2-her can be more predictable with specific shell-tool syntax.
Bottom line
M2-her is a solid choice for developers running high-traffic, simple automation bots where cost-efficiency outweighs the need for massive context depth.
For more, see our Hermes local-LLM setup guide.