What is the pricing for Sonnet 4.6?

It costs $3 per million input tokens and $15 per million output tokens.

How large is the context window?

The model supports a 1M token context window with a max output of 128K tokens.

Does it work well with MCP?

Yes, it is one of the most stable models for Model Context Protocol handling, which is vital for Hermes' external tool integrations.

Claude Sonnet 4.6 for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. Sonnet 4.6 is the current gold standard for Hermes Agent users who prioritize tool reliability and long-term memory over raw speed. It hits a sweet spot between the massive 1M context window and the precision required for complex autonomous workflows across multiple platforms.

Specs


Provider	Anthropic
Input cost	$3.00 / M tokens
Output cost	$15 / M tokens
Context window	1M tokens
Max output	128K tokens
Parameters	N/A
Features	function_calling, vision, reasoning, web_search

What it’s good at

Reliable Tool Use

It consistently formats JSON for Hermes’ 47 built-in tools without the syntax errors common in smaller models. This reliability is critical when the agent is executing shell commands or managing SSH sessions autonomously.

Deep Context Retention

The 1M token context window allows Hermes to maintain a persistent identity and remember complex interactions across Discord, Slack, and Telegram for weeks. You won’t see the agent ‘forgetting’ its objective mid-run.

Nuanced Instruction Following

It adheres strictly to system prompts, ensuring the agent maintains its specific persona and operational constraints even during long, multi-turn conversations.

Where it falls short

Output Latency

Response times are noticeably slower than ‘Flash’ class models, which can make real-time messaging on WhatsApp or Discord feel sluggish. Expect a few seconds of ‘typing’ before the agent act.

Refusal Tendencies

Anthropic’s safety filters can be overzealous, occasionally causing the agent to refuse valid shell commands or file operations if they look remotely suspicious. This can break autonomous loops.

Best use cases with Hermes Agent

Cross-Platform Automation — It excels at monitoring a Slack channel and correctly translating those requests into complex actions across Docker or SSH environments.
Long-Running Research Tasks — The combination of web search and 1M context makes it perfect for agents that need to compile data over several days without losing the thread.

Not ideal for

High-Volume Simple Chat — At $15 per million output tokens, using Sonnet 4.6 for basic Q&A on messaging apps is a waste of money compared to cheaper alternatives.
Instant-Response Triggers — If your Hermes setup needs to react to a system alert in under a second, the latency of this model will likely be a bottleneck.

Hermes Agent setup

Ensure your Anthropic API key is configured with high rate limits, as Hermes can burn through tokens quickly when performing multi-step tool calls. Set the max_tokens to at least 4096 to prevent the agent from cutting off its reasoning mid-action.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: anthropic/claude-sonnet-4.6

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs GPT-4o — Sonnet 4.6 is more reliable at following complex system instructions for Hermes’ identity, while GPT-4o is slightly faster for vision-based tasks.
vs Gemini 1.5 Pro — Gemini offers a larger 2M context window, but Sonnet 4.6 is significantly better at correctly calling Hermes’ built-in tools without hallucinating parameters.

Bottom line

If you are building a serious autonomous agent that needs to stay ‘sane’ and functional over long periods, Sonnet 4.6 is the most dependable model despite the premium price and moderate speed.

TRY CLAUDE SONNET 4.6 IN HERMES

For more, see our Hermes local-LLM setup guide.