Current as of April 2026. Claude 3.5 Sonnet is the gold standard for tool-heavy Hermes Agent deployments. It handles the 200K context window with high retrieval accuracy, making it ideal for persistent memory across multiple messaging platforms.
Specs
| Spec | Value |
| --- | --- |
| Provider | Anthropic |
| Input cost | $6.00 / M tokens |
| Output cost | $30.00 / M tokens |
| Context window | 200K tokens |
| Max output | 8K tokens |
| Parameters | N/A |
| Features | function_calling, vision |
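At these rates, per-run cost is easy to estimate. A minimal sketch (the token counts in the example are hypothetical):

```python
# Estimate the dollar cost of one agent run at Sonnet 3.5's listed rates.
INPUT_COST_PER_M = 6.00    # $ per million input tokens
OUTPUT_COST_PER_M = 30.00  # $ per million output tokens

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars of a single run."""
    return (input_tokens / 1_000_000) * INPUT_COST_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_COST_PER_M

# Example: a tool-heavy run with 150K input tokens and 4K output tokens
print(f"${run_cost(150_000, 4_000):.2f}")
```

Long-context runs are dominated by input cost, which is why the verbosity note below matters less than re-sent context.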
What it’s good at
Reliable Tool Invocation
It rarely hallucinates tool parameters when using Hermes’ 47 built-in tools or custom MCP servers.
Nuanced Instruction Following
It maintains a consistent identity and persona across disparate platforms like Discord and Slack without drifting over long sessions.
Vision-Enabled Reasoning
The native vision capability allows Hermes to process screenshots or images shared in messaging channels for better context.
Where it falls short
High Operational Cost
At $6/M input and $30/M output tokens, it is significantly more expensive than running Llama 3.1 70B or GPT-4o-mini.
Verbosity
It can be overly talkative in messaging channels, which consumes output tokens unnecessarily during long autonomous runs.
Best use cases with Hermes Agent
- Cross-Platform Automation — It excels at monitoring Slack and executing shell commands via SSH based on complex multi-step logic.
- MCP-Driven Workflows — Its strict adherence to function schemas makes it the most reliable choice for Model Context Protocol integration.
Not ideal for
- High-Frequency Simple Notifications — The $30/M output cost is too high for simple status updates that do not require complex reasoning.
- Latency-Critical Actions — While fast, it cannot match the near-instant response times of smaller models like Groq-hosted Llama 3.
Hermes Agent setup
Ensure your Anthropic API key has high rate limits because Hermes’ closed learning loop can trigger multiple calls in quick succession.
Hermes makes custom endpoints easy. Run `hermes model` and choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

- Base URL: `https://api.haimaker.ai/v1`
- Model: `anthropic/claude-3.5-sonnet`
Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.
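The `/v1` base URL and `provider/model` identifier suggest the custom endpoint speaks the OpenAI-compatible chat-completions format. A hypothetical sketch of the request shape under that assumption (the exact fields Hermes sends are internal to the agent):

```python
import json

# Assumed OpenAI-compatible request against the custom endpoint.
BASE_URL = "https://api.haimaker.ai/v1"
MODEL = "anthropic/claude-3.5-sonnet"

payload = {
    "model": MODEL,
    "messages": [
        {"role": "system", "content": "You are Hermes, a cross-platform agent."},
        {"role": "user", "content": "Summarize the #alerts channel."},
    ],
    # Streaming responses are why HERMES_STREAM_READ_TIMEOUT matters
    # on slow providers.
    "stream": True,
}

# POST this body to f"{BASE_URL}/chat/completions" with your API key.
print(json.dumps(payload, indent=2))
```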
How it compares
- vs GPT-4o — Sonnet 3.5 follows complex system prompts more accurately and is less prone to lazy tool execution than GPT-4o.
- vs Llama 3.1 70B — Sonnet 3.5 is proprietary but handles long-context tool use much better than current open-weight alternatives.
Bottom line
Use Sonnet 3.5 if you need a stable agent that won’t break its tool-calling logic or lose its persona during multi-day autonomous runs.
For more, see our Hermes local-LLM setup guide.