What is the token pricing for Haiku 4.5?

Input tokens cost $1 per million and output tokens cost $5 per million.

Does it support the full Hermes toolset?

Yes, it fully supports function calling for all 47 built-in tools and external MCP servers.

What is the maximum output length?

The model supports up to 64,000 output tokens per request, which is excellent for generating long system logs or session summaries.

Claude Haiku 4.5 for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. Claude Haiku 4.5 is the workhorse model for Hermes Agent users who need high-velocity tool execution without the price tag of flagship models. It balances a massive 200K context window with low latency, making it ideal for managing persistent identities across high-traffic messaging platforms.

Specs


Provider	Anthropic
Input cost	$1.00 / M tokens
Output cost	$5.00 / M tokens
Context window	200K tokens
Max output	64K tokens
Parameters	N/A
Features	function_calling, vision, reasoning

What it’s good at

Reliable Tool Calling

It follows tool schemas for Hermes’s 47+ built-in tools with fewer hallucinations than other models in the $1 per million token price bracket.

Vision Integration

The native vision capabilities allow the agent to process screenshots from Discord or Slack and trigger shell commands based on visual UI changes.

Context Retention

With a 200K context window, it maintains a coherent persistent memory across long-running autonomous sessions without losing the user’s specific persona.

Where it falls short

Complex Reasoning Depth

It can struggle with multi-step logic chains that require coordinating more than five different tools in a single autonomous loop.

Cost-to-Intelligence Ratio

At $1/$5 per million tokens, it is significantly more expensive than GPT-4o-mini, which offers comparable performance for basic routing tasks.

Best use cases with Hermes Agent

Multi-Platform Message Triage — Its speed and 64K max output allow it to summarize and route messages across 15+ platforms like Telegram and WhatsApp in real-time.
Autonomous Shell Operations — The model’s strict instruction following makes it safe for running CLI tools and managing Docker containers via Hermes.

Not ideal for

High-Stakes Financial Data Analysis — The smaller parameter size compared to Sonnet models leads to occasional precision errors when processing complex numerical data from tools.
Deep Strategic Planning — Long-term autonomous runs requiring complex branching logic often benefit from the higher-tier reasoning found in the Opus or Sonnet lines.

Hermes Agent setup

Configure the provider as Anthropic and use the exact ID anthropic/claude-haiku-4-5. Ensure your max_tokens is set to 64000 to leverage the full output capacity for long memory summaries.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: anthropic/claude-haiku-4-5

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs GPT-4o-mini — GPT-4o-mini is cheaper at $0.15/$0.60 per million tokens, but Haiku 4.5 provides more reliable tool calling and a larger 200K context window.
vs Gemini 1.5 Flash — Gemini 1.5 Flash offers a 1M context window, yet Haiku 4.5 exhibits superior adherence to the Hermes Agent identity and closed learning loop requirements.

Bottom line

Haiku 4.5 is the best choice for Hermes Agent users who prioritize speed and tool-use reliability over the raw reasoning power of more expensive flagship models.

TRY CLAUDE HAIKU 4.5 IN HERMES

For more, see our Hermes local-LLM setup guide.