Hermes Agent from Nous Research is free. It’s open source, there’s no seat fee, and nobody charges you per run. So when people ask what Hermes “costs,” they’re really asking about the model behind it — and that bill ranges from a few dollars a month to four figures, depending on which model you point it at and how hard you work it.
## Hermes is free — you pay for the model
Install Hermes, configure a model endpoint, done. The bill that shows up is from whatever provider serves that endpoint — Anthropic, OpenAI, Google, MiniMax, DeepSeek, Z.ai, Moonshot, or an aggregator. Hermes itself takes no cut.
That also means there’s no “Hermes plan” to choose. Your pricing decision is a model decision.
## Typical monthly bills by use case
Real ballparks, assuming normal usage patterns:
| Use case | What it looks like | Budget model (MiniMax M2.5) | Mid-tier (Claude Sonnet 4.6) | Flagship (GPT-5.4 / Opus 4.6) |
|---|---|---|---|---|
| Light assistant | A few message-driven tasks a day, some reminders and lookups | $2–$8 / mo | $20–$60 / mo | $80–$250 / mo |
| Daily coding agent | Regular multi-file edits, debugging, code review through the day | $8–$25 / mo | $50–$200 / mo | $300–$900 / mo |
| Heavy autonomous | Long overnight runs, monitoring, multi-step automations across platforms | $25–$80 / mo | $150–$500 / mo | $500–$1,500+ / mo |
The spread between columns is the whole story: the model you choose moves your bill by 10–50x for roughly the same work. That’s why the rest of this guide is mostly about picking models, not about Hermes.
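The ranges above can be reproduced with back-of-envelope arithmetic: multiply monthly token volume by the per-million prices from the table below. The token volumes here are illustrative assumptions for a daily coding agent, not measured usage:

```python
# Back-of-envelope monthly cost estimator for an agent workload.
# Token volumes are illustrative assumptions; prices are from the table below.

def monthly_cost(input_mtok, output_mtok, price_in, price_out):
    """Dollars per month, given millions of tokens and per-million-token prices."""
    return input_mtok * price_in + output_mtok * price_out

# Assumed "daily coding agent" traffic: ~40M input / ~3M output tokens a month.
workload = (40, 3)

budget = monthly_cost(*workload, price_in=0.12, price_out=1.00)    # MiniMax M2.5
midtier = monthly_cost(*workload, price_in=3.00, price_out=15.00)  # Claude Sonnet 4.6

print(f"budget:   ${budget:.2f}/mo")   # $7.80/mo
print(f"mid-tier: ${midtier:.2f}/mo")  # $165.00/mo
```

Same workload, roughly 20x difference, which is exactly the spread the table shows between the budget and mid-tier columns.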
## Per-provider model pricing
Approximate API pricing per million tokens (input / output). Providers change these regularly — check current rates before you commit.
| Model | Input / Output | Notes |
|---|---|---|
| MiniMax M2.5 | ~$0.12 / $1 | Cheapest model that still holds up in Hermes’ tool loops |
| DeepSeek V3.2 | ~$0.27 per M input | Low-cost coding and reasoning fallback |
| GLM-4.7 / GLM-5 | sub-dollar | Cheap general-purpose agent work |
| Kimi K2.5 | cheap | Large context, good for long chats |
| Gemini 3 Flash | ~$0.075 / $0.30 | Very cheap, fast, fine for simple high-volume tasks |
| Gemini 3.1 Pro | ~$1.25 / $10 | 1M+ context for research and codebase Q&A |
| Claude Sonnet 4.6 | $3 / $15 | The reliable default for autonomous loops |
| Claude Opus 4.6 | ~$5 / $25 | Mission-critical work where errors are expensive |
| GPT-5.4 Codex | premium tier | Heavy multi-file coding |
| Gemma 4 8B / Qwen3.5 (Ollama) | $0 per token | Local — you pay for hardware and electricity, not tokens |
Through haimaker.ai the same models run about 5% below market rate on one API key, which also saves you from holding a separate billing account with each provider.
## Where the savings actually come from
Three levers, in order of impact:
- Route most traffic to a budget model. Run MiniMax M2.5 or DeepSeek V3.2 as your Hermes default and only override to Sonnet, Opus, or a Codex model for tasks that genuinely need it. This one change is where 60–90% of a typical bill goes.
- Trim the context. Hermes keeps tool outputs and file contents in the prompt. Don’t load files the agent doesn’t need, and prune long histories — every token in the window is a token you pay for on the next turn.
- Avoid output-heavy flagships for routine work. Output tokens cost 5x input on Claude and more on the pro tiers. A chatty Opus run gets expensive fast; the same task on a cheaper model with similar quality on simple work costs a fraction.
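The first lever, routing most traffic to a budget model, amounts to a small dispatcher. This is a hypothetical sketch: the escalation heuristic (keyword markers, a file-count threshold) is made up for illustration, though the model IDs follow the provider/model naming used in the setup section below:

```python
# Hypothetical per-task model router. The heuristic and thresholds are
# assumptions for illustration, not part of Hermes itself.

DEFAULT = "minimax/minimax-m2-5"          # cheap default for most traffic
ESCALATE = "anthropic/claude-sonnet-4-6"  # reliable tier for genuinely hard tasks

def pick_model(task: str, files_touched: int = 0) -> str:
    """Route to the budget model unless the task looks hard enough to justify more."""
    hard_markers = ("refactor", "debug", "migrate", "architecture")
    if files_touched > 3 or any(m in task.lower() for m in hard_markers):
        return ESCALATE
    return DEFAULT

print(pick_model("summarize this README"))                 # minimax/minimax-m2-5
print(pick_model("debug the auth flow", files_touched=5))  # anthropic/claude-sonnet-4-6
```

The point is the shape, not the rules: any cheap classification of tasks, applied before the expensive model is invoked, captures most of the savings.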
## A note on subscriptions
Hermes does not work with a Claude Max, ChatGPT Plus, or Gemini Advanced consumer subscription — those are chat-app plans and don’t include API access. You need an API key. The good news is API billing is pay-as-you-go: a light Hermes user often spends less per month than a $20 chat subscription would cost, because you only pay for the tokens you actually run.
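The "$20 subscription" comparison checks out with quick arithmetic. The light-use volumes here (~5M input / ~0.5M output tokens a month) are an assumption, not telemetry:

```python
# Sanity check for the "$20 subscription" comparison, using assumed
# light-use volumes and the per-million prices from the table above.
light_in, light_out = 5, 0.5  # millions of tokens per month (assumed)

minimax = light_in * 0.12 + light_out * 1.00  # MiniMax M2.5 rates
sonnet = light_in * 3.00 + light_out * 15.00  # Claude Sonnet 4.6 rates

print(f"MiniMax M2.5:      ${minimax:.2f}/mo")  # $1.10/mo
print(f"Claude Sonnet 4.6: ${sonnet:.2f}/mo")   # $22.50/mo
```

On a budget model, light use is pocket change; even at Sonnet rates it lands near subscription price, consistent with the light-assistant row of the table above.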
## Set up haimaker.ai with Hermes Agent
If you’d rather not open billing accounts with five different model providers — and you want the cheapest routing per token — point Hermes at haimaker.ai once and switch models by changing a single string.
1. Create an account and get an API key at app.haimaker.ai. New accounts start with free credits, so you can test costs before committing.
2. In your terminal, run `hermes model` and choose Custom endpoint.
3. Enter the connection details:
   - Base URL: `https://api.haimaker.ai/v1`
   - API key: your haimaker.ai key
   - Model: start cheap with `minimax/minimax-m2-5` or `deepseek/deepseek-v3-2`; switch to `anthropic/claude-sonnet-4-6` or `openai/gpt-5-4-codex` for heavier work
4. Run `hermes`. To change models later (say, to drop your bill), run `hermes model` again and swap the model string; the key and base URL stay put.
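If you want to confirm the connection details before pointing Hermes at them, you can assemble the request by hand. This sketch assumes the endpoint is OpenAI-compatible (which the `/v1` base URL suggests) and that it exposes a `chat/completions` route; it is not an official haimaker.ai client, and the API key shown is a placeholder:

```python
# Build (without sending) an OpenAI-style chat request against the endpoint.
# Assumptions: OpenAI-compatible API at /v1; "sk-..." is a placeholder key.
import json
import urllib.request

BASE_URL = "https://api.haimaker.ai/v1"
API_KEY = "sk-..."  # replace with your haimaker.ai key

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Assemble the same kind of request Hermes would send, without firing it."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
    )

req = build_request("minimax/minimax-m2-5", "Say hello")
print(req.full_url)  # https://api.haimaker.ai/v1/chat/completions
# urllib.request.urlopen(req) would actually send it once API_KEY is real.
```

If a real key works here, any model string from the list above drops into the same request unchanged, which is all the `hermes model` switch does under the hood of any OpenAI-compatible setup.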
Want to compare exact pricing and benchmarks across models before you pick one? They’re all side by side at haimaker.ai.
Related: Best Models for Hermes Agent · How to add a custom provider to Hermes Agent · Hermes Agent vs Codex CLI