Hermes Agent from Nous Research is free. It’s open source, there’s no seat fee, and nobody charges you per run. So when people ask what Hermes “costs,” they’re really asking about the model behind it — and that bill ranges from a few dollars a month to four figures, depending on which model you point it at and how hard you work it.

Hermes is free — you pay for the model

Install Hermes, configure a model endpoint, done. The bill that shows up is from whatever provider serves that endpoint — Anthropic, OpenAI, Google, MiniMax, DeepSeek, Z.ai, Moonshot, or an aggregator. Hermes itself takes no cut.

That also means there’s no “Hermes plan” to choose. Your pricing decision is a model decision.

Typical monthly bills by use case

Real ballparks, assuming normal usage patterns:

| Use case | What it looks like | Budget model (MiniMax M2.5) | Mid-tier (Claude Sonnet 4.6) | Flagship (GPT-5.4 / Opus 4.6) |
|---|---|---|---|---|
| Light assistant | A few message-driven tasks a day, some reminders and lookups | $2–$8 / mo | $20–$60 / mo | $80–$250 / mo |
| Daily coding agent | Regular multi-file edits, debugging, code review through the day | $8–$25 / mo | $50–$200 / mo | $300–$900 / mo |
| Heavy autonomous | Long overnight runs, monitoring, multi-step automations across platforms | $25–$80 / mo | $150–$500 / mo | $500–$1,500+ / mo |

The spread between columns is the whole story: the model you choose moves your bill by 10–50x for roughly the same work. That’s why the rest of this guide is mostly about picking models, not about Hermes.
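To see where that spread comes from, here is a back-of-envelope estimate: monthly cost is just tokens per task times tasks per month times the per-million rate. The token volumes below are assumptions for illustration; the prices are the approximate per-million rates quoted later in this guide.

```python
# Rough monthly-cost estimate: tokens per task * tasks per month * price.
# Token volumes are illustrative assumptions; prices are the approximate
# per-million rates from the provider table in this guide.

PRICES = {  # (input $/M tokens, output $/M tokens)
    "minimax-m2.5": (0.12, 1.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "claude-opus-4.6": (5.00, 25.00),
}

def monthly_cost(model, tasks_per_month, in_tokens_per_task, out_tokens_per_task):
    """Estimated monthly bill in dollars."""
    in_price, out_price = PRICES[model]
    total_in = tasks_per_month * in_tokens_per_task
    total_out = tasks_per_month * out_tokens_per_task
    return (total_in * in_price + total_out * out_price) / 1_000_000

# A daily coding agent: ~500 agent runs/mo, each re-sending ~100k tokens of
# context (tool output, file contents) and producing ~5k tokens of output.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 500, 100_000, 5_000):.2f}/mo")
# minimax-m2.5: $8.50/mo
# claude-sonnet-4.6: $187.50/mo
# claude-opus-4.6: $312.50/mo
```

Same workload, roughly a 20–40x swing in the bill, purely from the model string.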

Per-provider model pricing

Approximate API pricing per million tokens (input / output). Providers change these regularly — check current rates before you commit.

| Model | Input / Output | Notes |
|---|---|---|
| MiniMax M2.5 | ~$0.12 / $1 | Cheapest model that still holds up in Hermes’ tool loops |
| DeepSeek V3.2 | ~$0.27 / M | Low-cost coding and reasoning fallback |
| GLM-4.7 / GLM-5 | sub-dollar | Cheap general-purpose agent work |
| Kimi K2.5 | cheap | Large context, good for long chats |
| Gemini 3 Flash | ~$0.075 / $0.30 | Very cheap, fast, fine for simple high-volume tasks |
| Gemini 3.1 Pro | ~$1.25 / $10 | 1M+ context for research and codebase Q&A |
| Claude Sonnet 4.6 | $3 / $15 | The reliable default for autonomous loops |
| Claude Opus 4.6 | ~$5 / $25 | Mission-critical work where errors are expensive |
| GPT-5.4 Codex | premium tier | Heavy multi-file coding |
| Gemma 4 8B / Qwen3.5 (Ollama) | $0 per token | Local — you pay for hardware and electricity, not tokens |

Through haimaker.ai the same models run about 5% below market rate on a single API key, which also saves you from maintaining a separate billing account with each provider.

Where the savings actually come from

Three levers, in order of impact:

  1. Route most traffic to a budget model. Run MiniMax M2.5 or DeepSeek V3.2 as your Hermes default and only override to Sonnet, Opus, or a Codex model for tasks that genuinely need it. This one change can cut 60–90% off a typical bill.
  2. Trim the context. Hermes keeps tool outputs and file contents in the prompt. Don’t load files the agent doesn’t need, and prune long histories — every token in the window is a token you pay for on the next turn.
  3. Avoid output-heavy flagships for routine work. Output tokens cost 5x input on Claude and more on the pro tiers. A chatty Opus run gets expensive fast; for routine work, a cheaper model delivers similar quality at a fraction of the cost.

A note on subscriptions

Hermes does not work with a Claude Max, ChatGPT Plus, or Gemini Advanced consumer subscription — those are chat-app plans and don’t include API access. You need an API key. The good news is API billing is pay-as-you-go: a light Hermes user often spends less per month than a $20 chat subscription would cost, because you only pay for the tokens you actually run.
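The arithmetic behind that claim, using the light-assistant profile from the table above (the per-task token counts are assumptions for illustration):

```python
# Light usage vs. a $20/mo chat subscription, at MiniMax M2.5's approximate
# rates from the table above. Per-task token counts are assumed.

IN_PRICE, OUT_PRICE = 0.12, 1.00   # $ per million tokens
tasks = 150                        # ~5 light tasks a day
in_tok, out_tok = 5_000, 1_000     # assumed tokens per task

bill = (tasks * in_tok * IN_PRICE + tasks * out_tok * OUT_PRICE) / 1_000_000
print(f"${bill:.2f}/mo")  # $0.24/mo
```

Even the same volume on Claude Sonnet 4.6 ($3 / $15) comes to about $4.50 a month, still well under the subscription price.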

Set up haimaker.ai with Hermes Agent

If you’d rather not open billing accounts with five different model providers — and you want the cheapest routing per token — point Hermes at haimaker.ai once and switch models by changing a single string.

  1. Create an account and get an API key at app.haimaker.ai. New accounts start with free credits, so you can test costs before committing.

  2. In your terminal, run:

    hermes model
    
  3. Choose Custom endpoint.

  4. Enter the connection details:

    • Base URL: https://api.haimaker.ai/v1
    • API key: your haimaker.ai key
    • Model: start cheap with minimax/minimax-m2-5 or deepseek/deepseek-v3-2; switch to anthropic/claude-sonnet-4-6 or openai/gpt-5-4-codex for heavier work
  5. Run hermes. To change models later — say, to drop your bill — run hermes model again and swap the model string; the key and base URL stay put.
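On the wire, that configuration amounts to sending OpenAI-style requests to the haimaker base URL. The sketch below assumes haimaker.ai exposes an OpenAI-compatible /chat/completions endpoint (check their docs before relying on it); the key is a placeholder. It shows why switching models is just changing one string.

```python
# Hedged sketch: what the Hermes settings above amount to, ASSUMING an
# OpenAI-compatible /chat/completions endpoint at the haimaker base URL.
import json
import urllib.request

BASE_URL = "https://api.haimaker.ai/v1"
API_KEY = "YOUR_HAIMAKER_KEY"  # placeholder, not a real key

payload = {
    "model": "minimax/minimax-m2-5",  # or "anthropic/claude-sonnet-4-6"
    "messages": [{"role": "user", "content": "ping"}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
print(req.full_url)  # https://api.haimaker.ai/v1/chat/completions
# resp = urllib.request.urlopen(req)  # uncomment to actually send
```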

Want to compare exact pricing and benchmarks across models before you pick one? They’re all side by side at haimaker.ai.



Related: Best Models for Hermes Agent · How to add a custom provider to Hermes Agent · Hermes Agent vs Codex CLI