Current as of April 2026. DeepSeek has become the pragmatic choice for running Hermes Agent instances at scale. These models provide the tool-calling reliability needed for 47+ built-in tools while keeping operational costs low enough to run persistent, multi-platform agents on Discord or Slack 24/7.
The quick answer
| Model | Input / Output | Context | Best For |
|---|---|---|---|
| DeepSeek V3.1 | $0.15 / $0.75 | 33K | The budget entry-point for stateless bots |
| DeepSeek V3.2 | $0.26 / $0.38 | 164K | The primary choice for long-running autonomous workflows |
| DeepSeek V3 | $0.32 / $0.89 | 164K | The previous generation, superseded by V3.2 |
| DeepSeek R1 | $0.70 / $2.50 | 64K | The logic engine for multi-tool orchestration |
Start with DeepSeek V3.2 unless you have a specific reason to pick another. It is the most balanced model for autonomous workflows. It provides a massive 164K context window for persistent memory and costs only $0.26/M input and $0.38/M output, making it cheaper to operate than the older V3 while handling longer agent sessions.
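To see what the table's prices mean in practice, here is a back-of-the-envelope cost calculation per agent session. The prices come from the table above; the token counts are hypothetical examples, not measurements:

```python
# Back-of-the-envelope cost per agent session, using the table's prices.
# USD per million tokens: (input, output)
PRICES = {
    "deepseek-v3.1": (0.15, 0.75),
    "deepseek-v3.2": (0.26, 0.38),
    "deepseek-v3":   (0.32, 0.89),
    "deepseek-r1":   (0.70, 2.50),
}

def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one session at the listed per-million-token rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# A tool-heavy session: long context in, moderate output.
cost_v32 = session_cost("deepseek-v3.2", 120_000, 8_000)
cost_r1 = session_cost("deepseek-r1", 60_000, 8_000)
print(f"V3.2: ${cost_v32:.4f}  R1: ${cost_r1:.4f}")
```

Even a context-heavy V3.2 session costs a few cents, which is why per-token pricing matters more than it first appears for agents running 24/7.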
DeepSeek V3.1 — The budget entry-point for stateless bots
At $0.15/M input, this is the cheapest way to connect Hermes to a messaging platform. The 33K context limit is a significant bottleneck for agents using persistent cross-session memory, so reserve this for simple, ephemeral tasks where the agent doesn’t need to recall long conversation histories.
DeepSeek V3.2 — The primary choice for long-running autonomous workflows
This model is the sweet spot for Hermes. The 164K context window allows the agent to maintain deep memory across multiple messaging sessions. Its $0.38/M output pricing is roughly half V3.1's $0.75/M rate, making it more economical for agents that generate long, tool-heavy responses.
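Using a large window still requires budgeting it. A minimal sketch of history trimming, assuming a rough 4-characters-per-token heuristic (not DeepSeek's actual tokenizer) and a hypothetical reply reserve:

```python
# Sketch: keep only as much chat history as fits the model's context window.
# The 4-chars-per-token ratio is a rough heuristic, not DeepSeek's tokenizer.
CONTEXT_LIMIT = 164_000   # V3.2 window from the table
RESERVED = 8_000          # leave room for the model's reply (assumed value)

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_history(messages: list[str]) -> list[str]:
    """Drop the oldest messages until the estimated total fits the budget."""
    budget = CONTEXT_LIMIT - RESERVED
    kept, total = [], 0
    for msg in reversed(messages):       # walk newest-first
        t = estimate_tokens(msg)
        if total + t > budget:
            break
        kept.append(msg)
        total += t
    return list(reversed(kept))          # restore chronological order
```

Dropping oldest-first keeps the most recent turns intact, which is usually what a messaging agent needs to stay coherent.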
DeepSeek V3 — The previous generation, superseded by V3.2
At $0.32/M input and $0.89/M output with the same 164K context window, V3 now costs more than V3.2 on both sides of the ledger. Keep it only if an existing deployment depends on it; new Hermes setups should start on V3.2 instead.
DeepSeek R1 — The logic engine for multi-tool orchestration
When Hermes needs to coordinate between multiple MCP servers or handle complex reasoning before acting, R1 is the only choice. It is the most expensive at $0.70/M input, but it avoids the logic loops that cheaper models fall into during long-running, autonomous tasks.
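One way to apply the "swap to R1 only when needed" advice is a simple routing function. The escalation heuristic below (counting planned tool calls) and the model identifiers are illustrative assumptions, not Hermes's built-in behavior:

```python
# Hypothetical router: default to V3.2, escalate to R1 for multi-tool plans.
# The threshold and both model identifiers are illustrative assumptions.
DEFAULT_MODEL = "deepseek/deepseek-chat-v3.2"
REASONER_MODEL = "deepseek/deepseek-r1"

def pick_model(planned_tool_calls: int, needs_deep_reasoning: bool) -> str:
    """Escalate to R1 when a task coordinates several tools or MCP servers."""
    if needs_deep_reasoning or planned_tool_calls >= 3:
        return REASONER_MODEL
    return DEFAULT_MODEL
```

Routing this way keeps R1's higher per-token rate confined to the minority of tasks that actually need chain-of-thought depth.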
Setup in Hermes Agent
To integrate DeepSeek, run `hermes model` and select Custom endpoint. Use your provider's base URL (e.g., https://api.deepseek.com/v1) and enter the specific model identifier. Ensure your API key has sufficient credits, as DeepSeek's low pricing often leads to high-volume usage in autonomous loops.
Running through haimaker.ai
Rather than standing up a per-provider account, you can point Hermes at haimaker.ai and get access to DeepSeek alongside every other frontier model through one API key:
- Base URL: https://api.haimaker.ai/v1
- Model: deepseek/deepseek-chat-v3.1
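For reference, this is the shape of the OpenAI-compatible request that such a setup produces. The sketch below builds the request but does not send it; the API key is a placeholder:

```python
# Sketch of an OpenAI-compatible chat request routed through haimaker.ai.
# The request is constructed but never sent; no network call is made.
import json
import urllib.request

BASE_URL = "https://api.haimaker.ai/v1"
MODEL = "deepseek/deepseek-chat-v3.1"

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "ping"}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_HAIMAKER_KEY",  # placeholder key
        "Content-Type": "application/json",
    },
    method="POST",
)
print(req.full_url)
```

Because the endpoint is OpenAI-compatible, switching between haimaker.ai and a direct provider is just a change of base URL and model string.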
Direct provider setup
Hermes makes custom endpoints easy. Run:
hermes model
Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:
- Base URL: https://api.deepseek.com/v1
- Model: deepseek/deepseek-chat-v3.1
Hermes stores the selection and uses it for all subsequent agent runs. You can also set HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.
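If you do reach for those timeout variables, the pattern is just an environment lookup with a fallback. The 120-second default below is an illustrative assumption, not Hermes's documented value:

```python
# Read HERMES_STREAM_READ_TIMEOUT (seconds) with a fallback default.
# The 120-second default is an illustrative assumption.
import os

def stream_read_timeout(default: float = 120.0) -> float:
    raw = os.environ.get("HERMES_STREAM_READ_TIMEOUT")
    try:
        return float(raw) if raw else default
    except ValueError:
        return default  # ignore malformed values rather than crash
```

Raising this value is the usual fix when a slow provider drops long streaming responses mid-generation.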
Bottom line
For a production-ready Hermes Agent, use DeepSeek V3.2 for daily operations and swap to R1 only when the agent encounters complex reasoning tasks that require deep chain-of-thought processing.
See our Hermes local-LLM setup guide.