Current as of April 2026. Qwen3 235B A22B is a heavy-hitter for Hermes Agent, offering a massive 262K context window and aggressive pricing at $0.07/$0.1 per million tokens. It is built for developers who need deep reasoning and long-term memory persistence across 15+ messaging platforms.
Specs
| Provider | Qwen (Alibaba) |
| Input cost | $0.07 / M tokens |
| Output cost | $0.10 / M tokens |
| Context window | 262K tokens |
| Max output | 8K tokens |
| Parameters | N/A |
| Features | function_calling, reasoning |
What it’s good at
Tool-Use Reliability
It handles the 47 built-in Hermes tools with high precision, rarely failing JSON schema validation during complex autonomous loops.
Persistent Memory Capacity
The 262K context window allows the agent to maintain a coherent identity and memory across weeks of Slack and Discord interactions.
Multilingual Reasoning
Superior performance in CJK languages makes it the strongest candidate for Hermes deployments in international or multilingual environments.
Where it falls short
Output Bottlenecks
The 8K output limit can truncate complex summaries when the agent is synthesizing data from multiple MCP sources.
Inference Latency
Response times are slower than smaller models, which can lead to noticeable delays in fast-paced Telegram or WhatsApp threads.
Best use cases with Hermes Agent
- Cross-Platform Monitoring — It effectively monitors Slack channels to trigger shell commands and report results back to Discord while maintaining context.
- Complex MCP Integration — The reasoning capabilities ensure the model correctly maps local data from MCP servers to autonomous agent actions.
Not ideal for
- Instant Chatbots — The latency is too high for simple conversational bots that don’t require the model’s heavy reasoning features.
- Low-Budget Tasks — While cheap for its size, smaller models are more cost-effective for tasks that don’t leverage the 262K context window.
Hermes Agent setup
Enable the reasoning feature in your Hermes configuration to allow the model to utilize its internal chain-of-thought before executing tool calls.
Hermes makes custom endpoints easy. Run:
hermes model
Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:
- Base URL:
https://api.haimaker.ai/v1 - Model:
qwen/qwen3-235b-a22b
Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.
How it compares
- vs Llama 3.1 405B — Llama is more expensive and has a smaller context window, making Qwen3 better for persistent memory-heavy agents.
- vs DeepSeek-V3 — DeepSeek is competitive on price, but Qwen3’s 262K context window provides a significant advantage for long-running autonomous sessions.
Bottom line
For Hermes Agent users who need massive context and reliable tool execution across platforms without the cost of proprietary Western models, Qwen3 235B is the top choice.
For more, see our Hermes local-LLM setup guide.