Current as of April 2026. Qwen models have become the backbone for autonomous agents like Hermes because they punch above their weight in tool-calling reliability and multilingual support. While the Coder variants are famous for logic, the 3.5 series brings vision and massive context windows that are essential for tracking long-running conversations across Discord or Telegram.
The quick answer
| Model | Input / Output | Context | Best For |
|---|---|---|---|
| Qwen3 235B A22B | $0.07 / $0.10 | 262K | High-Volume Background Processing |
| Qwen3.5-Flash | $0.07 / $0.26 | 1M | The Context Champion |
| Qwen3.5-35B-A3B | $0.16 / $1.30 | 262K | Efficient Tool Orchestrator |
| Qwen3.5-27B | $0.20 / $1.56 | 262K | Dense Model Stability |
| Qwen3 Coder | $0.22 / $1.00 | 262K | Logic-First Automation |
| Qwen3.5-122B-A10B | $0.26 / $2.08 | 262K | Reliable Multi-Step Planning |
| Qwen3.5 397B A17B | $0.39 / $2.34 | 262K | Uncompromising Autonomous Logic |
| Qwen3 Coder Plus | $0.65 / $3.25 | 1M | Logic-First Automation |
Start with Qwen3.5-Flash unless you have a specific reason to pick another. It is the most practical choice for a persistent agent. At $0.07/M input and $0.26/M output, it provides a 1M token context window and vision capabilities, making it capable of processing massive message histories and image uploads in chat apps without breaking the bank.
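To see what "without breaking the bank" means in practice, here is a quick cost sketch using the prices from the table above. The monthly token volumes are illustrative assumptions, not measurements:

```python
# Per-million-token prices taken from the comparison table above.
PRICES = {
    "qwen3.5-flash":     {"in": 0.07, "out": 0.26},
    "qwen3.5-397b-a17b": {"in": 0.39, "out": 2.34},
}

def monthly_cost(model, input_mtok, output_mtok):
    """Estimated monthly spend for a volume given in millions of tokens."""
    p = PRICES[model]
    return input_mtok * p["in"] + output_mtok * p["out"]

# Illustrative volume: 10M input / 2M output tokens per month.
flash = monthly_cost("qwen3.5-flash", 10, 2)        # 10*0.07 + 2*0.26 = 1.22
flagship = monthly_cost("qwen3.5-397b-a17b", 10, 2) # 10*0.39 + 2*2.34 = 8.58
print(f"Flash: ${flash:.2f}/mo, 397B: ${flagship:.2f}/mo")
```

Even at a fairly chatty volume, Flash stays in pocket-change territory, which is why it is the default recommendation.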
Qwen3 235B A22B — High-Volume Background Processing
This is the cheapest way to get high-parameter reasoning at $0.07/M input. Use it for background tasks that don’t require vision, like summarizing long Slack threads or managing persistent memory updates where the 8K output limit isn’t a bottleneck.
Qwen3.5-Flash — The Context Champion
Flash is the only model in this price bracket offering a 1M context window and vision. For Hermes users running active group chats on WhatsApp or Discord, it can keep months of conversation history in context, which helps the agent hold a consistent persona.
Qwen3.5-35B-A3B — Efficient Tool Orchestrator
This MoE model offers a significant jump in reasoning quality over Flash for $0.16/M input. It is better at deciding which of Hermes’ 47 tools to use when faced with ambiguous user requests, though it lacks the massive context of the Flash variant.
Qwen3.5-27B — Dense Model Stability
While similar in size to the 35B MoE, this dense model is often more stable for strict JSON formatting. Pick this if you find the MoE models hallucinating tool arguments in complex workflows, despite the slightly higher $0.20/M input cost.
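Whichever model you pick, a cheap guard against hallucinated tool arguments is to validate the model's JSON before dispatching the call. A minimal sketch; the tool schema here is a made-up example, not part of Hermes:

```python
import json

# Hypothetical schema for one tool: required argument names and their types.
SLACK_POST_SCHEMA = {"channel": str, "text": str}

def parse_tool_args(raw, schema):
    """Parse model output as JSON and check it against an expected schema.

    Returns (args, None) on success, or (None, error_message) so the agent
    loop can feed the error back to the model and retry.
    """
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as e:
        return None, f"invalid JSON: {e}"
    for key, typ in schema.items():
        if key not in args:
            return None, f"missing argument: {key}"
        if not isinstance(args[key], typ):
            return None, f"wrong type for {key}"
    return args, None

args, err = parse_tool_args('{"channel": "#ops", "text": "done"}', SLACK_POST_SCHEMA)
```

Returning the error string instead of raising lets the agent re-prompt the model with the validation failure, which is often enough to recover.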
Qwen3 Coder — Logic-First Automation
Even if you aren’t coding, the Coder series has the best instruction-following logic in the family. With a 262K context window, it is the best choice for generating long-form reports or detailed autonomous plans that smaller models truncate.
Qwen3.5-122B-A10B — Reliable Multi-Step Planning
This model is the sweet spot for complex agent loops. At $0.26/M input, it handles multi-step tool dependencies—like fetching data from an MCP server and then posting it to a specific Slack channel—with much higher success rates than the 35B models.
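The kind of dependent chain described above is fragile precisely because step two consumes step one's output, so a single wrong field breaks the whole run. A toy illustration with stub tools; the tool names and data are invented for the example:

```python
# Stub tools standing in for an MCP data fetch and a Slack post.
def fetch_metrics(source):
    """Pretend MCP server call returning structured data."""
    return {"source": source, "errors": 3}

def post_to_slack(channel, text):
    """Pretend Slack tool; returns a confirmation string."""
    return f"posted to {channel}: {text}"

def run_chain():
    """Step 2 depends on fields from step 1, so argument fidelity matters."""
    data = fetch_metrics("prod-db")                      # step 1: fetch
    summary = f"{data['errors']} errors in {data['source']}"
    return post_to_slack("#ops", summary)                # step 2: post
```

A stronger planner earns its price here: it is less likely to drop or mangle the intermediate fields between steps.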
Qwen3.5 397B A17B — Uncompromising Autonomous Logic
This is the flagship for high-stakes autonomy. If your agent is managing critical workflows via SSH or Modal, the $0.39/M input cost is justified by its superior ability to recover from tool errors and re-plan its approach without user intervention.
Qwen3 Coder Plus — Logic-First Automation
Coder Plus pairs the same logic-first instruction following with a 1M context window, at $0.65/M input and $3.25/M output. Pick it over the base Coder when your reports or autonomous plans must draw on inputs that exceed the 262K window.
Setup in Hermes Agent
To use Qwen with Hermes, run `hermes model` and select `Custom endpoint`. You will need to provide your provider’s base URL (e.g., OpenRouter or a local vLLM instance) and the specific model identifier. Ensure your endpoint supports the OpenAI-compatible `/v1/chat/completions` standard.
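Before wiring an endpoint into Hermes, you can sanity-check that it speaks the chat-completions format with a direct request. The URL and model below are example values; substitute your own provider's endpoint, identifier, and API key:

```shell
BASE_URL="https://api.haimaker.ai/v1"   # example endpoint
MODEL="qwen/qwen3-235b-a22b"            # example model identifier

# Compose a minimal chat-completions request body.
BODY='{"model": "'"$MODEL"'", "messages": [{"role": "user", "content": "ping"}]}'
echo "$BODY"

# Send it (requires a real key exported as PROVIDER_API_KEY):
# curl -s "$BASE_URL/chat/completions" \
#   -H "Authorization: Bearer $PROVIDER_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$BODY"
```

If the response contains a `choices` array, the endpoint follows the standard and will work with Hermes.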
Running through haimaker.ai
Rather than standing up a per-provider account, you can point Hermes at haimaker.ai and get access to Qwen alongside every other frontier model through one API key:
- Base URL: `https://api.haimaker.ai/v1`
- Model: `qwen/qwen3-235b-a22b`
Direct provider setup
Hermes makes custom endpoints easy. Run `hermes model`, choose `Custom endpoint` from the menu, and enter the base URL and model identifier when prompted:
- Base URL: your provider’s endpoint, e.g. `https://openrouter.ai/api/v1` for OpenRouter
- Model: `qwen/qwen3-235b-a22b`
Hermes stores the selection and uses it for all subsequent agent runs. You can also set `HERMES_STREAM_READ_TIMEOUT` and related env vars if you’re hitting slow providers.
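For example, a minimal sketch; the 120-second value is an arbitrary assumption, not a Hermes default:

```shell
# Raise the stream read timeout for slow providers (value is an example).
export HERMES_STREAM_READ_TIMEOUT=120
echo "$HERMES_STREAM_READ_TIMEOUT"
```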
Bottom line
For the majority of Hermes users, Qwen3.5-Flash is the correct choice due to its 1M context and low cost. If your agent’s workflows involve complex multi-tool chains, upgrade to the 122B or 397B models for better planning reliability.
See our Hermes local-LLM setup guide.