Current as of April 2026. Qwen models have become the backbone for autonomous agents like Hermes because they punch above their weight in tool-calling reliability and multilingual support. While the Coder variants are famous for logic, the 3.5 series brings vision and massive context windows that are essential for tracking long-running conversations across Discord or Telegram.

The quick answer

| Model | Input / Output ($/M tokens) | Context | Best For |
| --- | --- | --- | --- |
| Qwen3 235B A22B | $0.07 / $0.10 | 262K | High-Volume Background Processing |
| Qwen3.5-Flash | $0.07 / $0.26 | 1M | The Context Champion |
| Qwen3.5-35B-A3B | $0.16 / $1.30 | 262K | Efficient Tool Orchestrator |
| Qwen3.5-27B | $0.20 / $1.56 | 262K | Dense Model Stability |
| Qwen3 Coder | $0.22 / $1.00 | 262K | Logic-First Automation |
| Qwen3.5-122B-A10B | $0.26 / $2.08 | 262K | Reliable Multi-Step Planning |
| Qwen3.5 397B A17B | $0.39 / $2.34 | 262K | Uncompromising Autonomous Logic |
| Qwen3 Coder Plus | $0.65 / $3.25 | 1M | Logic-First Automation |

Start with Qwen3.5-Flash unless you have a specific reason to pick another. It is the most practical choice for a persistent agent. At $0.07/M input and $0.26/M output, it provides a 1M token context window and vision capabilities, making it capable of processing massive message histories and image uploads in chat apps without breaking the bank.
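
To see what "without breaking the bank" means in practice, here is a minimal cost sketch using the listed Flash prices ($0.07/M input, $0.26/M output); the token volumes are illustrative, not measured:

```python
# Rough monthly cost estimate for an agent on Qwen3.5-Flash,
# using the listed prices ($0.07/M input, $0.26/M output).
def monthly_cost(input_tokens_m, output_tokens_m,
                 in_price=0.07, out_price=0.26):
    """Cost in USD for token volumes given in millions of tokens."""
    return input_tokens_m * in_price + output_tokens_m * out_price

# e.g. a chat agent consuming 50M input / 5M output tokens per month:
print(f"${monthly_cost(50, 5):.2f}")  # $4.80
```

Swap in the per-model prices from the table above to compare candidates at your own traffic levels.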

Qwen3 235B A22B — High-Volume Background Processing

This is the cheapest way to get high-parameter reasoning at $0.07/M input. Use it for background tasks that don’t require vision, like summarizing long Slack threads or managing persistent memory updates where the 8K output limit isn’t a bottleneck.

Qwen3.5-Flash — The Context Champion

Flash is the only model in this price bracket offering a 1M context window and vision. For Hermes users running active group chats on WhatsApp or Discord, this model handles months of conversation history to maintain perfect agent persona consistency.

Qwen3.5-35B-A3B — Efficient Tool Orchestrator

This MoE model offers a significant jump in reasoning quality over Flash for $0.16/M input. It is better at deciding which of Hermes’ 47 tools to use when faced with ambiguous user requests, though it lacks the massive context of the Flash variant.

Qwen3.5-27B — Dense Model Stability

While similar in size to the 35B MoE, this dense model is often more stable for strict JSON formatting. Pick this if you find the MoE models hallucinating tool arguments in complex workflows, despite the slightly higher $0.20/M input cost.
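
Whichever model you pick, it is worth guarding against hallucinated tool arguments in application code rather than trusting the model. A minimal sketch, assuming a hypothetical tool-call format and schema (the tool names and required fields here are illustrative, not Hermes' actual schema):

```python
import json

# Hypothetical tool schemas: required argument names per tool.
TOOL_SCHEMAS = {
    "send_message": {"channel", "text"},
    "fetch_url": {"url"},
}

def validate_tool_call(raw: str):
    """Parse a model-emitted tool call and check its arguments
    before executing anything. Returns (tool, args) or raises."""
    call = json.loads(raw)                 # rejects malformed JSON
    tool, args = call["tool"], call["arguments"]
    required = TOOL_SCHEMAS[tool]          # KeyError on unknown tools
    missing = required - args.keys()
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return tool, args

tool, args = validate_tool_call(
    '{"tool": "send_message", "arguments": {"channel": "#ops", "text": "done"}}'
)
print(tool)  # send_message
```

A check like this turns a hallucinated argument into a recoverable error you can feed back to the model instead of a silent bad tool call.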

Qwen3 Coder — Logic-First Automation

Even if you aren’t coding, the Coder series has the best instruction-following logic in the family. With its 262K context window, it is a strong choice for generating long-form reports or detailed autonomous plans that smaller models truncate.

Qwen3.5-122B-A10B — Reliable Multi-Step Planning

This model is the sweet spot for complex agent loops. At $0.26/M input, it handles multi-step tool dependencies—like fetching data from an MCP server and then posting it to a specific Slack channel—with much higher success rates than the 35B models.
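
The "multi-step tool dependency" pattern described above can be sketched as a chain where each call's arguments are built from the previous result. This is an illustrative harness, not Hermes internals; `fetch_data` and `post_to_slack` stand in for real MCP and Slack tools:

```python
# Sketch of a two-step tool chain: step 2 depends on step 1's output.
def run_chain(steps, tools):
    """Execute steps in order, feeding each result into the next call.
    Each step is (tool_name, build_args), where build_args maps the
    previous result to the next call's keyword arguments."""
    result = None
    for tool_name, build_args in steps:
        result = tools[tool_name](**build_args(result))
    return result

# Stand-in tools; a real agent would dispatch to MCP/Slack here.
tools = {
    "fetch_data": lambda query: {"rows": [1, 2, 3], "query": query},
    "post_to_slack": lambda channel, payload: (
        f"posted {len(payload['rows'])} rows to {channel}"
    ),
}

out = run_chain(
    [
        ("fetch_data", lambda _: {"query": "daily metrics"}),
        ("post_to_slack", lambda prev: {"channel": "#metrics", "payload": prev}),
    ],
    tools,
)
print(out)  # posted 3 rows to #metrics
```

The model's job in this loop is to emit the right step sequence and argument mappings; the stronger planners fail less often at exactly that.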

Qwen3.5 397B A17B — Uncompromising Autonomous Logic

This is the flagship for high-stakes autonomy. If your agent is managing critical workflows via SSH or Modal, the $0.39/M input cost is justified by its superior ability to recover from tool errors and re-plan its approach without user intervention.
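
The error-recovery behavior credited to the flagship can also be reinforced in the harness. A minimal sketch, assuming the planner is re-invoked with the last error message (`plan` stands in for a model call; here it just revises one argument):

```python
# Minimal sketch of tool-error recovery: on failure, feed the error
# back to the planner and retry with a revised call.
def run_with_recovery(tool, plan, max_attempts=3):
    error = None
    for attempt in range(max_attempts):
        args = plan(error)          # re-plan using the last error message
        try:
            return tool(**args)
        except Exception as exc:
            error = str(exc)
    raise RuntimeError(f"gave up after {max_attempts} attempts: {error}")

# Stand-in tool that fails until its arguments are corrected.
def flaky_tool(timeout):
    if timeout < 30:
        raise TimeoutError("timeout too low")
    return f"ok (timeout={timeout})"

result = run_with_recovery(
    flaky_tool,
    plan=lambda err: {"timeout": 10 if err is None else 60},
)
print(result)  # ok (timeout=60)
```

A stronger model simply needs fewer trips through this loop, and recovers from errors a weaker model would loop on.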

Qwen3 Coder Plus — Logic-First Automation

Coder Plus pairs the same logic-first instruction following with a 1M context window. At $0.65/M input and $3.25/M output it is the priciest model in the lineup, but it is the pick when long-form reports or autonomous plans must draw on histories or codebases that overflow the 262K models.

Setup in Hermes Agent

To use Qwen with Hermes, run 'hermes model' and select 'Custom endpoint'. You will need to provide your provider's base URL (e.g., OpenRouter or a local vLLM instance) and the specific model identifier. Ensure your endpoint exposes an OpenAI-compatible /v1/chat/completions API.

Running through haimaker.ai

Rather than standing up a per-provider account, you can point Hermes at haimaker.ai and get access to Qwen alongside every other frontier model through one API key:

  • Base URL: https://api.haimaker.ai/v1
  • Model: qwen/qwen3-235b-a22b
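
The same two values drive any OpenAI-compatible client, not just Hermes. A minimal sketch of the request that gets sent to the chat-completions endpoint (this only builds the payload; actual transport, auth headers, and error handling are left to your HTTP client):

```python
import json

BASE_URL = "https://api.haimaker.ai/v1"
MODEL = "qwen/qwen3-235b-a22b"

def build_chat_request(messages, model=MODEL):
    """Build the URL and JSON body for a POST to /chat/completions."""
    return {
        "url": f"{BASE_URL}/chat/completions",
        "body": json.dumps({"model": model, "messages": messages}),
    }

req = build_chat_request([{"role": "user", "content": "ping"}])
print(req["url"])  # https://api.haimaker.ai/v1/chat/completions
```

If this request shape works against your endpoint with your API key, Hermes will work with the same base URL and model identifier.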

Direct provider setup

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

  • Base URL: your provider’s endpoint (e.g., https://openrouter.ai/api/v1 for OpenRouter)
  • Model: qwen/qwen3-235b-a22b

Hermes stores the selection and uses it for all subsequent agent runs. You can also set HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

Bottom line

For the majority of Hermes users, Qwen3.5-Flash is the correct choice due to its 1M context and low cost. If your agent’s workflows involve complex multi-tool chains, upgrade to the 122B or 397B models for better planning reliability.



See our Hermes local-LLM setup guide.