Current as of April 2026. Qwen3 235B A22B is a heavy-hitter for Hermes Agent, offering a massive 262K context window and aggressive pricing at $0.07/$0.1 per million tokens. It is built for developers who need deep reasoning and long-term memory persistence across 15+ messaging platforms.

Specs

ProviderQwen (Alibaba)
Input cost$0.07 / M tokens
Output cost$0.10 / M tokens
Context window262K tokens
Max output8K tokens
ParametersN/A
Featuresfunction_calling, reasoning

What it’s good at

Tool-Use Reliability

It handles the 47 built-in Hermes tools with high precision, rarely failing JSON schema validation during complex autonomous loops.

Persistent Memory Capacity

The 262K context window allows the agent to maintain a coherent identity and memory across weeks of Slack and Discord interactions.

Multilingual Reasoning

Superior performance in CJK languages makes it the strongest candidate for Hermes deployments in international or multilingual environments.

Where it falls short

Output Bottlenecks

The 8K output limit can truncate complex summaries when the agent is synthesizing data from multiple MCP sources.

Inference Latency

Response times are slower than smaller models, which can lead to noticeable delays in fast-paced Telegram or WhatsApp threads.

Best use cases with Hermes Agent

  • Cross-Platform Monitoring — It effectively monitors Slack channels to trigger shell commands and report results back to Discord while maintaining context.
  • Complex MCP Integration — The reasoning capabilities ensure the model correctly maps local data from MCP servers to autonomous agent actions.

Not ideal for

  • Instant Chatbots — The latency is too high for simple conversational bots that don’t require the model’s heavy reasoning features.
  • Low-Budget Tasks — While cheap for its size, smaller models are more cost-effective for tasks that don’t leverage the 262K context window.

Hermes Agent setup

Enable the reasoning feature in your Hermes configuration to allow the model to utilize its internal chain-of-thought before executing tool calls.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

  • Base URL: https://api.haimaker.ai/v1
  • Model: qwen/qwen3-235b-a22b

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

  • vs Llama 3.1 405B — Llama is more expensive and has a smaller context window, making Qwen3 better for persistent memory-heavy agents.
  • vs DeepSeek-V3 — DeepSeek is competitive on price, but Qwen3’s 262K context window provides a significant advantage for long-running autonomous sessions.

Bottom line

For Hermes Agent users who need massive context and reliable tool execution across platforms without the cost of proprietary Western models, Qwen3 235B is the top choice.

TRY QWEN3 235B A22B IN HERMES


For more, see our Hermes local-LLM setup guide.