What is the pricing for Qwen3 235B?

Input tokens cost $0.07 per million and output tokens cost $0.1 per million, making it one of the most affordable high-parameter models.

How large is the context window?

The model supports up to 262,000 tokens, which is ideal for maintaining the closed learning loop in Hermes Agent.

Does it support function calling?

Yes, it has native support for function calling and reasoning, which is critical for the 47 tools provided by Hermes.

Qwen3 235B A22B for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. Qwen3 235B A22B is a heavy-hitter for Hermes Agent, offering a massive 262K context window and aggressive pricing at $0.07/$0.1 per million tokens. It is built for developers who need deep reasoning and long-term memory persistence across 15+ messaging platforms.

Specs


Provider	Qwen (Alibaba)
Input cost	$0.07 / M tokens
Output cost	$0.10 / M tokens
Context window	262K tokens
Max output	8K tokens
Parameters	N/A
Features	function_calling, reasoning

What it’s good at

Tool-Use Reliability

It handles the 47 built-in Hermes tools with high precision, rarely failing JSON schema validation during complex autonomous loops.

Persistent Memory Capacity

The 262K context window allows the agent to maintain a coherent identity and memory across weeks of Slack and Discord interactions.

Multilingual Reasoning

Superior performance in CJK languages makes it the strongest candidate for Hermes deployments in international or multilingual environments.

Where it falls short

Output Bottlenecks

The 8K output limit can truncate complex summaries when the agent is synthesizing data from multiple MCP sources.

Inference Latency

Response times are slower than smaller models, which can lead to noticeable delays in fast-paced Telegram or WhatsApp threads.

Best use cases with Hermes Agent

Cross-Platform Monitoring — It effectively monitors Slack channels to trigger shell commands and report results back to Discord while maintaining context.
Complex MCP Integration — The reasoning capabilities ensure the model correctly maps local data from MCP servers to autonomous agent actions.

Not ideal for

Instant Chatbots — The latency is too high for simple conversational bots that don’t require the model’s heavy reasoning features.
Low-Budget Tasks — While cheap for its size, smaller models are more cost-effective for tasks that don’t leverage the 262K context window.

Hermes Agent setup

Enable the reasoning feature in your Hermes configuration to allow the model to utilize its internal chain-of-thought before executing tool calls.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: qwen/qwen3-235b-a22b

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs Llama 3.1 405B — Llama is more expensive and has a smaller context window, making Qwen3 better for persistent memory-heavy agents.
vs DeepSeek-V3 — DeepSeek is competitive on price, but Qwen3’s 262K context window provides a significant advantage for long-running autonomous sessions.

Bottom line

For Hermes Agent users who need massive context and reliable tool execution across platforms without the cost of proprietary Western models, Qwen3 235B is the top choice.

TRY QWEN3 235B A22B IN HERMES

For more, see our Hermes local-LLM setup guide.