What are the exact token costs for this model?

It costs $0.26 per million input tokens and $2.08 per million output tokens.

How large is the context window for Hermes to use?

The model supports a 262K token context window and can output up to 66K tokens in a single response.

Does it support vision for platform attachments?

Yes, it has native vision features, allowing Hermes to interpret images sent through messaging platforms.

Qwen3.5-122B-A10B for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. Qwen3.5-122B-A10B is a heavy-duty reasoning model that excels in Hermes Agent environments requiring deep cross-platform context and complex tool orchestration. With its massive 262K context window, it handles months of persistent memory without the common ‘forgetting’ issues seen in smaller models.

Specs


Provider	Qwen (Alibaba)
Input cost	$0.26 / M tokens
Output cost	$2.08 / M tokens
Context window	262K tokens
Max output	66K tokens
Parameters	N/A
Features	function_calling, vision, reasoning

What it’s good at

Massive 262K Context Window

Hermes can maintain a sprawling cross-session memory, allowing it to reference Slack conversations from weeks ago while executing current tasks on Discord.

Native Reasoning Architecture

The internal reasoning logic significantly reduces hallucinations when Hermes is navigating complex MCP protocol calls or multi-step tool chains.

Where it falls short

High Output Premium

$2.08 per million output tokens is steep compared to competitors like Llama 3.1 70B, which offers similar tool-use reliability for less.

Proprietary Constraints

Unlike the open-weight Qwen variants, this model is proprietary, meaning you are locked into Alibaba’s specific API performance and safety filters.

Best use cases with Hermes Agent

Cross-Platform Memory Synthesis — It can ingest 262K tokens of historical data from Telegram and Slack to build a consistent persona and knowledge base for the agent.
Complex MCP Orchestration — The reasoning capability ensures that multi-step tool interactions—like fetching a file via SSH and then posting a summary to WhatsApp—don’t break.

Not ideal for

High-Volume Notification Bots — The $0.26 input and $2.08 output costs make it overkill for simple ‘if-this-then-that’ message relaying.
Low-Latency Response Needs — Reasoning models often have a higher ‘time to first token’ compared to smaller, faster models like Llama 3.1 8B.

Hermes Agent setup

Configure your context window to the full 262,144 tokens in your environment variables to ensure Hermes doesn’t prune its memory prematurely. Enable the vision feature if you plan on having Hermes process screenshots from Discord or Telegram channels.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: qwen/qwen3.5-122b-a10b

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs Llama 3.1 70B — Llama is significantly cheaper for input/output but lacks the 262K context depth and specialized reasoning logic found in this Qwen model.
vs DeepSeek-V3 — DeepSeek offers a better price-to-performance ratio for general tool use, but Qwen3.5-122B handles complex CJK-language tool parameters more reliably.

Bottom line

This is the ‘brainy’ choice for Hermes users who need an agent that can reason through 200K+ tokens of history, though you’ll pay a premium for that stability.

TRY QWEN3.5-122B-A10B IN HERMES

For more, see our Hermes local-LLM setup guide.