What is the token pricing for this model?

Input tokens cost $0.65 per million and output tokens cost $3.25 per million.

How large is the context window?

It supports up to 1,000,000 tokens, making it ideal for Hermes agents with long-term persistent memory.

Does it support function calling?

Yes, it has native support for function calling and reasoning, which is essential for Hermes' tool-use capabilities.

Qwen3 Coder Plus for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. Qwen3 Coder Plus is a sleeper hit for Hermes Agent users who need a massive context window without the Claude 3.5 price tag. While branded for coding, its reasoning capabilities make it a reliable driver for complex, multi-platform autonomous loops.

Specs


Provider	Qwen (Alibaba)
Input cost	$0.65 / M tokens
Output cost	$3.25 / M tokens
Context window	1M tokens
Max output	66K tokens
Parameters	N/A
Features	function_calling, reasoning

What it’s good at

Massive 1M Context Window

Hermes can maintain persistent memory across thousands of Slack and Discord interactions without needing aggressive RAG or truncation.

Reliable Tool Execution

It handles Hermes’ 47 built-in tools and external MCP protocols with high precision, rarely hallucinating JSON arguments in shell commands.

Cost-Effective Reasoning

At $0.65 per million input tokens, it provides high-tier reasoning for autonomous decision-making at a fraction of the cost of GPT-4o.

Where it falls short

Robotic Persona

The model tends to be overly formal and dry, requiring heavy system prompting to maintain a unique identity on messaging platforms.

Reasoning Latency

Deep reasoning chains can cause noticeable delays in real-time chat responses on Telegram or WhatsApp compared to smaller models.

Best use cases with Hermes Agent

Cross-Platform Infrastructure Management — Its ability to reason through shell commands and MCP tools makes it perfect for monitoring servers and posting status updates across Slack and Discord.
Long-Term Autonomous Research — The 1M token window allows the agent to ingest huge amounts of documentation and message history to make informed decisions over weeks of operation.

Not ideal for

High-Speed Customer Support — The output latency is too high for users who expect instant replies in a chat interface.
Low-Complexity Automation — Using a reasoning-heavy model for simple ‘if-this-then-that’ tasks is a waste of the $3.25 per million output token cost.

Hermes Agent setup

Configure the provider as Qwen and ensure the max_tokens is set high to take advantage of the 66K output limit. Use the OpenAI-compatible endpoint for the most stable tool-calling performance within Hermes.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: qwen/qwen3-coder-plus

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs Claude 3.5 Sonnet — Qwen3 Coder Plus is significantly cheaper for inputs ($0.65 vs $3.00) and offers a much larger 1M context window versus Claude’s 200K.
vs GPT-4o-mini — While GPT-4o-mini is cheaper, Qwen3 Coder Plus is far more capable at following complex MCP schemas and maintaining logic in long autonomous runs.

Bottom line

For Hermes users building complex, long-running agents that need to remember everything and rarely fail a tool call, Qwen3 Coder Plus is the best value-to-performance choice on the market.

TRY QWEN3 CODER PLUS IN HERMES

For more, see our Hermes local-LLM setup guide.