Current as of April 2026. GLM-4.7 from Zhipu AI is a budget-focused powerhouse for Hermes Agent deployments that need high throughput without the OpenAI price tag. It handles the 203K context window surprisingly well for long-running autonomous sessions across Slack and Discord.
Specs
| Spec | Value |
| --- | --- |
| Provider | Zhipu AI |
| Input cost | $0.39 / M tokens |
| Output cost | $1.75 / M tokens |
| Context window | 203K tokens |
| Max output | 64K tokens |
| Parameters | N/A |
| Features | function_calling, vision, reasoning |
What it’s good at
Aggressive Pricing
At $0.39 per million input tokens, it is significantly cheaper than GPT-4o while maintaining solid tool-use reliability for Hermes’ 47 built-in functions.
Massive Context Window
The 203K context window allows Hermes to maintain deep persistent memory across weeks of cross-platform messaging history without losing the thread.
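To put the pricing in perspective, here is a quick back-of-the-envelope estimate. The prices come from the specs table above; the token volumes are illustrative assumptions, not measurements from a real deployment:

```python
# Back-of-the-envelope monthly cost estimate for a Hermes deployment.
# Prices are from the specs table; token volumes are illustrative assumptions.
INPUT_PRICE_PER_M = 0.39   # USD per million input tokens
OUTPUT_PRICE_PER_M = 1.75  # USD per million output tokens

def monthly_cost(input_tokens_m: float, output_tokens_m: float) -> float:
    """Cost in USD for the given token volumes (in millions of tokens)."""
    return input_tokens_m * INPUT_PRICE_PER_M + output_tokens_m * OUTPUT_PRICE_PER_M

# e.g. 500M input tokens (history retrieval is input-heavy) and 40M output tokens
print(round(monthly_cost(500, 40), 2))  # → 265.0
```

Because continuous polling and history retrieval skew heavily toward input tokens, the low $0.39 input rate dominates the bill.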
Where it falls short
Latency Variability
Since Zhipu’s infrastructure is based in China, users outside the region may experience higher latency spikes compared to US-based providers.
MCP Protocol Nuances
It sometimes struggles with complex nested MCP tool calls compared to Claude 3.5 Sonnet, occasionally requiring more explicit system prompting in Hermes.
Best use cases with Hermes Agent
- High-Volume Messaging Automation — The low cost offsets the high token usage of continuous polling and history retrieval across 15+ messaging platforms.
- Long-Term Memory Tasks — Hermes’ closed learning loop benefits from the model’s ability to ingest massive amounts of previous session data into the 203K context window.
Not ideal for
- Real-time Low Latency Critical Apps — Network hops to Zhipu’s servers can introduce a 1-2 second delay that might annoy users on snappy platforms like Telegram.
- High-Stakes Financial Tooling — Its function calling can occasionally hallucinate parameters when juggling more than 10 tools at once in a single prompt.
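If you route sensitive tooling through it anyway, a common mitigation for hallucinated parameters is to validate tool-call arguments before execution instead of trusting the model. A minimal sketch of that idea — the `transfer_funds` schema below is hypothetical, not one of Hermes’ built-in functions:

```python
# Reject tool calls whose arguments don't match the expected schema.
# The "transfer_funds" schema is hypothetical, for illustration only.
EXPECTED = {
    "transfer_funds": {"account_id": str, "amount": float},
}

def validate_tool_call(name: str, args: dict) -> bool:
    """Return True only if every expected field is present with the right type."""
    schema = EXPECTED.get(name)
    if schema is None:
        return False
    if set(args) != set(schema):
        return False
    return all(isinstance(args[k], t) for k, t in schema.items())

print(validate_tool_call("transfer_funds", {"account_id": "A1", "amount": 99.5}))  # → True
print(validate_tool_call("transfer_funds", {"account_id": "A1"}))                  # → False
```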
Hermes Agent setup
Use the OpenAI-compatible endpoint provided by Zhipu BigModel API. Ensure your API key is correctly mapped and the model ID is set to z-ai/glm-4.7 in your Hermes configuration file.
Hermes makes custom endpoints easy. Run:
hermes model
Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:
- Base URL: https://api.haimaker.ai/v1
- Model: z-ai/glm-4.7
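You can also sanity-check the endpoint outside of Hermes: since the gateway is OpenAI-compatible, a plain chat-completions request works. A minimal sketch using only the standard library — the `ZHIPU_API_KEY` environment variable name is an assumption; substitute wherever your key actually lives:

```python
import json
import os
import urllib.request

# Minimal OpenAI-compatible chat-completions request against the Zhipu gateway.
# ZHIPU_API_KEY is an assumed env var name, not a Hermes convention.
BASE_URL = "https://api.haimaker.ai/v1"
MODEL_ID = "z-ai/glm-4.7"

payload = {
    "model": MODEL_ID,
    "messages": [{"role": "user", "content": "Reply with the word: ready"}],
    "max_tokens": 16,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('ZHIPU_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)
# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

If this round-trips, any failure inside Hermes is a configuration issue rather than a connectivity one.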
Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.
How it compares
- vs GPT-4o-mini — GLM-4.7 offers a much larger 203K context window compared to GPT-4o-mini’s 128K, making it better for Hermes’ persistent memory features.
- vs DeepSeek-V3 — DeepSeek is often cheaper for raw throughput, but GLM-4.7’s 64K output limit gives it an edge for complex multi-platform reporting.
Bottom line
GLM-4.7 is the go-to choice for Hermes users who need a massive 203K context window and reliable tool-use at a fraction of the cost of Western flagship models.
For more, see our Hermes local-LLM setup guide.