Current as of April 2026. Zhipu AI’s GLM family is the pragmatic choice for Hermes Agent users who need deep context windows and reliable tool-calling without the premium pricing of US-based labs. These models excel at managing persistent memory across 15+ messaging platforms and handle the 47 built-in Hermes tools with high precision.
The quick answer
| Model | Input / Output | Context | Best For |
|---|---|---|---|
| GLM-4.7 Flash | $0.06 / $0.40 | 203K | High-Volume Message Polling |
| GLM-4.6 | $0.39 / $1.90 | 203K | Deep Document Generation |
| GLM-4.7 | $0.39 / $1.75 | 203K | Balanced Agent Default |
| GLM-5 | $0.72 / $2.30 | 80K | Advanced Reasoning Specialist |
Start with GLM-4.7 unless you have a specific reason to pick another. It offers the best balance of features for an autonomous agent. At $0.39 per million input tokens, you get 203K context for long-term memory and native vision support for multi-modal tool interactions, making it more versatile than the 4.6 or Flash variants.
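To make the table concrete, here is a back-of-the-envelope cost comparison. This is a sketch using only the list prices above; the 10M-input / 1M-output monthly workload is an arbitrary example, not a measured Hermes profile:

```python
# Rough monthly cost comparison for a hypothetical agent workload,
# using the per-million-token list prices from the table above.
PRICES = {  # (input $/M tokens, output $/M tokens)
    "glm-4.7-flash": (0.06, 0.40),
    "glm-4.6":       (0.39, 1.90),
    "glm-4.7":       (0.39, 1.75),
    "glm-5":         (0.72, 2.30),
}

def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Dollar cost for a workload measured in millions of tokens."""
    in_price, out_price = PRICES[model]
    return input_mtok * in_price + output_mtok * out_price

# Example: 10M input tokens and 1M output tokens per month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 10, 1):.2f}")
```

At that volume, Flash is roughly a fifth the cost of GLM-4.7, which in turn undercuts GLM-5 by about 40%.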
GLM-4.7 Flash — High-Volume Message Polling
At $0.06 per million input tokens, this is the only logical choice for agents monitoring high-traffic Discord or Telegram channels. It retains the 203K context window of its larger siblings, allowing it to ingest massive conversation histories before making a tool-calling decision, though its reasoning is less robust for complex multi-step workflows.
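For a polling agent, the practical question is how much history actually fits in that 203K window. Here is a minimal trim step; the ~4-characters-per-token heuristic, the output headroom, and the message format are our assumptions, not Hermes internals:

```python
CONTEXT_TOKENS = 203_000      # GLM-4.7 Flash context window
RESERVED_FOR_OUTPUT = 8_000   # headroom for the model's reply (assumed value)

def approx_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token. Use a real tokenizer in production.
    return max(1, len(text) // 4)

def trim_history(messages: list[dict]) -> list[dict]:
    """Keep the most recent messages that fit in the context budget."""
    budget = CONTEXT_TOKENS - RESERVED_FOR_OUTPUT
    kept: list[dict] = []
    for msg in reversed(messages):        # walk newest-first
        cost = approx_tokens(msg["content"])
        if cost > budget:
            break
        budget -= cost
        kept.append(msg)
    return list(reversed(kept))           # restore chronological order
```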
GLM-4.6 — Deep Document Generation
This model is nearly identical to GLM-4.7 in pricing but trades vision support for a massive 131K max output token limit. Choose this if your Hermes Agent is tasked with generating long-form reports or extensive logs from cross-session data where the 64K limit of the 4.7 version would cut off your workflow.
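Since the two models are priced almost identically, the deciding factor is usually expected output length. A sketch of that routing decision, using the output limits cited above (the helper name is ours, not part of Hermes):

```python
# Max output token limits for the two mid-tier models, per the comparison above.
MAX_OUTPUT = {"glm-4.6": 131_000, "glm-4.7": 64_000}

def pick_writer_model(expected_output_tokens: int) -> str:
    """Prefer GLM-4.7 (vision, same input price) unless the output won't fit in 64K."""
    if expected_output_tokens <= MAX_OUTPUT["glm-4.7"]:
        return "glm-4.7"
    if expected_output_tokens <= MAX_OUTPUT["glm-4.6"]:
        return "glm-4.6"
    raise ValueError("Split the job: output exceeds GLM-4.6's 131K limit")
```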
GLM-4.7 — Balanced Agent Default
At $0.39 per million input tokens, GLM-4.7 is the recommended starting point. It pairs the full 203K context window with native vision support, so a single model can carry long-term conversational memory and multi-modal tool interactions, and its tool-calling is reliable across the 47 built-in Hermes tools. It also handles multi-step workflows that trip up the Flash variant. The trade-off against GLM-4.6 is a 64K max output limit, so very long single-shot documents are better routed to 4.6.
GLM-5 — Advanced Reasoning Specialist
GLM-5 is the premium option at $0.72 per million input tokens. It handles intricate logic better than the 4.7 series, which is useful for autonomous agents managing complex scheduling or data synthesis. However, the context window drops significantly to 80K, making it less suitable for agents that rely on months of persistent chat history.
Setup in Hermes Agent
To integrate these with Hermes Agent, run the ‘hermes model’ command and select ‘Custom endpoint’. You must provide the Zhipu AI base URL and your specific model identifier. Ensure your API key is correctly configured in your environment variables to allow the /v1/chat/completions endpoint to authenticate successfully.
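Before wiring the endpoint into Hermes, it can help to verify auth and the model ID by hand. Below is a standard-library sketch that builds (but does not send) a /v1/chat/completions request; the base URL and model ID match the haimaker example in this guide, and the OPENAI_API_KEY variable name is an assumption — use whichever variable your Hermes setup reads:

```python
import json
import os
import urllib.request

# Build a /v1/chat/completions request by hand to sanity-check the config.
# Base URL and model match the haimaker example; swap in your provider's values.
BASE_URL = "https://api.haimaker.ai/v1"
MODEL = "z-ai/glm-4.7-flash"

def build_request(prompt: str) -> urllib.request.Request:
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # API key read from the environment (variable name is an assumption).
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        },
        method="POST",
    )

req = build_request("ping")
# urllib.request.urlopen(req) would send it; omitted here to avoid a live call.
```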
Running through haimaker.ai
Rather than standing up a per-provider account, you can point Hermes at haimaker.ai and get access to GLM alongside every other frontier model through one API key:
- Base URL: https://api.haimaker.ai/v1
- Model: z-ai/glm-4.7-flash
Direct provider setup
Hermes makes custom endpoints easy. Run:
hermes model
Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:
- Base URL: your Zhipu AI endpoint (check Zhipu's API documentation for the current OpenAI-compatible base URL)
- Model: the GLM identifier for the tier you chose
Hermes stores the selection and uses it for all subsequent agent runs. You can also set HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.
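If a slow provider keeps dropping mid-stream, raising the read timeout before launching Hermes is the usual fix. A sketch of the pattern — the default value and the seconds unit are assumptions; check Hermes's docs for the variable's exact semantics:

```python
import os

# Raise the stream read timeout for slow providers before starting the agent.
# "120" and the seconds unit are assumed defaults, not documented Hermes values.
os.environ.setdefault("HERMES_STREAM_READ_TIMEOUT", "120")

timeout = float(os.environ["HERMES_STREAM_READ_TIMEOUT"])
print(f"stream read timeout: {timeout:.0f}s")
```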
Bottom line
For a persistent Hermes Agent, GLM-4.7 provides the most utility per dollar, combining vision, a large context window, and reliable tool-calling for $0.39 per million input tokens.
See our Hermes local-LLM setup guide.