Current as of March 2026. GLM-5 is Zhipu’s top-tier model: $0.80/$2.56 per million tokens, 203K context, and 128K output. The jump from GLM-4.7’s 64K output ceiling to GLM-5’s 128K is the main reason to pay the higher price. If you need to generate very large artifacts from a large context, GLM-5 is where that combination becomes available in this family.
Specs
| Spec | Value |
| --- | --- |
| Provider | Zhipu AI |
| Input cost | $0.80 / M tokens |
| Output cost | $2.56 / M tokens |
| Context window | 203K tokens |
| Max output | 128K tokens |
| Parameters | N/A |
| Features | function_calling, reasoning |
| Features | function_calling, reasoning |
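To see what those rates mean in practice, here is a minimal cost estimator for a single request at the listed prices. The token counts are illustrative, not measured:

```python
# Rough cost estimate for one GLM-5 call at the listed Haimaker rates:
# $0.80 per million input tokens, $2.56 per million output tokens.
INPUT_RATE = 0.80 / 1_000_000   # USD per input token
OUTPUT_RATE = 2.56 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A maximal "big in, big out" job: ~200K tokens in, 128K tokens out.
print(round(estimate_cost(200_000, 128_000), 2))  # → 0.49
```

Even a maxed-out request stays under fifty cents, which is the pricing argument for this model in long-output workloads.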
What it’s good at
128K Output Capacity
This is the main differentiator from the rest of the GLM family: generating full codebases, comprehensive technical documentation, or large transformation tasks that need long, continuous output.
Reasoning
Compared to the Flash and 4.6/4.7 models, GLM-5's reasoning is more reliable on multi-step logic. It isn't at GPT-4o's level, but it stays on task better through complex tool chains.
203K Context + 128K Output Combination
Big in, big out. That’s a useful combination for transformation pipelines where you don’t want to chunk input or truncate output.
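A pipeline can sanity-check that a document fits in one pass before sending it. This sketch assumes a crude 4-characters-per-token heuristic (not GLM's actual tokenizer) and assumes output tokens count against the context window, which varies by provider:

```python
# Pre-flight check that a transformation job fits GLM-5's envelope in a
# single pass, so no chunking or output truncation is needed.
# Assumptions: ~4 chars/token (rough heuristic, not GLM's tokenizer),
# and output tokens counting against the context window.
CONTEXT_WINDOW = 202_752
MAX_OUTPUT = 128_000

def fits_single_pass(document: str, reserved_output: int = MAX_OUTPUT) -> bool:
    """True if the document plus reserved output fits the context window."""
    est_input_tokens = len(document) // 4  # crude chars-per-token estimate
    return est_input_tokens + reserved_output <= CONTEXT_WINDOW

print(fits_single_pass("x" * 200_000))  # ~50K input tokens + 128K output → True
```

For real workloads, replace the heuristic with the provider's tokenizer if one is available; the point is only that the check happens before the request, not after a truncated response.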
Where it falls short
Reasoning Latency
Built-in reasoning adds time before the first token. For background tasks this is fine; for anything interactive it’s noticeable.
Cultural Bias
On ambiguous prompts, it occasionally defaults to Chinese cultural contexts or idioms. Worth testing on your specific content types.
Best use cases with OpenClaw
- Long-form Content Generation — 128K out means you won’t hit a truncation wall mid-document.
- Complex Agentic Workflows — Reasoning and function calling together handle OpenClaw’s tool loops more reliably than the cheaper models in this family.
Not ideal for
- Real-time Chatbots — The reasoning overhead makes it too slow for responsive user interactions.
- Simple Classification — You’re paying for reasoning capability you don’t need. GLM-4.7 Flash does basic labeling for a fraction of the price.
Run it through Haimaker
Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:
Add Haimaker as a custom provider to my OpenClaw config. Use these details:
- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions
Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)
Create an alias "auto" for easy switching. Apply the config when done.
Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.
OpenClaw setup
Configure your provider base URL to https://api.haimaker.ai/v1 and use the model ID z-ai/glm-5. Ensure your timeout settings are high enough to accommodate the reasoning phase before output begins.
{
"models": {
"mode": "merge",
"providers": {
"z-ai": {
"baseUrl": "https://api.haimaker.ai/v1",
      "apiKey": "YOUR-HAIMAKER-API-KEY",
"api": "openai-completions",
"models": [
{
"id": "glm-5",
"name": "GLM-5",
"cost": {
          "input": 0.8,
"output": 2.56
},
"contextWindow": 202752,
"maxTokens": 128000
}
]
}
}
}
}
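For callers outside OpenClaw, the same endpoint can be hit directly over HTTP with a long timeout to absorb the reasoning phase. This is a stdlib sketch under the assumptions above (Haimaker base URL, model ID `z-ai/glm-5`); the 600-second timeout is an illustrative value, not a tested recommendation:

```python
# Sketch of calling GLM-5 through the OpenAI-compatible chat-completions
# endpoint with a generous timeout, since built-in reasoning delays the
# first token. URL, model ID, and timeout value are assumptions.
import json
import urllib.request

def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request for GLM-5."""
    payload = {
        "model": "z-ai/glm-5",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128_000,  # GLM-5's output ceiling
    }
    return urllib.request.Request(
        "https://api.haimaker.ai/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def call(api_key: str, prompt: str) -> dict:
    """Send the request; the long timeout covers the pre-output reasoning phase."""
    req = build_request(api_key, prompt)
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())
```

The split between building and sending the request also makes the payload easy to log or unit-test without touching the network.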
How it compares
- vs GPT-4o-mini — 4o-mini is cheaper for high-volume simple tasks but caps at 128K context and 16K output. GLM-5 wins on both.
- vs Claude 3 Haiku — Haiku is faster for short responses. GLM-5’s 128K output ceiling is in a different class for document generation tasks.
Bottom line
Use GLM-5 when you need the full combination of large context, large output, and reasoning — and you can’t justify the price of Western frontier models. For simpler tasks, GLM-4.7 Flash at $0.07/M is the better call.
For setup instructions, see our API key guide. For all available models, see the complete models guide.