Zhipu AI ships GLM models at prices that make the Western labs look expensive. As of April 2026, there are four GLM models you’d consider for OpenClaw — and the decision between them is mostly about how aggressive you want to be on cost.
Short version: GLM-4.7 for daily work, GLM-4.7 Flash when you’re processing volume, GLM-5 for reasoning-heavy tasks. GLM-4.6 is still available, but 4.7 is a straight upgrade at the same input price with slightly cheaper output.
## The quick answer
| Model | Input / output (per M tokens) | Context | Best for |
|---|---|---|---|
| GLM-4.7 | $0.39 / $1.75 | 200K | Default for daily coding |
| GLM-4.7 Flash | $0.06 / $0.40 | 200K | Volume work, classification |
| GLM-5 | $0.72 / $2.30 | 80K | Reasoning-heavy tasks |
| GLM-4.6 | $0.39 / $1.90 | 200K | Legacy — use 4.7 |
Start with GLM-4.7 unless you have a specific cost or capability reason to pick something else.
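The per-million-token prices in the table translate into per-call cost like this (a minimal sketch; the prices are the ones listed above, and the token counts would come from your own usage telemetry):

```python
# Per-million-token prices from the table above: (input, output) in USD.
GLM_PRICES = {
    "glm-4.7":       (0.39, 1.75),
    "glm-4.7-flash": (0.06, 0.40),
    "glm-5":         (0.72, 2.30),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single call at the listed prices."""
    in_price, out_price = GLM_PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A typical coding turn on GLM-4.7 (20K in, 2K out) costs about a penny:
# 20_000 * 0.39 + 2_000 * 1.75 = 11_300 micro-dollars, i.e. ~$0.0113.
```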
## GLM-4.7 — the default pick
$0.39/M input, $1.75/M output, 200K context, 64K output cap. 4.7 is the sweet spot in the GLM lineup: cheap enough to run all day, capable enough to handle real coding work.
Zhipu tuned 4.7 specifically for agentic use. Tool calling is more reliable than 4.6 — function arguments come back well-formed and the model doesn’t invent fields that don’t exist in the schema. For OpenClaw, that’s the difference between agents that finish tasks and agents that loop on malformed tool calls.
The 200K context is enough for most single-file or small-project work. The 64K output cap means you can regenerate a file in one pass without truncation.
Where it falls short: hard reasoning. If you’re debugging a race condition or designing a non-trivial algorithm, 4.7 will get stuck. That’s what GLM-5 is for. For everything else — refactoring, writing tests, generating boilerplate, triaging bugs — 4.7 is hard to beat at this price.
## GLM-4.7 Flash — the volume model
$0.06/M input, $0.40/M output, 200K context, 32K output cap. Flash is a different animal. At six cents per million input tokens, it’s within rounding distance of free.
Use Flash when the task is simple enough that you’re not really choosing a model, you’re just running one:
- File classification (is this a test file, config, or source?)
- Commit message generation
- Simple search-and-replace across a codebase
- First-pass triage before handing work to 4.7 or GLM-5
Flash isn’t smart. Don’t ask it to reason. But for mechanical work at volume, you can process millions of tokens for pocket change.
The 32K output cap is tighter than the rest of the GLM lineup — Flash can’t write long files in one pass. That’s an intentional trade-off; if you need long outputs, use 4.7.
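One way to encode this split is a small rule-based router in front of your agents. This is a sketch, not an OpenClaw feature; the task labels and the `needs_hard_reasoning` flag are illustrative stand-ins for whatever signals your own pipeline has:

```python
# Hypothetical task labels for mechanical volume work; map your own
# pipeline's task types onto whatever set makes sense for you.
MECHANICAL_TASKS = {"classify_file", "commit_message", "search_replace", "triage"}

def pick_glm_model(task: str, needs_hard_reasoning: bool = False) -> str:
    """Route mechanical volume work to Flash, hard reasoning to GLM-5,
    and everything else to the GLM-4.7 default."""
    if task in MECHANICAL_TASKS:
        return "glm-4.7-flash"
    if needs_hard_reasoning:
        return "glm-5"
    return "glm-4.7"
```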
## GLM-5 — the reasoning model
$0.72/M input, $2.30/M output, 80K context, 128K output cap. GLM-5 is Zhipu’s newest, and it’s positioned differently than the 4.x line.
Two things to know:
- The context window shrank. 80K vs 200K on 4.7. GLM-5 can’t hold as much code in working memory. For context-heavy work, stay on 4.7.
- The output cap expanded. 128K output vs 64K on 4.7. GLM-5 can generate very long outputs in one pass, which matters for verbose reasoning or large refactors.
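Because the two limits moved in opposite directions, it's worth checking both before routing a job to GLM-5. A sketch using the caps quoted above, treating the context window as an input limit and the output cap separately (which matches the numbers here, since GLM-5's 128K output cap exceeds its 80K context); token counts would come from your own tokenizer:

```python
# Context windows and output caps from the sections above.
LIMITS = {
    "glm-4.7": {"context": 200_000, "max_output": 64_000},
    "glm-5":   {"context": 80_000,  "max_output": 128_000},
}

def fits(model: str, prompt_tokens: int, expected_output_tokens: int) -> bool:
    """True if the job fits both the model's context window and output cap."""
    lim = LIMITS[model]
    return (prompt_tokens <= lim["context"]
            and expected_output_tokens <= lim["max_output"])
```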
GLM-5 is better than 4.7 on hard reasoning. Multi-step debugging where you need the model to trace hypotheses. Architecture decisions with competing trade-offs. Refactors that require understanding implicit invariants.
At $0.72/M input, GLM-5 is still roughly 4x cheaper than Claude Sonnet 4.6 for comparable reasoning quality on most tasks. Worth trying before you reach for a Western flagship.
## GLM-4.6 — skip it
Same $0.39/M input price as 4.7, slightly higher output cost ($1.90 vs $1.75), and slightly worse tool calling. There’s no scenario where 4.6 is the right choice over 4.7. If you have 4.6 in an existing config, swap to 4.7 and move on.
## Setup in OpenClaw
GLM isn’t a built-in provider. Two routes to wire it in.
### Running through haimaker.ai
All GLM models are available through haimaker.ai with a single API key. If you’re already using haimaker for other providers, you don’t need a separate Zhipu account:
```json
{
  "models": {
    "providers": {
      "haimaker": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "your-haimaker-api-key",
        "api": "openai-completions"
      }
    }
  }
}
```
Then add the models you want to the allowlist:
```json
{
  "agents": {
    "defaults": {
      "models": {
        "z-ai/glm-4.7": {},
        "z-ai/glm-4.7-flash": {},
        "z-ai/glm-5": {}
      }
    }
  }
}
```
Apply with `openclaw gateway config.apply` and switch models with `/model` during a session.
### Direct Zhipu setup
If you’d rather hit Zhipu’s API directly, sign up at bigmodel.cn and configure:
```json
{
  "models": {
    "providers": {
      "zhipu": {
        "baseUrl": "https://open.bigmodel.cn/api/paas/v4",
        "apiKey": "your-zhipu-api-key",
        "api": "openai-completions"
      }
    }
  }
}
```
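Both routes speak the OpenAI-style chat completions shape, so the request body is the same either way. A sketch of what gets POSTed to `{baseUrl}/chat/completions` with an `Authorization: Bearer <apiKey>` header; the model id and prompt are illustrative:

```python
import json

def chat_request_body(model: str, user_prompt: str, max_tokens: int = 4096) -> str:
    """Build an OpenAI-compatible chat completions request body as JSON."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(body)

# e.g. chat_request_body("glm-4.7", "Write a unit test for parse_config()")
# produces the JSON payload either endpoint accepts.
```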
## The reliability picture
Zhipu’s API is noticeably more reliable than DeepSeek’s. Fewer 503 errors, more consistent time-to-first-token. It’s still a Chinese provider serving globally, so expect some latency variance depending on your region — but it’s not the same reliability story as DeepSeek.
If you’re running production agents that can’t afford to fail silently, GLM through haimaker.ai gives you automatic failover to alternate providers when Zhipu has an issue. See our auto-router guide for setup.
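The failover behavior amounts to something like the following (a hand-rolled sketch to show the idea, not haimaker's actual implementation; the provider callables are stand-ins for real API clients):

```python
def call_with_failover(providers, request):
    """Try each provider in order; return the first successful response.

    `providers` is a list of (name, callable) pairs. Each callable takes
    the request and either returns a response or raises on failure.
    """
    errors = []
    for name, call in providers:
        try:
            return call(request)
        except Exception as exc:  # a real router would narrow this to API errors
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")
```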
## What I’d do
Run GLM-4.7 as your default GLM model. Add Flash to your allowlist for bulk classification and triage tasks. Keep GLM-5 configured for the narrow set of problems 4.7 struggles with — hard reasoning, multi-step debugging.
GLM is one of the few model families where the cheap tier is actually useful. Flash at $0.06/M input is cheap enough that you can run experimental agents all day without paying attention to token bills. That’s a different regime than what GPT or Claude let you do, and it changes what kinds of automation become practical.
Pair GLM with a Western flagship (Sonnet 4.6, GPT-5.4) as a fallback for when you need the absolute ceiling on quality. GLM handles 80% of OpenClaw work well; keep the other 20% on a model that’s been tuned for the hard edge cases.
For setup instructions, see our OpenClaw API key guide. For all available models, see the complete models guide.