Current as of March 2026. GLM-4.7 Flash from Zhipu AI is cheap: $0.07/M input, $0.40/M output. That’s less than half the price of GPT-4o-mini on input, with a 200K context window and vision built in. The tradeoffs are what you’d expect from a flash-tier model — lower reasoning quality and some time-to-first-token (TTFT) variance when calling from non-APAC regions.
Specs
| Spec | Value |
| --- | --- |
| Provider | Zhipu AI |
| Input cost | $0.07 / M tokens |
| Output cost | $0.40 / M tokens |
| Context window | 200K tokens |
| Max output | 32K tokens |
| Parameters | N/A |
| Features | function_calling, vision, reasoning |
What it’s good at
Price
$0.07/M input is genuinely low. For tasks where you’re running thousands of agent cycles, the savings compound quickly.
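To make the compounding concrete, here is a minimal cost sketch at the listed rates. The call volume and average request sizes are illustrative assumptions, not benchmarks.

```python
# Rough monthly cost for a high-volume agent workload at GLM-4.7 Flash's
# listed rates ($0.07/M input, $0.40/M output). Volumes are assumptions.

INPUT_RATE = 0.07 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.40 / 1_000_000  # dollars per output token

def monthly_cost(calls: int, in_tokens: int, out_tokens: int) -> float:
    """Total dollars for `calls` requests of the given average size."""
    return calls * (in_tokens * INPUT_RATE + out_tokens * OUTPUT_RATE)

# e.g. 100K agent cycles per month, ~3K input / ~500 output tokens each
cost = monthly_cost(100_000, 3_000, 500)  # about $41/month
```

At these assumed volumes the whole workload stays in the tens of dollars per month, which is where the flash tier earns its keep.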
200K Context at This Price
Competitors with 200K context charge considerably more. For document analysis or long conversation history, the value is real.
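A cheap pre-flight check helps here: estimate whether a document fits the 200K window before sending it. This sketch uses the rough chars/4 heuristic for English text — an assumption, not a real tokenizer.

```python
# Quick pre-flight check against GLM-4.7 Flash's 200K-token window.
# len(text) // 4 is a crude English-text token estimate, not a tokenizer.

CONTEXT_WINDOW = 200_000
MAX_OUTPUT = 32_000

def fits_in_context(text: str, reserved_output: int = MAX_OUTPUT) -> bool:
    """True if the document plus reserved output budget fits the window."""
    est_tokens = len(text) // 4
    return est_tokens + reserved_output <= CONTEXT_WINDOW
```

For anything close to the limit, swap the heuristic for a real tokenizer count before trusting the result.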
Multimodal
Vision and function calling in the same flash-tier model is useful. You can handle image inputs without routing to a separate, more expensive model.
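As a sketch of what that combination looks like in one request, here is an OpenAI-compatible chat-completions body mixing an image input with a tool definition. The image URL and tool name are illustrative assumptions; the model ID matches the config below.

```python
# One OpenAI-compatible request body combining a vision input with a
# function-calling tool. URL and tool name are hypothetical examples.

payload = {
    "model": "z-ai/glm-4.7-flash",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract the invoice total from this scan."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/invoice.png"}},
        ],
    }],
    "tools": [{
        "type": "function",
        "function": {
            "name": "record_total",
            "description": "Store the extracted invoice total.",
            "parameters": {
                "type": "object",
                "properties": {"total": {"type": "number"}},
                "required": ["total"],
            },
        },
    }],
}
```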
Where it falls short
Regional Latency
Zhipu’s servers are in Asia. Users in the US or Europe will see higher TTFT than they would from a US-hosted model.
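If TTFT matters for your deployment, measure it from your own region rather than trusting published numbers. This sketch times the first chunk of any streaming iterator (such as an OpenAI-compatible SSE stream); the function name is our own, not part of any SDK.

```python
import time

# Measure time-to-first-token over any streaming response iterator.
# Works on any iterable, so it can wrap an SSE stream or a test stub.

def time_to_first_token(stream) -> float:
    """Seconds until the stream yields its first chunk."""
    start = time.monotonic()
    for _chunk in stream:
        return time.monotonic() - start
    raise RuntimeError("stream produced no chunks")
```

Run it against the endpoint a few dozen times from your serving region and look at the tail, not the mean — cross-region variance shows up in the p95.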
English Language Nuance
It occasionally misses subtle English phrasing cues. For purely technical tasks this rarely matters; for anything requiring careful tone or interpretation, it’s noticeable.
Best use cases with OpenClaw
- High-volume document summarization — 200K context and $0.07/M input make this the cheapest way to read long texts at scale.
- Background agent tasks — Repetitive structured work like data extraction or classification. Reliable enough for the price.
Not ideal for
- Low-latency UI interactions — The Asia-hosted endpoint adds latency for non-APAC users.
- Complex creative writing — It follows patterns rigidly. Don’t expect stylistic flexibility.
OpenClaw setup
Point your OpenClaw provider at https://api.haimaker.ai/v1 and use the model ID z-ai/glm-4.7-flash. You’ll need a valid Haimaker API key for authentication.
```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "z-ai": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "YOUR-HAIMAKER-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "glm-4.7-flash",
            "name": "GLM-4.7 Flash",
            "cost": {
              "input": 0.07,
              "output": 0.4
            },
            "contextWindow": 200000,
            "maxTokens": 32000
          }
        ]
      }
    }
  }
}
```
Run it through Haimaker
Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:
Add Haimaker as a custom provider to my OpenClaw config. Use these details:
- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions
Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)
Create an alias "auto" for easy switching. Apply the config when done.
Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.
How it compares
- vs GPT-4o-mini — 4o-mini costs $0.15/$0.60 per million and has better English reasoning. GLM-4.7 Flash is $0.07/$0.40 with a larger context window. For pure cost efficiency, Flash wins.
- vs Gemini 1.5 Flash — Gemini has a 1M context window, which Flash can’t match. For tasks under 200K tokens, GLM is often cheaper per token.
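To put the GPT-4o-mini comparison in numbers, here is a per-call cost sketch at the listed rates. The workload shape (2K input / 300 output tokens per call) is an assumption chosen for illustration.

```python
# Per-call cost comparison at the listed rates:
# GLM-4.7 Flash $0.07/$0.40 vs GPT-4o-mini $0.15/$0.60 per million.
# The 2K-in / 300-out workload shape is an assumption.

def cost_per_call(in_rate: float, out_rate: float,
                  in_tok: int = 2_000, out_tok: int = 300) -> float:
    """Dollars per request for the given per-million-token rates."""
    return (in_tok * in_rate + out_tok * out_rate) / 1_000_000

glm = cost_per_call(0.07, 0.40)   # GLM-4.7 Flash
mini = cost_per_call(0.15, 0.60)  # GPT-4o-mini
savings = 1 - glm / mini          # roughly 46% cheaper on this shape
```

The exact savings shift with the input/output ratio, but at these rates Flash stays cheaper across any realistic mix.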
Bottom line
The cheapest large-context option on the market right now. Use it for high-volume background tasks where reasoning depth isn’t critical and APAC latency isn’t a problem.
For setup instructions, see our API key guide. For all available models, see the complete models guide.