Current as of March 2026. GLM-4.7 Flash from Zhipu AI is cheap: $0.07/M input, $0.40/M output. That’s less than half GPT-4o-mini’s input price, with a 200K context window and built-in vision. The tradeoffs are what you’d expect from a flash-tier model: lower reasoning quality and some TTFT variance from non-APAC regions.

Specs

Provider: Zhipu AI
Input cost: $0.07 / M tokens
Output cost: $0.40 / M tokens
Context window: 200K tokens
Max output: 32K tokens
Parameters: N/A
Features: function_calling, vision, reasoning

What it’s good at

Price

$0.07/M input is genuinely low. For tasks where you’re running thousands of agent cycles, the savings compound quickly.
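To see how the savings compound, here is a back-of-the-envelope calculation at the list prices above; the workload numbers are made up for illustration.

```python
# Back-of-the-envelope cost at GLM-4.7 Flash list prices.
INPUT_PER_M = 0.07   # USD per million input tokens
OUTPUT_PER_M = 0.40  # USD per million output tokens

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Total request cost in USD for the given token counts."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Hypothetical workload: 10,000 agent cycles averaging 8K input / 1K output tokens.
print(f"${cost_usd(10_000 * 8_000, 10_000 * 1_000):.2f}")  # $9.60
```

Ninety million tokens for under ten dollars is the kind of margin that makes always-on background agents practical.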

200K Context at This Price

Competitors with 200K context charge considerably more. For document analysis or long conversation history, the value is real.

Multimodal

Vision and function calling in the same flash-tier model is useful. You can handle image inputs without routing to a separate, more expensive model.
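A sketch of what that looks like in practice: one OpenAI-compatible chat-completions request combining an image input with a tool definition. The image URL and the `lookup_price` tool are hypothetical, and nothing is sent over the network here.

```python
# Sketch: one OpenAI-compatible chat request mixing an image input with a tool
# definition. The image URL and the lookup_price tool are hypothetical.
payload = {
    "model": "glm-4.7-flash",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "What product is on the shelf? Then look up its price."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/shelf.jpg"}},
        ],
    }],
    "tools": [{
        "type": "function",
        "function": {
            "name": "lookup_price",
            "description": "Fetch the current price for a product name.",
            "parameters": {
                "type": "object",
                "properties": {"product": {"type": "string"}},
                "required": ["product"],
            },
        },
    }],
}
```

With a vision-only flash model you would need a second, pricier model to act on what the image contains; here one request covers both.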

Where it falls short

Regional Latency

Zhipu’s servers are in Asia. Users in the US or Europe will see higher TTFT than they would from a US-hosted model.
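If you want to quantify the gap before committing, TTFT is easy to measure yourself. A small helper that works with whatever streaming iterator your client library returns:

```python
import time

def time_to_first_token(stream) -> float:
    """Seconds from call to first streamed chunk; inf if the stream is empty."""
    start = time.monotonic()
    for _chunk in stream:
        return time.monotonic() - start
    return float("inf")
```

Pass it the iterator from a streaming chat-completions call, and run the same prompt against a US-hosted model for a like-for-like comparison.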

English Language Nuance

It occasionally misses subtle English phrasing cues. For purely technical tasks this rarely matters; for anything requiring careful tone or interpretation, it’s noticeable.

Best use cases with OpenClaw

  • High-volume document summarization — 200K context and $0.07/M input make this the cheapest way to read long texts at scale.
  • Background agent tasks — Repetitive structured work like data extraction or classification. Reliable enough for the price.
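For the summarization case, most of the plumbing is chunking. A sketch that splits a long text to fit the 200K window, using a rough ~4-characters-per-token heuristic rather than a real tokenizer:

```python
# Split a long document into chunks that fit GLM-4.7 Flash's 200K-token window,
# reserving room for the prompt and the 32K output budget. The 4-chars-per-token
# ratio is a rough heuristic, not a tokenizer; tune it for your corpus.
CONTEXT_TOKENS = 200_000
MAX_OUTPUT_TOKENS = 32_000
PROMPT_TOKENS = 500            # assumed overhead for instructions
CHARS_PER_TOKEN = 4

def chunk_document(text: str) -> list[str]:
    budget = CONTEXT_TOKENS - MAX_OUTPUT_TOKENS - PROMPT_TOKENS
    size = budget * CHARS_PER_TOKEN          # ~670K characters per chunk
    return [text[i:i + size] for i in range(0, len(text), size)]
```

Each chunk then gets its own summarization request; at this context size most documents fit in a single chunk anyway.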

Not ideal for

  • Low-latency UI interactions — The Asia-hosted endpoint adds latency for non-APAC users.
  • Complex creative writing — It follows patterns rigidly. Don’t expect stylistic flexibility.

OpenClaw setup

Point your OpenClaw provider to api.haimaker.ai/v1 and use the model ID z-ai/glm-4.7-flash. You will need a valid Haimaker API key for authentication.

{
  "models": {
    "mode": "merge",
    "providers": {
      "z-ai": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "YOUR-HAIMAKER-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "glm-4.7-flash",
            "name": "GLM-4.7 Flash",
            "cost": {
              "input": 0.07,
              "output": 0.4
            },
            "contextWindow": 200000,
            "maxTokens": 32000
          }
        ]
      }
    }
  }
}

Run it through Haimaker

Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:

Add Haimaker as a custom provider to my OpenClaw config. Use these details:

- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions

Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)

Create an alias "auto" for easy switching. Apply the config when done.

Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.

How it compares

  • vs GPT-4o-mini — 4o-mini costs $0.15/$0.60 per million and has better English reasoning. GLM-4.7 Flash is $0.07/$0.40 with a larger context window. For pure cost efficiency, Flash wins.
  • vs Gemini 1.5 Flash — Gemini has a 1M context window, which Flash can’t match. For tasks under 200K tokens, GLM is often cheaper per token.
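At the per-million prices quoted above, the per-request gap is easy to put in dollars; the token counts below are illustrative.

```python
# Per-request cost comparison at the prices quoted above (USD per million
# input/output tokens). Token counts are illustrative.
PRICES = {
    "glm-4.7-flash": (0.07, 0.40),
    "gpt-4o-mini": (0.15, 0.60),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    input_price, output_price = PRICES[model]
    return input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price

# Summarizing a 100K-token document into a 2K-token summary:
for model in PRICES:
    print(f"{model}: ${request_cost(model, 100_000, 2_000):.4f}")
```

On a long-input, short-output workload like this, Flash comes in at roughly half the per-request cost of 4o-mini.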

Bottom line

The cheapest large-context option on the market right now. Use it for high-volume background tasks where reasoning depth isn’t critical and APAC latency isn’t a problem.



For setup instructions, see our API key guide. For all available models, see the complete models guide.