Current as of March 2026. GLM-5 is Zhipu’s top-tier model: $0.80/$2.56 per million tokens, 203K context, and 128K output. The jump from GLM-4.7’s 64K output ceiling to GLM-5’s 128K is the main reason to pay the higher price. If you need to generate very large artifacts from a large context, GLM-5 is where that combination becomes available in this family.
Specs
| Spec | Value |
| --- | --- |
| Provider | Zhipu AI |
| Input cost | $0.80 / M tokens |
| Output cost | $2.56 / M tokens |
| Context window | 203K tokens |
| Max output | 128K tokens |
| Parameters | N/A |
| Features | function_calling, reasoning |
| Features | function_calling, reasoning |
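To see what those rates mean in practice, here is a minimal cost estimator for a single request at the listed prices. The token counts are illustrative, not measured:

```python
# Rough cost estimate for one GLM-5 call at the listed Haimaker rates:
# $0.80 per million input tokens, $2.56 per million output tokens.
INPUT_RATE = 0.80 / 1_000_000   # USD per input token
OUTPUT_RATE = 2.56 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A maximal "big in, big out" job: ~200K tokens in, 128K tokens out.
print(round(estimate_cost(200_000, 128_000), 2))  # → 0.49
```

Even a maxed-out request stays under fifty cents, which is the pricing argument for this model in long-output workloads.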
What it’s good at
128K Output Capacity
This is the main differentiator from the rest of the GLM family: generating full codebases, comprehensive technical documentation, or large transformation tasks that need long, continuous output.
Reasoning
Compared to the Flash and 4.6/4.7 models, GLM-5's reasoning is more reliable on multi-step logic. It isn't at GPT-4o's level, but it stays on task better through complex tool chains.
203K Context + 128K Output Combination
Big in, big out. That’s a useful combination for transformation pipelines where you don’t want to chunk input or truncate output.
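A pipeline can sanity-check that a document fits in one pass before sending it. This sketch assumes a crude 4-characters-per-token heuristic (not GLM's actual tokenizer) and assumes output tokens count against the context window, which varies by provider:

```python
# Pre-flight check that a transformation job fits GLM-5's envelope in a
# single pass, so no chunking or output truncation is needed.
# Assumptions: ~4 chars/token (rough heuristic, not GLM's tokenizer),
# and output tokens counting against the context window.
CONTEXT_WINDOW = 202_752
MAX_OUTPUT = 128_000

def fits_single_pass(document: str, reserved_output: int = MAX_OUTPUT) -> bool:
    """True if the document plus reserved output fits the context window."""
    est_input_tokens = len(document) // 4  # crude chars-per-token estimate
    return est_input_tokens + reserved_output <= CONTEXT_WINDOW

print(fits_single_pass("x" * 200_000))  # ~50K input tokens + 128K output → True
```

For real workloads, replace the heuristic with the provider's tokenizer if one is available; the point is only that the check happens before the request, not after a truncated response.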
Where it falls short
Reasoning Latency
Built-in reasoning adds time before the first token. For background tasks this is fine; for anything interactive it’s noticeable.
Cultural Bias
On ambiguous prompts, it occasionally defaults to Chinese cultural contexts or idioms. Worth testing on your specific content types.
Best use cases with OpenClaw
- Long-form Content Generation — 128K out means you won’t hit a truncation wall mid-document.
- Complex Agentic Workflows — Reasoning and function calling together handle OpenClaw’s tool loops more reliably than the cheaper models in this family.
Not ideal for
- Real-time Chatbots — The reasoning overhead makes it too slow for responsive user interactions.
- Simple Classification — You’re paying for reasoning capability you don’t need. GLM-4.7 Flash does basic labeling for a fraction of the price.
Run it through Haimaker
Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:
Add Haimaker as a custom provider to my OpenClaw config. Use these details:
- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions
Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)
Create an alias "auto" for easy switching. Apply the config when done.
Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.
OpenClaw setup
Configure your provider base URL to https://api.haimaker.ai/v1 and use the model ID z-ai/glm-5. Ensure your timeout settings are high enough to accommodate the reasoning phase before output begins.
{
"models": {
"mode": "merge",
"providers": {
"z-ai": {
"baseUrl": "https://api.haimaker.ai/v1",
      "apiKey": "YOUR-HAIMAKER-API-KEY",
"api": "openai-completions",
"models": [
{
"id": "glm-5",
"name": "GLM-5",
"cost": {
          "input": 0.8,
"output": 2.56
},
"contextWindow": 202752,
"maxTokens": 128000
}
]
}
}
}
}
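For callers outside OpenClaw, the same endpoint can be hit directly over HTTP with a long timeout to absorb the reasoning phase. This is a stdlib sketch under the assumptions above (Haimaker base URL, model ID `z-ai/glm-5`); the 600-second timeout is an illustrative value, not a tested recommendation:

```python
# Sketch of calling GLM-5 through the OpenAI-compatible chat-completions
# endpoint with a generous timeout, since built-in reasoning delays the
# first token. URL, model ID, and timeout value are assumptions.
import json
import urllib.request

def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request for GLM-5."""
    payload = {
        "model": "z-ai/glm-5",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128_000,  # GLM-5's output ceiling
    }
    return urllib.request.Request(
        "https://api.haimaker.ai/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def call(api_key: str, prompt: str) -> dict:
    """Send the request; the long timeout covers the pre-output reasoning phase."""
    req = build_request(api_key, prompt)
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())
```

The split between building and sending the request also makes the payload easy to log or unit-test without touching the network.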
How it compares
- vs GPT-4o-mini — 4o-mini is cheaper for high-volume simple tasks but caps at 128K context and 16K output. GLM-5 wins on both.
- vs Claude 3 Haiku — Haiku is faster for short responses. GLM-5’s 128K output ceiling is in a different class for document generation tasks.
Bottom line
Use GLM-5 when you need the full combination of large context, large output, and reasoning — and you can’t justify the price of Western frontier models. For simpler tasks, GLM-4.7 Flash at $0.07/M is the better call.
For setup instructions, see our API key guide. For all available models, see the complete models guide.