Current as of March 2026. GLM-4.6 from Zhipu AI has an unusual property: a 131K token output limit at $1.75/M. Most models in this price range cap output at 4K or 8K. If your use case involves generating large artifacts — long code files, comprehensive documentation, detailed reports — that output ceiling is worth paying attention to.

Specs

  • Provider: Zhipu AI
  • Input cost: $0.40 / M tokens
  • Output cost: $1.75 / M tokens
  • Context window: 203K tokens
  • Max output: 131K tokens
  • Parameters: N/A
  • Features: function_calling, reasoning

What it’s good at

131K Output Ceiling

This is the main differentiator. At this price point, 131K out is unusual. You can generate entire chapters, large code modules, or extensive documentation in a single response.
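To make the difference concrete, here is a quick back-of-envelope sketch. The 8K figure is the typical cap mentioned above for similarly priced models; the artifact size is hypothetical:

```python
import math

artifact_tokens = 131_000   # size of the artifact you want in one response
typical_cap = 8_000         # common output cap at this price range

# Under an 8K cap you must split generation into chunks and stitch them:
chunks_needed = math.ceil(artifact_tokens / typical_cap)
print(chunks_needed)  # 17 separate calls, vs. a single call under a 131K limit
```

Every stitch point is a chance for the model to lose continuity, so fewer calls isn't just cheaper on orchestration — it tends to produce more coherent long artifacts.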

203K Input Context

Enough to handle large codebases or multiple PDFs without splitting the input.

Pricing vs Western Models

$0.40/M input for a model with this output capacity is significantly cheaper than comparable Western alternatives.
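A worked example at the listed rates (the token counts here are hypothetical, chosen to represent a large-artifact job):

```python
# Hypothetical job: 150K tokens in, 100K tokens out, at GLM-4.6's rates.
INPUT_RATE = 0.40 / 1_000_000    # $ per input token
OUTPUT_RATE = 1.75 / 1_000_000   # $ per output token

cost = 150_000 * INPUT_RATE + 100_000 * OUTPUT_RATE
print(f"${cost:.3f}")  # $0.235 for the whole call
```

Under a quarter per call for a near-maximal generation is what makes high-frequency pipelines viable at this tier.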

Where it falls short

API Latency

Response times spike during peak APAC hours. If your workloads run during those windows, expect inconsistency.

Complex Logic

It struggles with deeply nested multi-step logical deductions. Claude 3.5 Sonnet is noticeably more reliable on hard reasoning tasks.

Best use cases with OpenClaw

  • Long-form Technical Writing — The 131K output limit lets you generate entire chapters or extensive documentation without hitting a ceiling mid-response.
  • Large-Scale RAG — 203K context handles large document injections at a price that makes sense for high-frequency pipelines.

Not ideal for

  • Real-time Chatbots — Latency is too variable for user-facing applications.
  • Strict Logic Tasks — On ambiguous or complex constraints, it hallucinates more than GPT-4o. Don’t use it for high-stakes reasoning.

Run it through Haimaker

Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:

Add Haimaker as a custom provider to my OpenClaw config. Use these details:

- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions

Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)

Create an alias "auto" for easy switching. Apply the config when done.

Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.

OpenClaw setup

Set your base URL to https://api.haimaker.ai/v1 and use the model string z-ai/glm-4.6 in your provider configuration.

{
  "models": {
    "mode": "merge",
    "providers": {
      "z-ai": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "YOUR-HAIMAKER-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "glm-4.6",
            "name": "GLM-4.6",
            "cost": {
              "input": 0.4,
              "output": 1.75
            },
            "contextWindow": 202800,
            "maxTokens": 131000
          }
        ]
      }
    }
  }
}
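If you want to hit the endpoint directly rather than through OpenClaw, the request is a standard OpenAI-style chat completion. A minimal sketch, assuming the base URL above; it only builds the payload — actually sending it requires your real Haimaker API key:

```python
import json

# OpenAI-compatible chat-completions payload for GLM-4.6 via Haimaker.
payload = {
    "model": "z-ai/glm-4.6",
    "max_tokens": 131_000,  # request the full output ceiling
    "messages": [
        {"role": "user",
         "content": "Generate complete reference documentation for this module: ..."},
    ],
}
body = json.dumps(payload)
# POST this body to https://api.haimaker.ai/v1/chat/completions with the
# header "Authorization: Bearer <YOUR-HAIMAKER-API-KEY>".
```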

How it compares

  • vs GPT-4o-mini — 4o-mini is cheaper on input ($0.15/M) but its 16K output cap is nowhere near GLM-4.6’s 131K.
  • vs Claude 3 Haiku — Haiku is faster for short-burst tasks. GLM-4.6 wins on context window size and output capacity for larger jobs.

Bottom line

The main reason to choose GLM-4.6 is the 131K output limit at this price. If that ceiling doesn’t matter for your use case, there are cheaper options with better reasoning.

TRY GLM-4.6 ON HAIMAKER


For setup instructions, see our API key guide. For all available models, see the complete models guide.