Current as of March 2026. GLM-4.6 Exacto is Zhipu’s fine-tuned variant of GLM-4.6, adding tighter instruction following for $0.05/M more on input and $0.15/M more on output. Same 203K context, same 131K output ceiling. The “Exacto” designation refers to improved precision on complex system prompts that the base model tends to drift from.

Specs

Provider: Zhipu AI
Input cost: $0.45 / M tokens
Output cost: $1.90 / M tokens
Context window: 203K tokens
Max output: 131K tokens
Parameters: N/A
Features: function_calling, reasoning

What it’s good at

131K Output with Better Instruction Adherence

Same large output ceiling as base GLM-4.6, but it adheres to complex system prompts more reliably all the way through the response. For agents with elaborate behavioral constraints, the difference is noticeable.

203K Context at a Reasonable Price

At $0.45/M input, it substantially undercuts Claude 3.5 Sonnet ($3/M input) for a comparably sized context window.
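To make the price gap concrete, here is a rough per-request cost comparison. The prices come from the spec table above and the Claude figure quoted in this section; the $15/M Claude output price and the 200K-in / 4K-out request shape are illustrative assumptions, not numbers from this page.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Return the dollar cost of a single request, given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_per_m \
         + (output_tokens / 1_000_000) * output_per_m

# A hypothetical long-context RAG call: 200K tokens in, 4K tokens out.
glm = request_cost(200_000, 4_000, 0.45, 1.90)     # GLM-4.6 Exacto pricing
claude = request_cost(200_000, 4_000, 3.00, 15.00) # Claude 3.5 Sonnet list pricing (assumed)

print(f"GLM-4.6 Exacto:    ${glm:.4f}")    # ~$0.098 per request
print(f"Claude 3.5 Sonnet: ${claude:.4f}") # ~$0.66 per request
```

At this request shape the input cost dominates, so the roughly 6–7x input price gap carries through to the total.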

Where it falls short

Still Not a Logic Model

The reasoning is improved over base GLM-4.6 but doesn’t approach GPT-4o or o1 on genuinely hard problems. It can lose the thread on complex multi-step deductions.

Routing Latency

Accessing via Haimaker adds some overhead compared to native Tier-1 provider endpoints. Time to first token (TTFT) can run higher than you'd expect.

Best use cases with OpenClaw

  • Long-form Code Generation with Specific Constraints — When you have detailed style guides or architectural rules in your system prompt that the base model tends to ignore.
  • Budget-Conscious RAG — 203K context at $0.45/M for tasks where Claude’s pricing is prohibitive but you need better instruction following than base GLM-4.6.

Not ideal for

  • Low-Latency Chatbots — GLM-4.7 Flash is faster and cheaper for real-time interactions.
  • High-Stakes Financial Logic — The reasoning isn’t rigorous enough for critical mathematical transformations. Use o1 or similar for that.

Run it through Haimaker

Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:

Add Haimaker as a custom provider to my OpenClaw config. Use these details:

- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions

Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)

Create an alias "auto" for easy switching. Apply the config when done.

Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.

OpenClaw setup

Set your base URL to api.haimaker.ai/v1 and use the identifier z-ai/glm-4.6:exacto. Increase your client-side timeouts to accommodate potentially very large 131K-token responses.

{
  "models": {
    "mode": "merge",
    "providers": {
      "z-ai": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "YOUR-HAIMAKER-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "glm-4.6:exacto",
            "name": "GLM-4.6 Exacto",
            "cost": {
              "input": 0.45,
              "output": 1.9
            },
            "contextWindow": 202800,
            "maxTokens": 131000
          }
        ]
      }
    }
  }
}
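The timeout advice matters because a response approaching the 131K-token ceiling can take many minutes to stream. One rough way to size a client timeout is to budget for TTFT plus decode time with a safety margin. This is a sketch: the 40 tokens/second decode rate and 30-second TTFT budget are assumptions to tune against your own latency measurements, not Haimaker figures.

```python
import math

def recommended_timeout_s(max_tokens: int,
                          tokens_per_second: float = 40.0,
                          ttft_budget_s: float = 30.0,
                          safety_factor: float = 1.5) -> int:
    """Estimate a client-side timeout (seconds) for a streaming completion.

    tokens_per_second is an assumed decode rate, not a measured figure;
    replace it with data from your own requests.
    """
    generation_s = max_tokens / tokens_per_second
    return math.ceil((ttft_budget_s + generation_s) * safety_factor)

# Worst case: the full 131K-token output ceiling.
print(recommended_timeout_s(131_000))  # well over an hour of wall-clock budget
```

If your client library takes a single request timeout, set it from the model's maxTokens rather than a fixed default; a 60-second default will kill any long generation mid-stream.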

How it compares

  • vs GLM-4.6 — Same context and output limits, $0.05/$0.15 more per million. Worth it if you’re seeing instruction drift in the base model.
  • vs GLM-4.7 Flash — Flash is faster and cheaper. Exacto is the choice when you need to generate large outputs and the model needs to stay tightly on-spec throughout.

Bottom line

A small premium over base GLM-4.6 for better instruction adherence on long outputs. If instruction drift isn’t a problem for you, save the $0.05/M and use the base model.

TRY GLM-4.6 EXACTO ON HAIMAKER


For setup instructions, see our API key guide. For all available models, see the complete models guide.