Current as of March 2026. GLM-4.6 Exacto is Zhipu’s fine-tuned variant of GLM-4.6, adding tighter instruction following for $0.05/M more on input and $0.15/M more on output. Same 203K context, same 131K output ceiling. The “Exacto” designation refers to improved precision on complex system prompts that the base model tends to drift from.

Specs

Provider: Zhipu AI
Input cost: $0.45 / M tokens
Output cost: $1.90 / M tokens
Context window: 203K tokens
Max output: 131K tokens
Parameters: N/A
Features: function_calling, reasoning

What it’s good at

131K Output with Better Instruction Adherence

Same large output ceiling as base GLM-4.6, but it adheres to complex system prompts more reliably all the way through the response. For agents with elaborate behavioral constraints, the difference is noticeable.

203K Context at a Reasonable Price

At $0.45/M input, it substantially undercuts Claude 3.5 Sonnet ($3/M input) for a comparably sized context window.
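To make the price gap concrete, here is a rough per-request cost comparison. The prices come from the spec table above and the Claude figure quoted in this section; the $15/M Claude output price and the 200K-in / 4K-out request shape are illustrative assumptions, not numbers from this page.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Return the dollar cost of a single request, given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_per_m \
         + (output_tokens / 1_000_000) * output_per_m

# A hypothetical long-context RAG call: 200K tokens in, 4K tokens out.
glm = request_cost(200_000, 4_000, 0.45, 1.90)     # GLM-4.6 Exacto pricing
claude = request_cost(200_000, 4_000, 3.00, 15.00) # Claude 3.5 Sonnet list pricing (assumed)

print(f"GLM-4.6 Exacto:    ${glm:.4f}")    # ~$0.098 per request
print(f"Claude 3.5 Sonnet: ${claude:.4f}") # ~$0.66 per request
```

At this request shape the input cost dominates, so the roughly 6–7x input price gap carries through to the total.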

Where it falls short

Still Not a Logic Model

The reasoning is improved over base GLM-4.6 but doesn’t approach GPT-4o or o1 on genuinely hard problems. It can lose the thread on complex multi-step deductions.

Routing Latency

Accessing via Haimaker adds some overhead compared to native Tier-1 provider endpoints. Time to first token (TTFT) can run higher than you'd expect.

Best use cases with OpenClaw

  • Long-form Code Generation with Specific Constraints — When you have detailed style guides or architectural rules in your system prompt that the base model tends to ignore.
  • Budget-Conscious RAG — 203K context at $0.45/M for tasks where Claude’s pricing is prohibitive but you need better instruction following than base GLM-4.6.

Not ideal for

  • Low-Latency Chatbots — GLM-4.7 Flash is faster and cheaper for real-time interactions.
  • High-Stakes Financial Logic — The reasoning isn’t rigorous enough for critical mathematical transformations. Use o1 or similar for that.

Run it through Haimaker

Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:

Add Haimaker as a custom provider to my OpenClaw config. Use these details:

- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions

Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)

Create an alias "auto" for easy switching. Apply the config when done.

Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.

OpenClaw setup

Set your base URL to api.haimaker.ai/v1 and use the identifier z-ai/glm-4.6:exacto. Increase your client-side timeouts to accommodate potentially very large 131K-token responses.

{
  "models": {
    "mode": "merge",
    "providers": {
      "z-ai": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "YOUR-HAIMAKER-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "glm-4.6:exacto",
            "name": "GLM-4.6 Exacto",
            "cost": {
              "input": 0.45,
              "output": 1.9
            },
            "contextWindow": 202800,
            "maxTokens": 131000
          }
        ]
      }
    }
  }
}
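The timeout advice matters because a response approaching the 131K-token ceiling can take many minutes to stream. One rough way to size a client timeout is to budget for TTFT plus decode time with a safety margin. This is a sketch: the 40 tokens/second decode rate and 30-second TTFT budget are assumptions to tune against your own latency measurements, not Haimaker figures.

```python
import math

def recommended_timeout_s(max_tokens: int,
                          tokens_per_second: float = 40.0,
                          ttft_budget_s: float = 30.0,
                          safety_factor: float = 1.5) -> int:
    """Estimate a client-side timeout (seconds) for a streaming completion.

    tokens_per_second is an assumed decode rate, not a measured figure;
    replace it with data from your own requests.
    """
    generation_s = max_tokens / tokens_per_second
    return math.ceil((ttft_budget_s + generation_s) * safety_factor)

# Worst case: the full 131K-token output ceiling.
print(recommended_timeout_s(131_000))  # well over an hour of wall-clock budget
```

If your client library takes a single request timeout, set it from the model's maxTokens rather than a fixed default; a 60-second default will kill any long generation mid-stream.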

How it compares

  • vs GLM-4.6 — Same context and output limits, $0.05/$0.15 more per million. Worth it if you’re seeing instruction drift in the base model.
  • vs GLM-4.7 Flash — Flash is faster and cheaper. Exacto is the choice when you need to generate large outputs and the model needs to stay tightly on-spec throughout.

Bottom line

A small premium over base GLM-4.6 for better instruction adherence on long outputs. If instruction drift isn’t a problem for you, save the $0.05/M and use the base model.

TRY GLM-4.6 EXACTO ON HAIMAKER


For setup instructions, see our API key guide. For all available models, see the complete models guide.