Current as of March 2026. Qwen3.5 397B A17B is a heavyweight Mixture-of-Experts model that bridges the gap between open-weight accessibility and frontier-level reasoning. At $0.39 per million input tokens and with a 262K context window, it is a viable alternative to GPT-4o for complex agentic workflows.
Specs
| Spec | Value |
| --- | --- |
| Provider | Qwen (Alibaba) |
| Input cost | $0.39 / M tokens |
| Output cost | $2.34 / M tokens |
| Context window | 262K tokens |
| Max output | 66K tokens |
| Parameters | 397B total, 17B active (MoE) |
| Features | function_calling, vision, reasoning |
What it’s good at
Superior CJK Performance
It outperforms almost every other model in its class when handling Chinese, Japanese, and Korean technical documentation.
Massive Output Buffer
The 66K max output token limit is rare, allowing for the generation of entire code modules or long-form reports in a single pass.
Deep Reasoning Architecture
The reasoning features are robust enough to handle multi-step logic and complex function calling without losing the instruction chain.
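Multi-step function calling runs over the standard OpenAI-style tools schema; the sketch below shows the shape of a tool definition the model consumes (the `get_weather` tool is purely hypothetical):

```python
# Standard OpenAI-style tool schema; the get_weather tool itself is
# hypothetical and only illustrates the structure the model consumes.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
print(tools[0]["function"]["name"])  # get_weather
```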
Where it falls short
Inference Latency
With 397B total parameters to serve, time to first token (TTFT) can be sluggish compared to dense 70B models, especially when the reasoning phase is enabled.
High Output Cost Multiplier
The $2.34 per million output price is six times the input cost, which penalizes verbose agents.
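To see how the multiplier bites in practice, here is a rough per-request cost estimate at the listed rates (the token counts are illustrative, not measurements):

```python
# Published per-million-token rates for Qwen3.5 397B A17B.
INPUT_RATE = 0.39 / 1_000_000   # dollars per input token
OUTPUT_RATE = 2.34 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A verbose agent turn: 20K tokens in, 8K tokens out.
print(f"${request_cost(20_000, 8_000):.4f}")  # $0.0265
```

Note that the 8K output tokens cost more than the 20K input tokens, which is exactly why chatty agents get expensive on this model.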
Best use cases with OpenClaw
- Large-Scale Code Refactoring — The 262K context window allows you to dump an entire repository’s worth of context into the prompt for holistic analysis.
- Multilingual Technical Support Agents — It handles nuanced translation and technical jargon in CJK languages better than Llama 3.1 405B.
Not ideal for
- Real-time Chatbots — The model’s size and reasoning overhead make it too slow for snappy, sub-second user interactions.
- Simple Data Extraction — Using a 397B parameter model for basic JSON extraction is a waste of money when Qwen 2.5 7B does it for a fraction of the cost.
OpenClaw setup
Configure your OpenClaw provider to use the Haimaker endpoint at api.haimaker.ai/v1 and set the model ID to qwen/qwen3.5-397b-a17b. Increase your client-side timeout to at least 60 seconds to accommodate the model’s reasoning phase.
```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "qwen": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "YOUR-HAIMAKER-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen3.5-397b-a17b",
            "name": "Qwen3.5 397B A17B",
            "cost": {
              "input": 0.39,
              "output": 2.34
            },
            "contextWindow": 262144,
            "maxTokens": 65536
          }
        ]
      }
    }
  }
}
```
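Outside of OpenClaw, the same endpoint can be exercised directly since it is OpenAI-compatible. A minimal stdlib sketch of the request shape (prompt and key are placeholders; the final send is commented out):

```python
import json
import urllib.request

BASE_URL = "https://api.haimaker.ai/v1"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completions request for Qwen3.5."""
    payload = {
        "model": "qwen3.5-397b-a17b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 4096,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Summarize this repository.", "YOUR-API-KEY")
print(req.full_url)  # https://api.haimaker.ai/v1/chat/completions
# urllib.request.urlopen(req, timeout=60) would send it; keep the
# timeout at 60 seconds or more to cover the reasoning phase.
```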
How it compares
- vs Llama 3.1 405B — Llama is more tuned for creative English prose, but Qwen wins on CJK support and offers a larger 66K output limit versus Llama’s 8K.
- vs DeepSeek-V3 — DeepSeek is often cheaper for raw tokens, but Qwen’s vision integration and 262K context window provide more versatility for complex agents.
Bottom line
This is the best high-capacity model for developers who need deep CJK support and a massive context window without paying the premium for closed-source frontier models.
TRY QWEN3.5 397B A17B ON HAIMAKER
For setup instructions, see our API key guide. For all available models, see the complete models guide.