Current as of March 2026. Qwen3.5 397B A17B is a heavyweight Mixture-of-Experts model that bridges the gap between open-weight accessibility and frontier-level reasoning. At $0.39 per million input tokens and with a 262K context window, it is a viable alternative to GPT-4o for complex agentic workflows.
Specs
| Spec | Value |
| --- | --- |
| Provider | Qwen (Alibaba) |
| Input cost | $0.39 / M tokens |
| Output cost | $2.34 / M tokens |
| Context window | 262K tokens |
| Max output | 66K tokens |
| Parameters | 397B total, 17B active (MoE) |
| Features | function_calling, vision, reasoning |
What it’s good at
Superior CJK Performance
It outperforms almost every other model in its class when handling Chinese, Japanese, and Korean technical documentation.
Massive Output Buffer
The 66K max output token limit is rare, allowing for the generation of entire code modules or long-form reports in a single pass.
Deep Reasoning Architecture
The reasoning features are robust enough to handle multi-step logic and complex function calling without losing the instruction chain.
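Multi-step function calling runs over the standard OpenAI-style tools schema; the sketch below shows the shape of a tool definition the model consumes (the `get_weather` tool is purely hypothetical):

```python
# Standard OpenAI-style tool schema; the get_weather tool itself is
# hypothetical and only illustrates the structure the model consumes.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
print(tools[0]["function"]["name"])  # get_weather
```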
Where it falls short
Inference Latency
With 397B total parameters to serve, time to first token (TTFT) can be sluggish compared to dense 70B models, especially when the reasoning phase is enabled.
High Output Cost Multiplier
The $2.34 per million output price is six times the input cost, which penalizes verbose agents.
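To see how the multiplier bites in practice, here is a rough per-request cost estimate at the listed rates (the token counts are illustrative, not measurements):

```python
# Published per-million-token rates for Qwen3.5 397B A17B.
INPUT_RATE = 0.39 / 1_000_000   # dollars per input token
OUTPUT_RATE = 2.34 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A verbose agent turn: 20K tokens in, 8K tokens out.
print(f"${request_cost(20_000, 8_000):.4f}")  # $0.0265
```

Note that the 8K output tokens cost more than the 20K input tokens, which is exactly why chatty agents get expensive on this model.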
Best use cases with OpenClaw
- Large-Scale Code Refactoring — The 262K context window allows you to dump an entire repository’s worth of context into the prompt for holistic analysis.
- Multilingual Technical Support Agents — It handles nuanced translation and technical jargon in CJK languages better than Llama 3.1 405B.
Not ideal for
- Real-time Chatbots — The model’s size and reasoning overhead make it too slow for snappy, sub-second user interactions.
- Simple Data Extraction — Using a 397B parameter model for basic JSON extraction is a waste of money when Qwen 2.5 7B does it for a fraction of the cost.
OpenClaw setup
Configure your OpenClaw provider to use the Haimaker endpoint at api.haimaker.ai/v1 and set the model ID to qwen/qwen3.5-397b-a17b. Increase your client-side timeout to at least 60 seconds to accommodate the model’s reasoning phase.
```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "qwen": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "YOUR-HAIMAKER-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen3.5-397b-a17b",
            "name": "Qwen3.5 397B A17B",
            "cost": {
              "input": 0.39,
              "output": 2.34
            },
            "contextWindow": 262144,
            "maxTokens": 65536
          }
        ]
      }
    }
  }
}
```
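Outside of OpenClaw, the same endpoint can be exercised directly since it is OpenAI-compatible. A minimal stdlib sketch of the request shape (prompt and key are placeholders; the final send is commented out):

```python
import json
import urllib.request

BASE_URL = "https://api.haimaker.ai/v1"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completions request for Qwen3.5."""
    payload = {
        "model": "qwen3.5-397b-a17b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 4096,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Summarize this repository.", "YOUR-API-KEY")
print(req.full_url)  # https://api.haimaker.ai/v1/chat/completions
# urllib.request.urlopen(req, timeout=60) would send it; keep the
# timeout at 60 seconds or more to cover the reasoning phase.
```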
How it compares
- vs Llama 3.1 405B — Llama is more tuned for creative English prose, but Qwen wins on CJK support and offers a larger 66K output limit versus Llama’s 8K.
- vs DeepSeek-V3 — DeepSeek is often cheaper for raw tokens, but Qwen’s vision integration and 262K context window provide more versatility for complex agents.
Bottom line
This is the best high-capacity model for developers who need deep CJK support and a massive context window without paying the premium for closed-source frontier models.
TRY QWEN3.5 397B A17B ON HAIMAKER
For setup instructions, see our API key guide. For all available models, see the complete models guide.