DeepSeek has four models available through the API right now. The lineup is confusing because the naming suggests a linear progression, but each model makes different trade-offs. Here’s what actually matters for OpenClaw.

The quick answer

Model           Input/Output Cost   Context   Best For
DeepSeek V3.2   $0.28 / $0.40       164K      Default for most tasks
DeepSeek V3.1   $0.20 / $0.80       164K      Input-heavy workloads
DeepSeek R1     $0.55 / $2.19       65K       Hard reasoning problems
DeepSeek V3     $0.14 / $0.28       66K       Dirt-cheap batch work

Start with V3.2 and only reach for something else when you hit its limits.

DeepSeek V3.2 — the default pick

V3.2 is where most OpenClaw users should land. $0.28/M input, $0.40/M output, 164K context window. It supports function calling and integrated reasoning, which means it can plan and execute multi-step tool chains without you bolting on extra infrastructure.

On SWE-bench, V3.2 scores around 60%. Claude Opus hits 89%, but costs roughly 90x more per token. For the kind of work most people do with OpenClaw — refactoring, debugging, writing tests, generating configs — that gap matters less than you’d think.

V3.2 also introduced thinking-mode tool calling. It reasons internally before deciding which tools to invoke, which makes agentic workflows more reliable than with V3 or V3.1. In the OpenClaw community’s model rankings, V3.2 placed 15th overall across success rate, speed, and cost — solid for something that costs less than a penny per thousand tokens.

Where it struggles: API reliability. DeepSeek’s servers throw 503 errors during peak hours and time-to-first-token can be slow. Build retry logic into your OpenClaw setup. Set your request timeout to at least 60 seconds.
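That retry advice can be as little as a generic backoff wrapper. This is a minimal sketch in plain Python, not an OpenClaw feature; `call_deepseek` in the usage comment is a placeholder for whatever client call you actually make:

```python
import time

def with_retries(fn, max_attempts=4, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying with exponential backoff on transient errors (e.g. 503s)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Usage (call_deepseek is a placeholder; note the 60-second timeout):
# result = with_retries(lambda: call_deepseek(prompt, timeout=60))
```

Scope `retry_on` to the transient errors your HTTP client raises; retrying on every exception will also retry genuine bugs like malformed requests.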

DeepSeek V3.1 — when your workload is input-heavy

V3.1 is a weird pick in 2026 because V3.2 exists, but there’s one scenario where it wins: input-heavy workloads where you care more about input cost than output cost.

$0.20/M input (cheaper than V3.2) but $0.80/M output (2x V3.2’s output price). The 164K context window is the same. If your agents read a lot of code but generate short responses — think classification, triage, quick edits — V3.1 is actually cheaper than V3.2.

In practice, the break-even sits at about a 5:1 input-to-output ratio: V3.1’s $0.08/M input saving has to cover its $0.40/M output premium. Most OpenClaw workflows don’t skew that hard toward input, so V3.2 is usually cheaper end-to-end. But if you know your use case reads far more than it writes, V3.1 saves you money.
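The trade-off is easy to sanity-check with back-of-envelope math. Prices are from the table above; the token counts are made up for illustration:

```python
def cost_usd(input_mtok, output_mtok, in_price, out_price):
    """Total cost in dollars for a workload measured in millions of tokens."""
    return input_mtok * in_price + output_mtok * out_price

# An input-heavy job: 10M tokens read, 1M generated.
v32 = cost_usd(10, 1, in_price=0.28, out_price=0.40)  # V3.2, ~$3.20
v31 = cost_usd(10, 1, in_price=0.20, out_price=0.80)  # V3.1, ~$2.80 (cheaper here)

# At exactly 5M in / 1M out the two models cost the same:
# the $0.08/M input saving cancels the $0.40/M output premium.
```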

Same function calling and reasoning support as V3.2. Same reliability issues.

DeepSeek R1 — the reasoning specialist

R1 is a different animal. 685B parameters, chain-of-thought reasoning, $0.55/M input, $2.19/M output. That’s 5x more expensive than V3.2 on output, but still roughly 20x cheaper than OpenAI’s o1-preview for comparable reasoning quality.

The context situation is tight: 65K input, 8K max output. That output cap means R1 can’t generate an entire refactored file in one pass. It’s a surgeon, not a workhorse.

Reach for R1 when V3.2 keeps getting something wrong. Hard debugging where you need the model to trace through multiple hypotheses, or architecture reviews with competing trade-offs. If you’ve been using Claude Opus for those tasks and want to cut costs, R1 at $0.55/$2.19 is worth trying.
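One way to encode that "R1 only when it earns its keep" policy is a small routing function. This is a sketch, not OpenClaw functionality; the `hard_reasoning` flag is whatever signal you use to mark tough tasks, and the limits come from R1's 65K input / 8K output windows above:

```python
R1_MAX_INPUT = 65_000   # R1 context window (input tokens)
R1_MAX_OUTPUT = 8_000   # R1 hard cap on generated tokens

def pick_model(hard_reasoning, input_tokens, expected_output_tokens):
    """Route to R1 only for hard-reasoning tasks that fit its tight windows."""
    if (hard_reasoning
            and input_tokens <= R1_MAX_INPUT
            and expected_output_tokens <= R1_MAX_OUTPUT):
        return "deepseek/deepseek-r1"
    return "deepseek/deepseek-v3.2"
```

The output check matters most: a task that needs a whole refactored file back will blow through R1's 8K cap no matter how hard the reasoning is.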

R1 is also MIT-licensed, so if you eventually want to self-host, you can run a distilled version locally through Ollama. See our local models guide for hardware requirements.

DeepSeek V3 — the budget floor

V3 is the oldest and cheapest: $0.14/M input, $0.28/M output. At those prices you can process millions of tokens without thinking about the bill.

The catch is a 66K context window and 8K max output. No function calling, no reasoning features. It’s a pure chat model — capable on code, but it can’t plan multi-step tool chains the way V3.1+ can.

I’d only use V3 for batch work where you’re processing thousands of small, independent tasks and need the absolute lowest cost. Sentiment analysis, data extraction, simple code classification. For anything that requires actual agent behavior, spend the extra $0.14/M and use V3.2.
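Since those batch tasks are small and independent, a plain thread-pool fan-out is usually all the orchestration you need. A sketch: `handle_one` stands in for your actual V3 API call (one cheap request per task):

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(tasks, handle_one, max_workers=8):
    """Run independent tasks in parallel, preserving input order in the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handle_one, tasks))
```

Keep `max_workers` modest: given DeepSeek's reliability issues, hammering the API with a huge pool mostly buys you more 503s.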

The reliability problem

This is the thing nobody at DeepSeek talks about. Every DeepSeek model shares the same API infrastructure, and it’s noticeably less reliable than Anthropic, OpenAI, or Google.

Expect 503 errors during peak hours (roughly 9am-6pm Beijing time). Time-to-first-token is unpredictable. Connection resets happen. If you’re building anything that runs unattended, you need a fallback model configured.

Two ways to handle this in OpenClaw:

  1. Manual fallback: configure both DeepSeek and a second provider (Gemini Flash is a good pairing) and switch when DeepSeek goes down.
  2. Auto-routing: use Haimaker’s auto-router to automatically failover when DeepSeek is slow or unavailable.
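The manual-fallback option can be a thin wrapper that catches transient failures and re-raises everything else. A sketch; `call_deepseek` and `call_gemini_flash` in the usage comment are placeholders for your actual client calls:

```python
def call_with_fallback(primary, fallback, transient=(TimeoutError, ConnectionError)):
    """Try the primary provider; fall back on transient failures, re-raise the rest."""
    try:
        return primary()
    except transient:
        return fallback()

# Usage (placeholders for your actual client calls):
# answer = call_with_fallback(
#     lambda: call_deepseek(prompt, timeout=60),
#     lambda: call_gemini_flash(prompt),
# )
```

Only catch the exception types your client raises for outages and timeouts; a 400-class error means the request itself is wrong, and retrying it on another provider just wastes money.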

Setup in OpenClaw

Running through haimaker.ai

All DeepSeek models are also available through haimaker.ai with a single API key. If you’re already using haimaker for other providers, you can access DeepSeek models without a separate DeepSeek account:

{
  "models": {
    "providers": {
      "haimaker": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "your-haimaker-api-key",
        "api": "openai-completions"
      }
    }
  }
}

This gives you DeepSeek alongside Claude, GPT, Gemini, Grok, and dozens of other models through one provider.

Prefer to go direct? Getting any DeepSeek model running takes about two minutes.

1. Get your DeepSeek API key

Sign up at platform.deepseek.com. No free tier — you need to add credits before making API calls.

2. Add DeepSeek as a provider

Open ~/.openclaw/openclaw.json and add DeepSeek to your providers:

{
  "models": {
    "providers": {
      "deepseek": {
        "baseUrl": "https://api.deepseek.com/v1",
        "apiKey": "your-deepseek-api-key",
        "api": "openai-completions"
      }
    }
  }
}

3. Add models to the allowlist

In the same file, add the models you want to use:

{
  "agents": {
    "defaults": {
      "models": {
        "deepseek/deepseek-v3.2": {},
        "deepseek/deepseek-r1": {}
      }
    }
  }
}

4. Apply the config

Run openclaw gateway config.apply and switch models with /model during a session.

What I’d do

Set V3.2 as your default DeepSeek model. It handles day-to-day coding, tool calling, and agent workflows well enough at $0.28/$0.40. Swap to R1 when something genuinely requires deep reasoning and V3.2 keeps getting it wrong. Ignore V3 and V3.1 unless you have a specific cost-optimization reason to use them.

DeepSeek shouldn’t be your only provider. The API reliability makes it risky as a primary model for anything production-critical. Pair it with something stable like Gemini Flash, Claude Haiku, or GPT-5 Mini, and use DeepSeek as the cost-optimized option when reliability isn’t the constraint.

For a full comparison of all models available in OpenClaw, see our complete models guide. For cost-focused model selection, check out cheapest models for OpenClaw.