The best Ollama model for a coding agent is not always the biggest model you can download. Agents loop. They read files, call tools, revise plans, and generate patches. A model that looks good in a single prompt can feel unusable when every tool call takes another slow local inference pass.
Use this ranking as a practical starting point: which model to pull, what hardware it wants, and when to stop forcing local inference and use a cloud fallback.
Quick ranking
| Rank | Model | Pull command | Best for | Practical hardware |
|---|---|---|---|---|
| 1 | Qwen3 Coder 30B | ollama pull qwen3-coder:30b | Best local coding-agent default | 24GB+ VRAM or 32GB+ unified memory |
| 2 | Qwen3 30B | ollama pull qwen3:30b | General agent tasks, reasoning, code review | 24GB+ VRAM or 32GB+ unified memory |
| 3 | Gemma 4 26B MoE | ollama pull gemma4:26b | Fast local coding help on capable workstations | 24GB+ VRAM or 32GB+ unified memory |
| 4 | DeepSeek Coder V2 16B | ollama pull deepseek-coder-v2:16b | Code completion, familiar coding workflows | 16GB+ VRAM or 24GB+ unified memory |
| 5 | Gemma 4 E4B | ollama pull gemma4:e4b | Lightweight laptop usage | 16GB unified memory |
| 6 | Qwen3 8B | ollama pull qwen3:8b | Small edits and code explanation | 8-16GB memory |
If you only try one model, try Qwen3 Coder 30B. Ollama lists it as a coding and agentic model with 256K context support, and it is built for the exact shape of work coding agents do: reading code, using tools, and carrying state across longer tasks.
What to pick by machine
8-16GB memory
Use Qwen3 8B or Gemma 4 E4B.
This tier is good for:
- Explaining unfamiliar code
- Writing small functions
- Drafting tests
- Generating config files
- Summarizing logs
Do not expect reliable multi-file refactors here. Smaller models can write useful code, but they lose the thread quickly once the agent starts opening files, revising patches, and juggling tool output.
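One way to sanity-check this tier is a single-file task. A minimal sketch, assuming your Ollama build accepts piped input with ollama run (utils.py is a stand-in for any small file):

cat utils.py | ollama run qwen3:8b "Write pytest tests for the functions in this file. Output only the test code."

If the output holds together for one file, the model fits this tier. If it drifts, keep it on explanations and snippets rather than agent loops.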
24-32GB memory
Use Qwen3 Coder 30B, Qwen3 30B, or Gemma 4 26B MoE.
This is the useful local-agent tier. Models in this range are large enough to follow repository context, but still small enough to run on a serious desktop GPU or a higher-memory Mac. For most developers, this is the point where Ollama stops feeling like a novelty and starts becoming part of the workflow.
64GB+ memory
Try larger Qwen or DeepSeek variants only if you already know why you need them.
The temptation is to chase the biggest model in the library. For agents, that is often the wrong instinct. A huge local model can be technically impressive and still make the coding loop too slow. Bigger is useful when the task genuinely needs more reasoning or context. It is a tax when the agent is only reading files, writing boilerplate, or drafting tests.
Best overall: Qwen3 Coder
Qwen3 Coder is the first model I would test for any local coding-agent setup.
Pull it:
ollama pull qwen3-coder:30b
Run a quick check:
ollama run qwen3-coder:30b "Explain this repo structure and suggest where tests should live."
Why it ranks first:
- It is explicitly tuned for coding and agentic workflows.
- The Ollama page lists ollama launch support for Claude Code, Codex, OpenCode, and OpenClaw.
- The 30B variant is much more realistic locally than the 480B variant.
- Long-context support makes it better suited for repository work than older local code models.
Use it for code review, test generation, medium refactors, and local-first agent sessions where privacy matters.
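Before wiring it into an agent, it is worth confirming tool calling works end to end through Ollama's local OpenAI-compatible endpoint (covered in more detail below). A minimal sketch, where the tool name and schema are invented for the test:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-coder:30b",
    "messages": [{"role": "user", "content": "List the files changed under src. Use the tool."}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "list_changed_files",
        "description": "List files changed in a directory",
        "parameters": {
          "type": "object",
          "properties": {"path": {"type": "string"}},
          "required": ["path"]
        }
      }
    }]
  }'

A healthy response includes a tool_calls entry naming list_changed_files with a path argument, rather than prose describing what the model would do.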
Best lightweight pick: Gemma 4
Gemma 4 is the model family to try when Qwen3 Coder is too heavy. The smaller Gemma 4 variants are designed for local and on-device use, while the 26B MoE variant gives you a stronger workstation option without activating every parameter on every token.
Pull the laptop-friendly version:
ollama pull gemma4:e4b
Pull the stronger workstation version:
ollama pull gemma4:26b
Gemma 4 is a good fit for coding agents when you want fast local help with explanations, small edits, and private code. It is less attractive for long autonomous sessions where the agent needs to keep a full migration plan straight.
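For that kind of quick, private help, a one-shot invocation is usually all you need. For example, with the laptop-friendly tag pulled above:

ollama run gemma4:e4b "Explain the difference between git rebase and git merge in three sentences, then show the rebase command for updating a feature branch."

Anything that fits in one prompt and one response plays to this family's strengths.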
Best older code specialist: DeepSeek Coder V2
DeepSeek Coder V2 is older than the Qwen3 Coder generation, but it remains useful because it was trained specifically for code and has a practical 16B variant in Ollama.
Pull it:
ollama pull deepseek-coder-v2:16b
Use it when you want a code-focused local model that fits on more hardware than Qwen3 Coder 30B. It is a reasonable fallback for code completion, simple edits, and single-file work.
Where to use these models
- haimaker.ai - use local Ollama models alongside stronger cloud models, then route simple coding-agent work locally and hard tasks to paid models.
- OpenClaw - use Ollama as a local provider for coding-agent sessions. See the OpenClaw local model guide.
- OpenCode - add Ollama through the OpenAI-compatible provider path. See Ollama with OpenCode.
- Codex and Claude Code - Ollama’s model pages include ollama launch examples for agent runtimes, including Codex and Claude Code.
Setup for OpenAI-compatible agents
Most coding agents can use Ollama through its local OpenAI-compatible endpoint:
http://localhost:11434/v1
Use a placeholder API key if your tool requires one; Ollama does not validate it locally. A minimal client configuration looks like this:
{
"baseURL": "http://localhost:11434/v1",
"apiKey": "ollama",
"model": "qwen3-coder:30b"
}
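Before pointing an agent at the endpoint, you can hit it directly with curl using the same base URL and placeholder key (any string works, since Ollama ignores it):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ollama" \
  -d '{"model": "qwen3-coder:30b", "messages": [{"role": "user", "content": "Say ready."}]}'

If this returns a normal chat completion, the config above should work unchanged in any OpenAI-compatible agent.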
For OpenCode, the provider block looks like this:
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"ollama": {
"npm": "@ai-sdk/openai-compatible",
"name": "Ollama (local)",
"options": {
"baseURL": "http://localhost:11434/v1"
},
"models": {
"qwen3-coder:30b": {
"name": "Qwen3 Coder 30B"
}
}
}
}
}
If tool calls are unreliable, reduce the model size first, then increase context only as far as your hardware can handle. A giant context window that swaps memory is worse than a smaller window that stays responsive.
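One concrete way to manage that trade-off is to build a lower-context variant with a Modelfile, using Ollama's documented num_ctx parameter. The variant name qwen3-coder-16k here is my own:

# Modelfile: same weights, smaller context window
FROM qwen3-coder:30b
PARAMETER num_ctx 16384

ollama create qwen3-coder-16k -f Modelfile

Point the agent at qwen3-coder-16k first, and raise num_ctx only when the agent genuinely runs out of room.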
When not to use Ollama
Local models are best when privacy, cost, or offline work matters. They are not automatically better for every coding task.
Use a cloud model when:
- The task spans many files
- The bug is subtle
- You need reliable tool calling
- The patch will touch production systems
- You do not have time to review every generated line
The practical setup is local-first, not local-only. Use Qwen3 Coder or Gemma 4 for cheap private work, then escalate to a stronger cloud model when the task gets expensive in attention instead of tokens.
For OpenClaw-specific local setup, see best local models for OpenClaw. For OpenCode setup, see use Ollama with OpenCode.