OpenCode with Ollama is the setup people want when they are tired of sending every coding prompt to a cloud API. It works. It is also slower and more fragile than the demos make it look.

The right expectation is simple: local OpenCode is excellent for small, private, repetitive work. It is not the setup you should trust with a messy multi-file migration unless you enjoy babysitting.

Install Ollama

On macOS:

brew install --cask ollama-app
open -a Ollama

On Linux:

curl -fsSL https://ollama.com/install.sh | sh
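
Either way, the server listens on port 11434 once it is running. A quick sanity check:

curl http://localhost:11434

If it replies with "Ollama is running", you are good.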

Then pull a model:

ollama pull gemma4

Check that it is available:

ollama list
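
The output is a small table. The ID, size, and timestamp below are illustrative; only the name column matters here:

NAME             ID              SIZE      MODIFIED
gemma4:latest    0a1b2c3d4e5f    17 GB     2 minutes ago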

Use the exact model name from that output in your OpenCode config.

Configure OpenCode

OpenCode can talk to OpenAI-compatible providers. Ollama exposes a compatible endpoint at:

http://localhost:11434/v1
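
You can confirm the endpoint answers before touching any config:

curl http://localhost:11434/v1/models

It should return a JSON list that includes the model you pulled.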

Add an Ollama provider in your OpenCode config (a project-level opencode.json, or the global OpenCode config file):

{
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "gemma4:latest": {}
      }
    }
  }
}

If OpenCode asks for an API key, use a placeholder. Ollama does not check it, but the client needs a non-empty value:

{
  "ollama": {
    "type": "api",
    "key": "ollama"
  }
}
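
If you prefer not to hand-edit credentials, OpenCode also has an interactive flow; assuming your build ships the auth subcommand, run it, pick the Ollama provider you defined, and enter any placeholder string:

opencode auth login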

Restart OpenCode and switch to the Ollama model from the model picker.

Models to try first

Gemma 4

Good first pick. It handles explanations, simple edits, and small coding tasks well. Runs on modest machines compared with bigger coding models.

Qwen3.5

Often better for code, especially if you can run a larger variant. The 27B-class models are more useful than tiny models, but they need real memory.

Llama 3.3

Good general model if you have the hardware. Less convenient on smaller laptops.
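
If you want to compare them, pull the candidates ahead of time. The tags below are placeholders; check the Ollama library for the exact names and sizes that actually exist for your hardware:

ollama pull gemma4
ollama pull qwen3.5
ollama pull llama3.3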

Performance expectations

Recent local-model threads all say the quiet part out loud: prompts can work, code can be good, and the whole thing can still feel slow once tool calls start stacking up.

That is normal. A coding agent is not a single chat request. It reads files, plans, edits, checks output, and loops. Local inference makes every loop more visible.

To make it tolerable:

  • Keep context small
  • Use smaller models for simple edits
  • Keep the model warm
  • Close memory-heavy apps
  • Use a cloud fallback for long refactors
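
Before tuning anything, check what is actually resident:

ollama ps

It shows which models are loaded, how much memory they use, whether they are split between GPU and CPU, and how long they will stay loaded.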

Keep Ollama warm

export OLLAMA_KEEP_ALIVE="-1"

A value of -1 keeps models loaded indefinitely instead of unloading them after the default five minutes. The variable has to reach the Ollama server process, not just your shell, so restart Ollama after setting it. This avoids repeated cold starts during a coding session.
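
How you set it depends on how Ollama runs. On Linux with the systemd service (assuming the default unit installed by the script), set it on the service and restart:

sudo systemctl edit ollama
# add under [Service]:
#   Environment="OLLAMA_KEEP_ALIVE=-1"
sudo systemctl restart ollama

On macOS with the menu bar app, set it for launchd, then quit and reopen Ollama:

launchctl setenv OLLAMA_KEEP_ALIVE -1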

When to use a cloud fallback

Use local OpenCode for:

  • Reading unfamiliar code
  • Drafting small changes
  • Generating tests
  • Explaining errors
  • Working with private files

Use a cloud model for:

  • Multi-file refactors
  • Hard debugging
  • Architecture changes
  • Anything you do not want to review line by line

The best setup is not local-only. It is local-first.

If your real target is Gemma 4 specifically, read Gemma 4 Ollama setup. If you are using OpenClaw instead of OpenCode, use Gemma 4 with OpenClaw.