OpenCode is a terminal-based coding assistant that talks to any OpenAI-compatible API. Point it at a local Ollama instance running Gemma 4 and you’ve got a free coding assistant that never sends your code anywhere.

Here’s how to set it up on a Mac with Apple Silicon: install Ollama, pull Gemma 4, wire it into OpenCode.

What you need

  • Mac with Apple Silicon (M1/M2/M3/M4/M5) and at least 16GB unified memory
  • macOS with Homebrew installed
  • OpenCode installed (see opencode.ai or install via your package manager)

Gemma 4’s default 8B model uses about 9.6GB loaded, so 16GB of unified memory gives you enough room to run both Ollama and OpenCode without issues.

Step 1: Install Ollama

brew install --cask ollama-app

This installs Ollama.app in /Applications/ and the ollama CLI at /opt/homebrew/bin/ollama.

Step 2: Start Ollama

open -a Ollama

Wait for the menu bar icon to appear, then verify the server is running:

ollama list
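If you prefer a direct check, you can hit the HTTP API on port 11434, which is the same endpoint OpenCode will talk to later:

```shell
# The server listens on localhost:11434; prints a version JSON if it's up
curl -s http://localhost:11434/api/version
```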

Step 3: Pull Gemma 4

ollama pull gemma4

Downloads about 9.6GB. Verify:

ollama list
# NAME             ID              SIZE      MODIFIED
# gemma4:latest    ...             9.6 GB    ...

Test it:

ollama run gemma4:latest "Hello, what model are you?"

Check GPU acceleration:

ollama ps
# Should show CPU/GPU split, e.g. 14%/86% CPU/GPU

Ollama uses Apple's Metal GPU acceleration on Apple Silicon automatically, with no configuration needed. A high GPU percentage in the ollama ps output means acceleration is working.

Step 4: Configure OpenCode to use Gemma 4

OpenCode uses a config file at ~/.config/opencode/opencode.jsonc. Add Ollama as a custom provider:

{
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "gemma4:latest": {}
      }
    }
  }
}

Since Ollama runs locally, you don’t need an API key. But OpenCode expects an auth entry, so add a placeholder to ~/.local/share/opencode/auth.json:

{
  "ollama": {
    "type": "api",
    "key": "ollama"
  }
}
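If you're starting from scratch, both files can be written in one go from the shell. This is a convenience sketch using the exact contents above; it overwrites any existing OpenCode config, so skip it if you already have one:

```shell
# Convenience sketch: write both config files with the contents shown above.
# Caution: this overwrites any existing OpenCode config.
mkdir -p ~/.config/opencode ~/.local/share/opencode

cat << 'EOF' > ~/.config/opencode/opencode.jsonc
{
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "gemma4:latest": {}
      }
    }
  }
}
EOF

cat << 'EOF' > ~/.local/share/opencode/auth.json
{
  "ollama": {
    "type": "api",
    "key": "ollama"
  }
}
EOF

# Sanity check: both files must be valid JSON (no comments used here)
python3 -m json.tool ~/.local/share/opencode/auth.json > /dev/null && echo "auth.json OK"
```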

Restart OpenCode and use /models to switch to ollama/gemma4:latest.

Step 5: Keep Gemma 4 loaded

Ollama unloads models after 5 minutes of idle time by default. For a coding assistant you’re using throughout the day, that means unnecessary cold starts.

Set keep-alive to indefinite:

launchctl setenv OLLAMA_KEEP_ALIVE "-1"

Restart Ollama (quit it from the menu bar, then open -a Ollama) for this to take effect. If you start the server manually with ollama serve from a shell, add the variable to ~/.zshrc instead:

export OLLAMA_KEEP_ALIVE="-1"

Note that launchctl setenv does not survive a reboot, so if you use the menu bar app, re-run it after rebooting.
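Keep-alive can also be set per request through Ollama's REST API. A generate request with keep_alive set to -1 and no prompt loads the model and keeps it resident, regardless of environment variables:

```shell
# Load gemma4 and keep it in memory indefinitely (keep_alive: -1).
# Omitting the prompt makes this a pure load request.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "gemma4:latest", "keep_alive": -1}'
```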

Enable launch at login: click the Ollama menu bar icon → Launch at Login.

Auto-preload on startup

Create a launch agent so Gemma 4 is warm and ready after every reboot:

cat << 'EOF' > ~/Library/LaunchAgents/com.ollama.preload-gemma4.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.ollama.preload-gemma4</string>
    <key>ProgramArguments</key>
    <array>
        <string>/opt/homebrew/bin/ollama</string>
        <string>run</string>
        <string>gemma4:latest</string>
        <string></string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>StartInterval</key>
    <integer>300</integer>
    <key>StandardOutPath</key>
    <string>/tmp/ollama-preload.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/ollama-preload.log</string>
</dict>
</plist>
EOF

launchctl load ~/Library/LaunchAgents/com.ollama.preload-gemma4.plist

This pings the model every 5 minutes with an empty prompt to keep it in memory.
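To confirm the agent registered and is running cleanly, check launchd's job table and the log file (paths as defined in the plist above):

```shell
# Second column is the last exit status; non-zero means the last run failed
launchctl list | grep com.ollama.preload-gemma4

# Preload output lands in the log path set in the plist
tail -n 20 /tmp/ollama-preload.log
```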

What works well with Gemma 4 in OpenCode

Gemma 4 8B is free and local, and it’s surprisingly useful for everyday coding work:

  • Code explanations. Ask what a function does, how a module is structured, or what a regex matches. Answers are clear and usually accurate for standard codebases.
  • Quick edits. Fix a typo, update an import, add a field to a type definition, rename a variable. Single-file changes are its sweet spot.
  • Boilerplate generation. Config files, test stubs, API route scaffolding, Dockerfile templates. Common patterns that don’t require much reasoning.
  • Shell command help. Forgot a git flag or a jq filter? Gemma 4 gives you the command without a round trip to Stack Overflow.
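The shell-command case in particular is a one-liner, with no interactive session needed (the prompt wording here is just an illustration):

```shell
ollama run gemma4:latest "one-line jq filter to extract the .name field from every object in a JSON array"
```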

Where it falls short

  • Multi-step reasoning. Tasks that require planning across multiple files or understanding complex control flow tend to produce incomplete results.
  • Large refactors. If you need coordinated changes across a codebase, the 8B model loses coherence. It works file by file but doesn’t keep the big picture.
  • Edge cases and subtle bugs. Gemma 4 catches obvious issues but misses the kind of bugs that require deep domain knowledge or reasoning through corner cases.

Go further: add Haimaker for cloud models

Gemma 4 locally covers the basics. When you hit something it can’t handle — complex debugging, multi-file refactors, anything requiring deep reasoning — you want a cloud model. Haimaker gives you one API key for Claude Opus, GPT-5, Gemini Pro, and others.

Add Haimaker as a second provider alongside Ollama:

{
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "gemma4:latest": {}
      }
    },
    "haimaker": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "https://api.haimaker.ai/v1"
      },
      "models": {
        "anthropic/claude-sonnet-4-6": {},
        "openai/gpt-5": {},
        "google/gemini-2.5-pro": {}
      }
    }
  }
}

Add your Haimaker API key to ~/.local/share/opencode/auth.json:

{
  "ollama": {
    "type": "api",
    "key": "ollama"
  },
  "haimaker": {
    "type": "api",
    "key": "YOUR_HAIMAKER_API_KEY"
  }
}

Now you can switch between local and cloud models with /models in OpenCode. Use Gemma 4 for the quick stuff. Switch to Sonnet or GPT-5 when the task gets hard.

Sign up at haimaker.ai to get your API key and browse the model catalog.


Troubleshooting

Provider not showing up in /models. Restart OpenCode after editing config files. Changes to opencode.jsonc aren’t picked up while OpenCode is running.

“Model not found” error. Make sure the model ID in your config matches exactly what Ollama reports. Run ollama list and use the name as shown — typically gemma4:latest.

Authentication errors with Ollama. Even though Ollama doesn’t need auth, OpenCode’s provider system expects an entry in auth.json. The placeholder "key": "ollama" is enough.

Slow responses. Make sure you’re on a recent Ollama version (run ollama --version; update with brew upgrade --cask ollama-app). Metal GPU acceleration is automatic on Apple Silicon, so slowness usually means memory pressure. Close apps that compete for unified memory — browsers with many tabs are the usual culprit.

Context window issues. Gemma 4 supports large context windows, but on 16GB hardware, keep inputs under 32K tokens for stable output quality. If you notice degraded responses on long prompts, that’s probably why.
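To see what context length the model actually advertises, ask Ollama directly (the server must be running):

```shell
# Prints model metadata, including context length and parameter count
ollama show gemma4:latest
```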

Useful Ollama commands

Command                       Description
ollama list                   List downloaded models
ollama ps                     Show running models and memory usage
ollama run gemma4:latest      Interactive chat
ollama stop gemma4:latest     Unload model from memory
ollama pull gemma4:latest     Update to latest version
ollama rm gemma4:latest       Delete model

Already using Haimaker with OpenCode? See the full custom provider setup guide for adding more models.