The search demand around Gemma 4 is pretty clear: people want the shortest path from “I heard this runs locally” to “my agent is using it without an API bill.”
Ollama is that path. It is not the fastest possible runtime, and it is not where you go if you want to hand-tune every quant. But for a local model that you can install in a few minutes and wire into coding tools, it is the least annoying option.
Quick install
On macOS:
brew install --cask ollama-app
open -a Ollama
ollama pull gemma4
ollama run gemma4:latest "Write a tiny Python function"
On Linux:
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gemma4
ollama run gemma4:latest "Explain what Ollama is doing"
On Windows, install the Ollama app, open PowerShell, and run:
ollama pull gemma4
ollama run gemma4:latest "Hello from Gemma 4"
That is enough to prove the model works. The local API runs at http://localhost:11434.
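If you want to check the server without the CLI, a raw request against the native API works too. This is a minimal sanity check, assuming the default port and the gemma4:latest tag pulled above:

# Native API: single non-streaming completion
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4:latest",
  "prompt": "Say hello in one short sentence.",
  "stream": false
}'

A JSON reply with a response field means the model is loaded and answering.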
Use the local API
Most coding agents do not talk to Ollama’s native API directly. They expect an OpenAI-compatible endpoint. Ollama provides one at:
http://localhost:11434/v1
Use any placeholder API key if the tool requires one; Ollama does not validate it locally. In most agents, the relevant settings look something like this:
{
"baseURL": "http://localhost:11434/v1",
"apiKey": "ollama",
"model": "gemma4:latest"
}
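Before pointing an agent at that config, it is worth verifying the endpoint by hand. A quick sketch with curl, using the same placeholder key (the Authorization header is only there for tools that insist on one):

# OpenAI-compatible endpoint: chat completion against the local server
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ollama" \
  -d '{
    "model": "gemma4:latest",
    "messages": [{"role": "user", "content": "Write a one-line Python hello world."}]
  }'

If this returns a chat.completion object, any OpenAI-compatible agent should work with the same base URL.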
For OpenClaw specifically, use the dedicated setup guide: Gemma 4 with OpenClaw using Ollama.
For OpenCode, use Ollama with OpenCode.
Keep the model warm
The first request after a model unloads feels slow because Ollama has to load weights back into memory. For a coding assistant, that gets old quickly.
On macOS or Linux:
export OLLAMA_KEEP_ALIVE="-1"
Then restart Ollama so the variable takes effect. This keeps Gemma 4 loaded instead of unloading it after a few idle minutes. Note that a shell export only reaches a server you start from that shell with ollama serve; if you run the macOS app, set the variable with launchctl setenv OLLAMA_KEEP_ALIVE -1 instead.
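If you would rather not touch server configuration at all, Ollama also accepts a keep_alive value on individual requests, where -1 means keep the model loaded indefinitely. A small sketch against the native API:

# Warm the model once and pin it in memory for this server session
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4:latest",
  "prompt": "warm-up",
  "keep_alive": -1,
  "stream": false
}'

One warm-up request like this at the start of a coding session is usually enough.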
Pick the right machine
For casual local coding work, 16GB of unified memory or 16GB of system RAM is the floor. That is enough for a smaller Gemma 4 variant and a normal editor.
For bigger variants, give yourself 24GB or more. On a 24GB GPU or a 32GB Mac, local coding agents become much less painful. On weaker hardware, the model may still run, but every tool call feels like waiting for a build that should have been cached.
What works well
Gemma 4 through Ollama is good for:
- Explaining code
- Writing small functions
- Generating config files
- Drafting tests
- Summarizing logs
- Handling private code that should not leave your laptop
It is weaker on long, multi-file refactors. If the task requires keeping a whole system in its head, use Gemma 4 for the first pass and escalate to a stronger cloud model when accuracy matters.
Common fixes
Ollama is not responding
Check that the server is running:
curl http://localhost:11434/api/tags
If that fails, start the app again or run ollama serve.
The agent says the model does not exist
Run:
ollama list
Use the exact model name shown there, usually gemma4:latest.
Responses are too slow
Reduce the context size, close memory-heavy apps, and keep the model warm. If you are trying to run a large Gemma 4 variant on a 16GB machine, switch to a smaller variant before blaming the agent.
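Context size is the biggest lever. On the native API you can cap it per request with the num_ctx option; the 4096 below is an illustration, not a universal recommendation:

# Smaller context window = less memory pressure, faster responses
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4:latest",
  "prompt": "Summarize this log line: connection refused on port 5432",
  "options": { "num_ctx": 4096 },
  "stream": false
}'

If your agent only speaks the /v1 endpoint and cannot pass options, you can bake the limit into a variant instead, using a Modelfile with PARAMETER num_ctx 4096 and ollama create.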
The honest version
Gemma 4 + Ollama is a very good local setup for routine work. It is private, cheap, and easy to roll back if you do not like it.
It is not a Claude Opus replacement. Treat it as the local model you use first, not the only model you use forever. That one change makes the setup much more useful.