Back to FAQ
Providers

Can I use it with local models like Ollama?

Yes, as long as your local model server exposes an Anthropic-compatible REST endpoint. Claude Code speaks the Anthropic API protocol, so the server on the other end needs to understand that format.

How to set it up

Create an instance without a template:

Terminal window
claude-multi add local

Then edit ~/.claude-multi/local/settings.json and set the env vars to point at your local server:

{
"env": {
"ANTHROPIC_BASE_URL": "http://localhost:11434/v1",
"ANTHROPIC_MODEL": "your-model-name",
"ANTHROPIC_SMALL_FAST_MODEL": "your-fast-model"
}
}

Replace the URL and model names with whatever your local server exposes.

What works and what doesn’t

If your local server faithfully implements the Anthropic messages API (the /v1/messages endpoint), Claude Code will work with it. Tools like Ollama with an Anthropic-compatible adapter, LiteLLM, or vLLM with the right proxy can bridge the gap.

The further your local setup deviates from the Anthropic API spec, the more likely you are to hit edge cases, especially around streaming, tool use, and extended thinking.

A practical note on cost

Running local models eliminates per-token API costs entirely. You pay in compute (GPU time, electricity) instead. For high-volume tasks like code generation and refactoring, this can be significantly cheaper than any cloud provider, if you have the hardware.

More info