LLM Routing
LLM routing means directing a coding task to the model best suited for it. Some tasks need a massive context window. Others need fast, cheap responses. Routing is how you match the task to the right provider.
How claude-multi handles it
claude-multi doesn’t do automatic routing. Each instance is pinned to a single provider, but because instances are independent and cheap to create, you can run several side by side and pick the right one for the job.
Common routing patterns:
- Use a GLM instance for general coding (fast, cost-effective)
- Use a MiniMax instance when you need a huge context window (1M tokens)
- Use a DeepSeek instance for deep reasoning tasks
- Use an Anthropic instance when you need the original Claude experience
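The patterns above amount to a small decision you make by hand. A minimal Python sketch of that decision, assuming a hypothetical `Task` description and an illustrative 200,000-token threshold (neither is part of claude-multi):

```python
from dataclasses import dataclass

# Illustrative threshold only (an assumption): above this, prefer the
# provider with the very large context window.
LARGE_CONTEXT_THRESHOLD = 200_000


@dataclass
class Task:
    """Hypothetical description of a coding task (not a claude-multi type)."""
    estimated_tokens: int = 0
    needs_deep_reasoning: bool = False
    needs_original_claude: bool = False


def pick_provider(task: Task) -> str:
    """Suggest which provider's instance to open for this task."""
    if task.needs_original_claude:
        return "anthropic"
    if task.estimated_tokens > LARGE_CONTEXT_THRESHOLD:
        return "minimax"   # huge context window
    if task.needs_deep_reasoning:
        return "deepseek"  # deep reasoning tasks
    return "glm"           # default: fast, cost-effective general coding
```

The function only suggests a provider. You still open the matching instance yourself, which keeps the choice in your hands.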
Why not automatic routing
Automatic routing adds latency, complexity, and a failure point. claude-multi keeps it simple: you create instances for the providers you want, and you pick which one to use. No proxy, no daemon, no middleware deciding for you.
Practical tip
Adopt a naming convention that makes routing obvious: claude-glm for fast work, claude-minimax for large files, claude-deepseek for complex debugging. The wrapper scripts make this natural.
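One way to lean on such a convention is a small helper that turns a provider name into the wrapper command and checks whether it is on `PATH`. This is a sketch that assumes wrappers are named `claude-<provider>`, as in the examples above; the function names are hypothetical and not part of claude-multi.

```python
import shutil
from typing import Optional


def wrapper_name(provider: str) -> str:
    """Build the wrapper command name under the assumed claude-<provider> convention."""
    return f"claude-{provider.strip().lower()}"


def find_wrapper(provider: str) -> Optional[str]:
    """Return the full path of the wrapper if it is on PATH, otherwise None."""
    return shutil.which(wrapper_name(provider))
```

Calling `find_wrapper("glm")` returns a path only if a `claude-glm` wrapper is actually installed, so a `None` result is a quick hint that the instance has not been created yet.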
Related terms
- Provider Template: A pre-configured set of environment variables, base URLs, and model mappings that tells Claude Code how to reach a specific LLM provider.
- Model Mapping: The set of environment variables that tell Claude Code which model to use for each role (Sonnet, Opus, Haiku) when working with a non-Anthropic provider.
- Context Window: The maximum number of tokens a model can process in a single conversation, including both input and output.