mimo

Xiaomi MiMo Provider for Claude Code

1T MoE model at a fraction of frontier pricing

Quick start: claude-multi add mimo

Use cases

High-quality coding at low per-token cost

Large-scale refactoring and code generation

Agentic workflows with extended tool use

Budget-conscious development teams

MiMo-V2.5-Pro is a 1-trillion-parameter mixture-of-experts (MoE) model with 42B active parameters and a 1M-token context window. It delivers frontier-class coding performance at a significantly lower cost per token than comparable models. Because the endpoint is Anthropic-compatible, it drops straight into Claude Code.

Model specs

Role	Model	Parameters	Context	Max Output
Primary (Opus/Sonnet)	MiMo-V2.5-Pro	1T MoE (42B active)	1M	128K
Fast (Haiku)	MiMo-V2.5	310B MoE (15B active)	1M	128K

V2.5-Pro handles the heavy reasoning. V2.5 runs background tasks, subagent calls, and quick edits at lower cost. The template maps them to the right roles automatically.

Thinking mode is enabled by default.

Setup

Create an account at xiaomimimo.com and generate an API key
Run the setup command:

claude-multi add mimo

Paste your API key when prompted

The template sets up the base URL, model mappings, and thinking parameters. Your instance is ready to use immediately.
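As a rough illustration of the role mapping described above, the sketch below builds the kind of environment block such a template might write. The variable names are the standard Claude Code overrides, but the model IDs, the base URL, and the `build_env` helper are placeholders assumed for this example; the values the mimo template actually writes may differ, and the thinking parameters are left out because their exact keys are not given here.

```python
# Hypothetical values: the real model IDs and base URL come from the
# template and your Xiaomi console, not from this sketch.
PRIMARY_MODEL = "mimo-v2.5-pro"                  # assumed ID for MiMo-V2.5-Pro
FAST_MODEL = "mimo-v2.5"                         # assumed ID for MiMo-V2.5
BASE_URL = "https://example.invalid/anthropic"   # placeholder endpoint


def build_env(api_key: str) -> dict:
    """Map Claude Code's Opus/Sonnet/Haiku roles onto the two MiMo models."""
    return {
        "ANTHROPIC_BASE_URL": BASE_URL,
        "ANTHROPIC_AUTH_TOKEN": api_key,
        # Primary role (Opus/Sonnet) goes to the larger model.
        "ANTHROPIC_DEFAULT_OPUS_MODEL": PRIMARY_MODEL,
        "ANTHROPIC_DEFAULT_SONNET_MODEL": PRIMARY_MODEL,
        # Fast role (Haiku) goes to the smaller, cheaper model.
        "ANTHROPIC_DEFAULT_HAIKU_MODEL": FAST_MODEL,
    }
```

Calling build_env("your-api-key") returns a dictionary in which both primary roles point at the Pro model and the fast role points at V2.5, mirroring the table in Model specs.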

Token Plan alternative

Xiaomi also offers a subscription-based Token Plan with a monthly credit pool. If you prefer predictable monthly costs, use the mimo-token template instead. Note that Token Plan uses different regional endpoints (CN, SG, EU). After setup, edit ~/.claude-<name>/settings.json and set the value of ANTHROPIC_BASE_URL to the endpoint shown in your subscription console.
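If you would rather script the endpoint change than edit the file by hand, a minimal sketch follows. It assumes ANTHROPIC_BASE_URL lives under the "env" key of settings.json, which is where Claude Code reads environment overrides; check your generated file first, since the template's layout is not confirmed here. The helper names are hypothetical.

```python
import json
from pathlib import Path


def set_base_url(settings: dict, endpoint: str) -> dict:
    """Return a copy of settings with env.ANTHROPIC_BASE_URL set to endpoint."""
    updated = dict(settings)
    env = dict(updated.get("env", {}))  # assumption: overrides live under "env"
    env["ANTHROPIC_BASE_URL"] = endpoint
    updated["env"] = env
    return updated


def update_settings_file(path: Path, endpoint: str) -> None:
    """Rewrite the settings file in place, leaving all other keys untouched."""
    settings = json.loads(path.read_text(encoding="utf-8"))
    updated = set_base_url(settings, endpoint)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

To apply it, call update_settings_file with the path to your instance's settings.json and the endpoint copied from the subscription console. Back up the file before running it.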

When to pick MiMo

MiMo is the cost-performance leader. If you do a lot of coding and want frontier quality without frontier pricing, MiMo-V2.5-Pro delivers. The 1M context window handles large codebases, and because the MoE architecture activates only 42B of the 1T parameters for each token, the compute behind every token stays low, which helps keep per-token pricing down.

For developers who prefer a fixed monthly bill, switch to the MiMo Token Plan template.

Pricing details

Pay-per-token rates are competitive with other frontier providers. The MoE architecture keeps costs down because only 42B of the 1T parameters are active for any given token. Check xiaomimimo.com for current rates.
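To put the active-parameter figures in perspective, the short calculation below works out what share of each model's weights is used per token, taking the numbers from the Model specs table. It is only a ratio of parameter counts, not a price; actual rates are set by Xiaomi.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Share of a MoE model's parameters that are active for one token."""
    return active_params / total_params


# (active parameters, total parameters), from the Model specs table.
MODELS = {
    "MiMo-V2.5-Pro": (42e9, 1e12),
    "MiMo-V2.5": (15e9, 310e9),
}

for name, (active, total) in MODELS.items():
    print(f"{name}: {active_fraction(active, total):.1%} of parameters active")
# MiMo-V2.5-Pro: 4.2% of parameters active
# MiMo-V2.5: 4.8% of parameters active
```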

DeepSeek - similar positioning, also pay-per-token
Kimi - Moonshot’s agentic coding model
MiniMax - larger output window, higher context

Pricing

Pay-per-token via xiaomimimo.com

Related providers

glm - Frontier reasoning models on a fixed monthly plan
minimax - 1M context window with 512K output tokens
deepseek - Frontier coding at per-token pricing with a 1M context