Three coding models worth paying attention to: MiMo, Kimi, and Qwen
Xiaomi MiMo-V2.5-Pro, Moonshot Kimi K2.6, and Alibaba Qwen3-Coder-Next are doing real work now. A look at what each is actually good at, and how to wire them up with claude-multi.
For a while the choice for serious coding work was Claude or GPT, and everything else was a benchmark chart. That’s not quite true anymore. Three models in particular have crossed the line from “interesting on paper” to “I’d actually use this for work”: Xiaomi’s MiMo-V2.5-Pro, Moonshot’s Kimi K2.6, and Alibaba’s Qwen3-Coder-Next.
None of them replaces Claude Opus across the board. But each of them is genuinely better than the top-tier models at something specific, and they’re cheaper. That makes them worth knowing.
Xiaomi MiMo-V2.5-Pro
A 1T-parameter Mixture-of-Experts model with 42B active. The headline numbers are:
- 1M-token context window. Big enough that “the whole project” stops being a problem.
- 40-60% fewer tokens than Claude Opus 4.6 or Gemini 3.1 Pro for comparable tasks, on the benchmarks Xiaomi published. Token efficiency translates almost directly to cost.
- Open weights, so you can run it locally if you have the hardware.
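To make the token-efficiency point concrete, here is a rough back-of-the-envelope sketch. The job size and the per-million-token price are invented for illustration and are not real pricing. Only the 40-60% range comes from the figures above. Holding the price constant isolates the effect of the token count; in practice the per-token price differs between providers too.

```python
def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of a job that consumes `tokens` at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


def cost_after_reduction(
    baseline_tokens: int, reduction: float, price_per_million_usd: float
) -> float:
    """Cost of the same job when it needs `reduction` (0..1) fewer tokens."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    reduced_tokens = round(baseline_tokens * (1.0 - reduction))
    return cost_usd(reduced_tokens, price_per_million_usd)


# Hypothetical inputs: a 2M-token job at a made-up 10 USD per million tokens.
baseline = cost_usd(2_000_000, 10.0)
low = cost_after_reduction(2_000_000, 0.40, 10.0)   # 40% fewer tokens
high = cost_after_reduction(2_000_000, 0.60, 10.0)  # 60% fewer tokens
print(f"baseline ${baseline:.2f}, with 40-60% fewer tokens ${high:.2f}-${low:.2f}")
```

At a fixed price the saving is linear: 40% fewer tokens is 40% less spend.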
Xiaomi’s internal demo had MiMo writing a working compiler in under five hours unattended. A demo like that won’t necessarily hold up the moment the network is flaky, but it’s a real number to put against the older “AI tools are autocomplete with extra steps” framing.
Use case: long-context refactors where you want the whole repo in scope and you care about token cost.
Moonshot Kimi K2.6
Engineered for agentic work specifically. The relevant numbers:
- 12-hour autonomous sessions. Not a marketing number, an architectural one: the runtime is built to keep state coherent across that duration.
- Native primitives for spawning, scheduling, and reconciling up to 300 sub-agents in a swarm. If you’re trying to parallelize work across many sub-tasks, this is the model with the explicit support for it.
- 262K context with auto-compression. Smaller window than MiMo, but compression handles the overflow cleanly.
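For a feel of what “spawning, scheduling, and reconciling” means in practice, here is a generic fan-out sketch using plain Python asyncio. It is not Kimi’s API: `run_subagent`, `run_swarm`, and the task names are hypothetical stand-ins, and a real sub-agent would call a model where this one only yields control.

```python
import asyncio


async def run_subagent(task: str) -> str:
    """Hypothetical stand-in for one sub-agent; a real one would call a model."""
    await asyncio.sleep(0)  # yield to the event loop, simulating I/O
    return f"done: {task}"


async def run_swarm(tasks: list[str], max_parallel: int) -> dict[str, str]:
    """Spawn one sub-agent per task, cap concurrency, reconcile results by task."""
    limit = asyncio.Semaphore(max_parallel)

    async def guarded(task: str) -> tuple[str, str]:
        async with limit:  # scheduling: at most max_parallel run at once
            return task, await run_subagent(task)

    # gather returns results in the order the tasks were submitted
    pairs = await asyncio.gather(*(guarded(t) for t in tasks))
    return dict(pairs)


results = asyncio.run(run_swarm(["lint", "tests", "docs"], max_parallel=2))
print(results)
```

The semaphore is the scheduling piece and the final dictionary is the reconciliation step. A model with native swarm support would presumably handle both for you.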
It leads most of the current agentic benchmarks, including SWE-Bench Pro and Terminal-Bench 2.0. If you’re running long-horizon agents over a real codebase, this is the one to try first.
Use case: anything where the agent has to run for hours, manage many sub-tasks, and not lose the thread.
Alibaba Qwen3-Coder-Next
The dedicated coding branch of the Qwen3 family. What stands out:
- Fine-tuned specifically for coding, which shows up most clearly on small, focused tasks where the general-purpose models still occasionally hallucinate API signatures.
- A tiered lineup that maps cleanly to Anthropic’s: qwen3-coder-next for hard problems, qwen3-coder-plus for the middle, qwen3-coder-flash for cheap small calls.
- Strong open-source adoption, which means more community tooling, more shared prompts, more debuggable behavior.
Use case: high-volume coding work where you want a tier that matches the difficulty of the task.
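One way to match tier to difficulty is a small lookup in front of your calls. The sketch below is illustrative only: the three model names are the tiers listed above, but the `classify` heuristic and its thresholds are placeholders you would tune for your own workload.

```python
from enum import Enum


class Difficulty(Enum):
    SMALL = "small"
    MEDIUM = "medium"
    HARD = "hard"


# Model names are the tiers named in the article; the mapping is illustrative.
QWEN_TIER = {
    Difficulty.SMALL: "qwen3-coder-flash",
    Difficulty.MEDIUM: "qwen3-coder-plus",
    Difficulty.HARD: "qwen3-coder-next",
}


def classify(files_touched: int, needs_design: bool) -> Difficulty:
    """Hypothetical heuristic; the thresholds are placeholders, not recommendations."""
    if needs_design or files_touched > 10:
        return Difficulty.HARD
    if files_touched > 2:
        return Difficulty.MEDIUM
    return Difficulty.SMALL


def pick_model(files_touched: int, needs_design: bool = False) -> str:
    """Return the Qwen tier for a task described by two rough signals."""
    return QWEN_TIER[classify(files_touched, needs_design)]


print(pick_model(1))                     # qwen3-coder-flash
print(pick_model(5))                     # qwen3-coder-plus
print(pick_model(3, needs_design=True))  # qwen3-coder-next
```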
What this means in practice
You don’t pick one of these and replace everything. You pick a default and use the others where they’re better. Most of the cost savings people are seeing come from routing: Opus for the hard reasoning, Qwen Flash or DeepSeek for the small edits and lookups, Kimi when you actually need an agent that runs for an afternoon.
The hard part is the plumbing. Each provider has its own base URL, its own model identifiers, sometimes split between pay-per-token and subscription endpoints. That is the part that gets old fast.
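To show the shape of that plumbing, here is a minimal sketch of the lookup you end up maintaining by hand. Everything in the table is a placeholder: the `example.invalid` URLs, the environment variable names, and the `"mimo-model-id"` string are invented for illustration. Check each provider’s documentation for the real values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    base_url: str  # where requests go
    model: str     # provider-specific model identifier
    key_env: str   # name of the environment variable holding the API key


# Placeholder values only: URLs, key variable names, and the MiMo model id
# are hypothetical. The point is that (provider, billing) picks the endpoint.
ENDPOINTS = {
    ("mimo", "token"): Endpoint(
        "https://api.mimo.example.invalid", "mimo-model-id", "MIMO_TOKEN_KEY"
    ),
    ("mimo", "subscription"): Endpoint(
        "https://plan.mimo.example.invalid", "mimo-model-id", "MIMO_PLAN_KEY"
    ),
    ("qwen", "token"): Endpoint(
        "https://api.qwen.example.invalid", "qwen3-coder-plus", "QWEN_TOKEN_KEY"
    ),
    ("qwen", "subscription"): Endpoint(
        "https://plan.qwen.example.invalid", "qwen3-coder-plus", "QWEN_PLAN_KEY"
    ),
}


def resolve(provider: str, billing: str) -> Endpoint:
    """Look up the endpoint for a provider and billing mode, failing loudly."""
    try:
        return ENDPOINTS[(provider, billing)]
    except KeyError:
        raise ValueError(f"no endpoint for {provider!r} with {billing!r} billing") from None


endpoint = resolve("qwen", "subscription")
print(endpoint.base_url, endpoint.model, endpoint.key_env)
```

Failing loudly on an unknown pair is deliberate: a subscription key sent to the pay-per-token endpoint tends to fail in less obvious ways.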
Using them with claude-multi
This is the problem claude-multi exists to solve. It gives each provider its own alias (claude-mimo, claude-kimi, claude-qwen) with its own config directory. You pick a template, paste a key, and you’re done.
A few specifics:
- MiMo and Qwen both have split endpoints for API vs subscription plans. claude-multi has separate templates for each (mimo and mimo-token, qwen and qwen-coding) so the right key hits the right endpoint without you reading three docs sites.
- If you wire in a router (the claude-code-llm-router MCP server, for example), claude-multi instances become the layer it routes across. Cheap models for small calls, premium models for hard ones, all under the same Claude Code surface.
None of this is exotic. It is just the config plumbing you’d write yourself if you had the time, with the rough edges already filed off.
References
- Xiaomi MiMo-V2.5-Pro Official Page: https://mimo.xiaomi.com/mimo-v2-5-pro/
- Moonshot Kimi K2.6 - Agentic Coding AI: https://kimi-k2.org/kimi-k26
- The Decoder: Xiaomi’s open-weight MiMo-V2.5-Pro takes aim at Claude Opus with hours-long autonomous coding: https://the-decoder.com/xiaomis-open-weight-mimo-v2-5-pro-takes-aim-at-claude-opus-with-hours-long-autonomous-coding/