<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>claude-multi Blog</title><description>Build notes from working on claude-multi, posts on running multiple Claude Code instances side by side, and the occasional rant about provider plumbing.</description><link>https://claude-multi.hmziq.xyz/</link><language>en-us</language><lastBuildDate>Wed, 27 May 2026 00:00:00 GMT</lastBuildDate><atom:link href="https://claude-multi.hmziq.xyz/feed.xml" rel="self" type="application/rss+xml" xmlns:atom="http://www.w3.org/2005/Atom"/><item><title>v0.8.1: Fixing the Broken npm Package</title><link>https://claude-multi.hmziq.xyz/blog/v081-fixing-the-broken-npm-package/</link><guid isPermaLink="true">https://claude-multi.hmziq.xyz/blog/v081-fixing-the-broken-npm-package/</guid><description>If you installed claude-multi globally and got Module not found, this is why. The CLI build and the docs site were both outputting to dist/. The docs won. Now they don&apos;t.</description><pubDate>Wed, 03 Jun 2026 00:00:00 GMT</pubDate><content:encoded># v0.8.1: Fixing the Broken npm Package

You installed `claude-multi` globally. You ran it. You got this:

```
error: Module not found &quot;/Users/.../claude-multi/bin/../dist/cli.js&quot;
```

That&apos;s not a missing dependency. The file was never there.

## Two builds, one directory

claude-multi has two build steps. The CLI compiles from TypeScript into a single JS bundle. The documentation site compiles from Astro into static HTML. We had both outputting into `dist/`.

The build script in `package.json`:

```json
&quot;build&quot;: &quot;bun build src/cli.ts --outdir dist --target node --format esm&quot;
```

The docs build in `astro.config.mjs` uses Astro&apos;s default output directory, which is also `dist/`.

When the package got published, the `files` field said to include `dist/cli.js`. But `dist/` was full of HTML files from the docs build: `index.html`, `_astro/`, fonts, images, sitemap XML. The whole rendered site. No `cli.js` anywhere. The shell wrapper tried to load `$DIR/../dist/cli.js`, found HTML instead, and threw a module resolution error.

## The fix

Moved the CLI build output from `dist/` to `build/`. Three files changed in the source, two more in CI.

`package.json` now says `--outdir build` and lists `build/cli.js` in the `files` array. The shell wrapper in `bin/claude-multi.js` resolves `../build/cli.js` instead of `../dist/cli.js`. The CI workflows that verify the build artifact now check `build/cli.js`.

Astro keeps `dist/`. The CLI gets its own directory. They don&apos;t touch each other.

## Upgrading

If you&apos;re on 0.8.0 and hitting the error:

```bash
bun add -g claude-multi@0.8.1
```

The tarball now contains exactly seven files. No docs artifacts leaking in.

---

Full changelog: [CHANGELOG.md](https://github.com/hmziqrs/claude-multi/blob/master/CHANGELOG.md). Provider reference: [/docs/providers/](/docs/providers/).</content:encoded><category>release</category><category>bugfix</category><category>build</category><author>hmziqrs</author><enclosure url="https://claude-multi.hmziq.xyz/audio/https://raw.githubusercontent.com/hmziqrs/claude-multi/master/audio/v081-fixing-the-broken-npm-package.mp3" length="0" type="audio/mpeg"/></item><item><title>v0.8.2: Granular Sync Modes and Responsive Web</title><link>https://claude-multi.hmziq.xyz/blog/v082-granular-sync-modes-responsive-web/</link><guid isPermaLink="true">https://claude-multi.hmziq.xyz/blog/v082-granular-sync-modes-responsive-web/</guid><description>claude-multi had two sync modes: symlink everything or copy everything. Now there&apos;s a middle ground. Plus a responsive header, sticky sidebar, and SSR footer for the docs site.</description><pubDate>Wed, 03 Jun 2026 00:00:00 GMT</pubDate><content:encoded># v0.8.2: Granular Sync Modes and Responsive Web

Auto-sync was a binary choice. Either `plugins/` and `skills/` were symlinked to `~/.claude` (auto-sync on), or they were independent copies (auto-sync off). Two options. Pick one.

The problem with auto-sync: every instance sees every plugin you install in `~/.claude`, immediately. You install something in your default Claude, and every `claude-*` alias gets it whether you want it or not.

The problem with manual mode: you have to copy new plugins by hand every time. Install a plugin in `~/.claude`, then manually copy it to each instance that needs it. Tedious.

## Three modes instead of two

Auto works the same as before. The entire `plugins/` and `skills/` directories are symlinked to `~/.claude/plugins` and `~/.claude/skills`. Any change in `~/.claude` is instantly visible to the instance.

Full-manual works the same as the old &quot;manual&quot; mode. Independent copies of everything. No symlinks.

Half-manual is the new one. The `plugins/` and `skills/` directories are real directories, not symlinks. But each plugin and skill inside them is individually symlinked back to `~/.claude`. You get the existing plugins from your default installation, but new installs in `~/.claude` don&apos;t automatically appear. You control what shows up.

The function behind this is `halfSyncPluginsAndSkills()` in `config.ts`. When switching from auto to half-manual, it removes the whole-directory symlink, creates a real directory in its place, then iterates over every item in `~/.claude/plugins` and `~/.claude/skills` and creates individual relative symlinks for each one. Items that already exist in the instance directory are not overwritten. If you added your own plugin to the instance, it stays. If you later install a new plugin in `~/.claude`, it does not appear until you explicitly re-sync.
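
The no-overwrite rule can be sketched as a pure planning step. This is a hypothetical helper, not the body of `halfSyncPluginsAndSkills()`: the name `planHalfSync` and the idea of separating planning from the filesystem calls are assumptions made for illustration.

```typescript
// Hypothetical sketch: decide which items from ~/.claude get a symlink.
// sourceItems: names found in ~/.claude/plugins (or skills).
// instanceItems: names already present in the instance directory.
function planHalfSync(sourceItems: string[], instanceItems: string[]): string[] {
  const existing = new Set(instanceItems);
  // Link only what the instance does not already have, so local items survive.
  return sourceItems.filter((name) => !existing.has(name));
}

const toLink = planHalfSync(["git-tools", "linter"], ["linter", "my-own-plugin"]);
```

Here only `git-tools` would be linked. `linter` is left alone because the instance already has an entry with that name, and `my-own-plugin` is never touched because it does not exist in the source.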

Re-syncing is available in the TUI. The ToggleAutoSync screen now has a &quot;Force re-sync&quot; option for auto and half-manual modes. It rebuilds the symlinks without changing the mode.

## Downgrade only

You can go from auto to half-manual to full-manual. You cannot go back up. Going back up would require reconciling diverged directories, which risks data loss. If the instance has its own plugins that don&apos;t exist in `~/.claude`, going back to auto-sync would lose them when the directory is replaced by a symlink.

`canConvertSyncMode()` in `constants.ts` enforces this. It uses an ordered array and checks that the target mode has a higher index than the current one.
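
A minimal sketch of the ordered-array check, assuming the modes are plain string literals. The function names come from this post, but the bodies and the `SYNC_MODE_ORDER` constant are guesses at the shape, not the shipped source.

```typescript
// Hypothetical sketch; the real definitions live in constants.ts.
type SyncMode = "auto" | "half-manual" | "full-manual";

// Ordered from most coupled to ~/.claude to least coupled.
const SYNC_MODE_ORDER: readonly SyncMode[] = ["auto", "half-manual", "full-manual"];

// A conversion is allowed only when the target sits later in the ordering.
function canConvertSyncMode(current: SyncMode, target: SyncMode): boolean {
  return SYNC_MODE_ORDER.indexOf(target) > SYNC_MODE_ORDER.indexOf(current);
}

// Every mode reachable from the current one.
function availableSyncModeConversions(current: SyncMode): SyncMode[] {
  return SYNC_MODE_ORDER.filter((mode) => canConvertSyncMode(current, mode));
}
```

With this ordering, an instance in `full-manual` has no conversions available, and converting a mode to itself is rejected because the index is not strictly higher.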

## CLI and TUI changes

The `add` command got new flags:

```bash
claude-multi add my-instance --sync-mode half-manual
claude-multi add my-instance --half-manual
claude-multi add my-instance --auto-sync
claude-multi add my-instance --manual
```

`--sync-mode` accepts `auto`, `half-manual`, or `full-manual`. The other flags are shortcuts. Specifying more than one is an error.

The `auto-sync` command now accepts mode names:

```bash
claude-multi auto-sync my-instance half-manual
```

Legacy `on`/`off` still works. `on` maps to `auto`, `off` maps to `full-manual`.
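
The legacy mapping can be sketched as a small parser. `parseSyncModeArg` is a hypothetical name, and throwing on unknown input is an assumption about how the CLI reports bad arguments.

```typescript
// Hypothetical sketch of argument parsing for the auto-sync command.
type SyncMode = "auto" | "half-manual" | "full-manual";

function parseSyncModeArg(arg: string): SyncMode {
  switch (arg) {
    case "on": // legacy spelling
    case "auto":
      return "auto";
    case "half-manual":
      return "half-manual";
    case "off": // legacy spelling
    case "full-manual":
      return "full-manual";
    default:
      // Assumed behavior: reject anything else instead of guessing.
      throw new Error(`Unknown sync mode: ${arg}`);
  }
}
```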

The TUI ToggleAutoSync screen is now a Sync Mode screen. Current mode gets a color label (green for auto, cyan for half-manual, yellow for full-manual), and it shows which downgrades are available. `plugin install/remove` is blocked when the instance is in half-manual mode, because individually symlinked plugins can&apos;t be individually managed.

## The web side

Five files changed on the docs site.

The header got a hamburger dropdown. Below 60rem, the nav links disappear and a `nav-dropdown` custom element takes over. Frosted glass panel (`backdrop-filter: blur(20px) saturate(140%)`), Escape to close, click-outside to close, proper ARIA menu roles. On desktop, nothing changed.

The sidebar is sticky now. On desktop (72rem+), the `.page` container is a CSS grid where the sidebar and main content overlap in the same grid cell. The sidebar uses `position: sticky` instead of fixed. The difference: a sticky element stops at the boundary of its scroll container, so the sidebar scrolls with the page until it hits the footer, then stops. No overlap. On mobile and tablet, Starlight&apos;s default layout is untouched.

The footer used to be trapped inside Starlight&apos;s `.page` wrapper, sitting in the sidebar column instead of spanning the viewport. The fix moves it outside `.page` entirely. Dev mode uses Astro middleware; the static build uses an integration `buildDone` hook. The footer HTML comes from `site-footer-html.ts`, shared between marketing and docs pages.

Both the header and the right sidebar (table of contents) pin with `position: sticky`. Header at `top: 0`, TOC at `top: var(--sl-nav-height)`.

## Backward compat

Existing instances keep their current behavior. The config has a new `syncMode` field, but `getSyncMode()` falls back to the old `autoSync` boolean if it&apos;s not set. `autoSync: false` resolves to `full-manual`. Everything else resolves to `auto`. Both fields are written on update so the config works with old and new versions.
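
A sketch of the fallback, assuming a config object with just these two fields. `syncMode`, `autoSync`, and `getSyncMode()` are named in this post. The `InstanceConfig` shape, the `withSyncMode` helper, and the choice to write `autoSync: false` for half-manual are assumptions.

```typescript
// Hypothetical sketch of reading and writing the sync mode.
type SyncMode = "auto" | "half-manual" | "full-manual";

interface InstanceConfig {
  syncMode?: SyncMode; // new field
  autoSync?: boolean; // legacy field
}

function getSyncMode(config: InstanceConfig): SyncMode {
  if (config.syncMode !== undefined) return config.syncMode;
  // Only an explicit false means full-manual; a missing value means auto.
  return config.autoSync === false ? "full-manual" : "auto";
}

// Write both fields so older versions still read a sensible boolean.
// Assumption: anything other than auto is stored as autoSync false.
function withSyncMode(config: InstanceConfig, mode: SyncMode): InstanceConfig {
  return { ...config, syncMode: mode, autoSync: mode === "auto" };
}
```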

## Upgrading

```bash
bun add -g claude-multi@0.8.2
```

Existing instances keep their current mode. To switch:

```bash
claude-multi auto-sync my-instance half-manual
```

22 files changed. 17 for sync modes, 5 for the web. The `SyncMode` enum, `canConvertSyncMode()`, and `availableSyncModeConversions()` are new exports in `constants.ts`. `halfSyncPluginsAndSkills()` is new in `config.ts`.

---

Full changelog: [CHANGELOG.md](https://github.com/hmziqrs/claude-multi/blob/master/CHANGELOG.md). Provider reference: [/docs/providers/](/docs/providers/).</content:encoded><category>release</category><category>sync</category><category>plugins</category><category>web</category><category>responsive</category><author>hmziqrs</author><enclosure url="https://claude-multi.hmziq.xyz/audio/https://raw.githubusercontent.com/hmziqrs/claude-multi/master/audio/v082-granular-sync-modes-responsive-web.mp3" length="0" type="audio/mpeg"/></item><item><title>MiniMax M3 for Claude Code: 1M Context, Benchmarks, Pricing</title><link>https://claude-multi.hmziq.xyz/blog/minimax-m3-one-million-context-frontier-coding/</link><guid isPermaLink="true">https://claude-multi.hmziq.xyz/blog/minimax-m3-one-million-context-frontier-coding/</guid><description>MiniMax M3 brings a 1M-token context window, frontier coding scores, and native multimodality to Claude Code. Updated template, benchmarks vs Opus 4.7, and pricing.</description><pubDate>Mon, 01 Jun 2026 00:00:00 GMT</pubDate><content:encoded>MiniMax released M3 yesterday. The short version: 1M-token context window, frontier coding scores, native image and video input, toggleable thinking. The claude-multi `minimax` template now uses it. Existing instances can sync with `claude-multi doctor fix`.

The longer version is worth reading. M3 posts the top score on three of the benchmarks below (SVG-Bench, Claw-Eval, OmniDocBench) and costs about 1/30th the price of Opus 4.7 on output tokens. It is also the first open-weight model to ship with a 1M context window, frontier-level coding, and native multimodality all at once.

## What changed in the template

The model names in the `minimax` template went from `MiniMax-M2.7` to `MiniMax-M3` across all model slots (opus, sonnet, haiku, small/fast). If you created a MiniMax instance before this update, running `claude-multi doctor fix` will sync the new template automatically ([how template sync works](/blog/v064-provider-template-sync/)). Three things are different this time:

**No more auto-compaction override.** M2.7 had a 128K context window. Claude Code assumes 200K for unrecognized models, which meant auto-compaction never fired and context would fill to 100% before crashing. M3 has a 1M context window. The override is gone.

**Output tokens up to 512K.** M2.7 capped at 64K. M3 supports up to 512K output tokens, which matters for the long-horizon agentic tasks the model is built for.

**Effort level set to `max`.** M3 supports toggleable thinking. The template enables thinking with `REASONING_EFFORT: &quot;high&quot;` and sets `CLAUDE_CODE_EFFORT_LEVEL: &quot;max&quot;`.

## The M3 architecture: MSA

The headline technical feature is MiniMax Sparse Attention (MSA). Standard attention scales quadratically with context length. MSA replaces full attention with KV-block selection, where an index branch scores blocks of key-value pairs and a sparse branch only computes attention on the selected blocks.
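
The block-selection idea can be illustrated with a toy sketch. This is not the MiniMax implementation: scoring a block by the dot product of the query with the mean key of that block is an assumption made here for illustration, and every name below is hypothetical.

```typescript
// Toy sketch of KV-block selection. Not the MSA algorithm itself.
// query: one query vector. keys: one key vector per token.
// Returns the indices of the topK highest-scoring blocks, in ascending order.
function selectBlocks(query: number[], keys: number[][], blockSize: number, topK: number): number[] {
  const scores: { block: number; score: number }[] = [];
  for (let start = 0; keys.length > start; start += blockSize) {
    const block = keys.slice(start, start + blockSize);
    // Index branch (assumed form): summarize the block by its mean key.
    const mean = query.map((_, d) => block.reduce((sum, k) => sum + k[d], 0) / block.length);
    const score = mean.reduce((sum, m, d) => sum + m * query[d], 0);
    scores.push({ block: start / blockSize, score });
  }
  return scores
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.block)
    .sort((a, b) => a - b);
}

const keys = [[0, 1], [0, 1], [1, 0], [2, 0]];
const picked = selectBlocks([1, 0], keys, 2, 1);
```

The sparse branch would then compute attention over `picked.length * blockSize` keys instead of all `keys.length`, which is where the savings at long context come from.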

The practical result: at 1M tokens, M3&apos;s per-token compute is 1/20th of the previous generation. Prefilling is 9x faster. Decoding is 15x faster. MiniMax claims MSA matches full attention on the vast majority of capabilities across their ablations.

M3 was designed around sparse attention from the start. The compute cost does not explode at long context the way it does with full attention, which is what makes the 1M window usable in practice rather than a spec sheet number.

## MiniMax M3 benchmarks

The coding and agentic benchmarks are what matter most for Claude Code users.

### Coding

| Benchmark | M3 | Opus 4.7 | GPT-5.5 | Gemini 3.1 Pro | M2.7 |
|---|---|---|---|---|---|
| SWE-Bench Pro | 59.0 | **64.3** | 58.6 | 54.2 | 56.2 |
| SWE-Bench Verified | 80.5 | **87.6** | 82.9 | 80.6 | 79.9 |
| Terminal-Bench 2.1 | 66.0 | 66.1 | **78.2** | 70.3 | 51.1 |
| SVG-Bench | **63.7** | 62.3 | 58.2 | 59.2 | 48.0 |
| KernelBench Hard | 28.8 | **30.7** | 20.9 | 18.6 | 10.5 |
| PaperBench | 52.6 | **58.5** | 57.5 | 46.7 | 30.6 |

M3 beats GPT-5.5 on SWE-Bench Pro (59.0 vs 58.6) and edges past Opus 4.7 on SVG-Bench (63.7 vs 62.3). Opus still leads the main SWE-Bench scores. M3 places first or second on three of the six coding benchmarks (first on SVG-Bench, second on SWE-Bench Pro and KernelBench Hard), third on PaperBench, and fourth on SWE-Bench Verified and Terminal-Bench 2.1, where it trails third place by 0.1 points in both cases.

### Agentic

| Benchmark | M3 | Opus 4.7 | GPT-5.5 | M2.7 |
|---|---|---|---|---|
| Claw-Eval | **74.5** | 71.6 | -- | 49.7 |
| MCP Atlas | 74.2 | **77.0** | 75.3 | 49.4 |
| DRACO | 73.2 | **77.7** | -- | 66.8 |
| BankerToolBench | 76.1 | **81.3** | 70.0 | 63.9 |

Claw-Eval is the end-to-end autonomous agent evaluation. M3 takes the top spot at 74.5, ahead of Opus 4.7 at 71.6. This is the benchmark that most closely matches what Claude Code does: sustained multi-step tool use in a real environment. M3 was trained for multi-turn production-like collaboration using an interactive user-simulator framework.

### Multimodal

| Benchmark | M3 | Opus 4.7 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|
| OmniDocBench | **91.6** | 89.3 | 87.5 | 88.1 |
| MMMU-Pro | 78.1 | 77.0 | **81.2** | 80.5 |
| Video-MMMU | 84.6 | 83.0 | 86.4 | **87.9** |

OmniDocBench measures multimodal document understanding across text, tables, charts, and images. M3 leads at 91.6. If you use Claude Code for document-heavy workflows, M3 can ingest the paper, figures, tables, and formulas all at once within its 1M context window.

### The jump from M2.7

The upgrade from M2.7 to M3 is large on most benchmarks, though not uniform. On KernelBench Hard, it nearly triples from 10.5 to 28.8. On Claw-Eval, it goes from 49.7 to 74.5. On SVG-Bench, from 48.0 to 63.7. PaperBench goes from 30.6 to 52.6. The SWE-Bench numbers move less: Pro goes from 56.2 to 59.0 and Verified from 79.9 to 80.5. M3 scores higher than M2.7 on every benchmark where both are listed.

## MiniMax M3 pricing and Token Plans

Through the MiniMax API:

| Tier | Input (per 1M) | Output (per 1M) | Context |
|---|---|---|---|
| Standard (up to 512K) | $0.60 | $2.40 | up to 512K |
| Long context (512K to 1M) | $1.20 | $4.80 | 512K to 1M |
| Cache read | $0.12 | -- | -- |

For comparison, Claude Opus 4.7 runs about $15/M input and $75/M output. M3 at $2.40/M output is roughly 1/30th the cost (input is about 1/25th). Even at the long-context tier ($4.80/M output), it is still about 1/16th of the Opus output price.
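
The arithmetic behind these ratios, as a small sketch. The prices are the ones quoted in this post. The session size in the usage lines is an invented example, and the calculation ignores cache reads and any tier switch partway through a conversation.

```typescript
// Prices in USD per 1M tokens, copied from the figures quoted in this post.
const m3Standard = { input: 0.6, output: 2.4 };
const m3Long = { input: 1.2, output: 4.8 };
const opus = { input: 15, output: 75 };

function costUsd(price: { input: number; output: number }, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * price.input + (outputTokens / 1e6) * price.output;
}

// Price ratios: how many times more Opus charges per token.
const outputRatio = opus.output / m3Standard.output; // about 31
const inputRatio = opus.input / m3Standard.input; // about 25
const longOutputRatio = opus.output / m3Long.output; // about 16

// Hypothetical session: 400K input tokens, 50K output tokens.
const m3Cost = costUsd(m3Standard, 400000, 50000);
const opusCost = costUsd(opus, 400000, 50000);
```

For a session dominated by input tokens, the blended saving sits closer to the input ratio than the output ratio, so the headline 1/30th figure is best read as an upper bound on the discount.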

MiniMax also offers Token Plans (subscription):
- **Plus**: $20/month for about 1.7 billion tokens
- **Max**: $50/month for about 5.1 billion tokens
- **Ultra**: $120/month for about 9.8 billion tokens

Both Token Plan and pay-per-token use the same `api.minimax.io` endpoint. The API key type determines which quota is consumed.

## Real-world demonstrations

MiniMax published three extended task runs that show what 1M context plus frontier coding looks like in practice.

**CUDA kernel optimization.** M3 optimized an FP8 GEMM kernel on NVIDIA Hopper GPUs over 24 hours. 147 submissions, 1,959 tool calls. Hardware peak utilization went from 7.6% to 71.3%, a 9.4x speedup. Most models stopped improving within 30 submissions. M3&apos;s best solution showed up on submission 145. The tool call history gets dense and structured fast, and MSA&apos;s sparse attention keeps the model focused on what matters as the conversation grows.

**Paper reproduction.** M3 autonomously reproduced an ICLR 2025 Outstanding Paper over 12 hours. 18 commits, 23 experimental figures. The paper&apos;s text, formulas, and figures all fit in context at once. The multimodal input handled the curves and charts natively.

**Training models from scratch.** On PostTrainBench, M3 was given four base models and told to synthesize data, train, evaluate, and iterate, all without human intervention. It scored 0.37, compared to Opus 4.7 at 0.42 and GPT-5.5 at 0.39.

## Getting started

**New instance:**

```bash
claude-multi add minimax --provider minimax --api-key sk-...
```

**Existing instance (sync to M3):**

```bash
claude-multi doctor check   # see what needs updating
claude-multi doctor fix     # re-applies the latest template
```

Or from the TUI: press `!` to open the health screen, then `f` to fix. The sync preserves your API key and any custom env vars you set.

---

Provider reference with model mappings, endpoints, and plan notes: [/docs/providers/](/docs/providers/). Environment variable reference: [/docs/environment-variables/](/docs/environment-variables/). Template source: [src/templates.ts](https://github.com/hmziqrs/claude-multi/blob/master/src/templates.ts). For background on how the minimax template was originally added, see [Five New Provider Templates](/blog/five-new-provider-templates/).</content:encoded><category>providers</category><category>models</category><category>minimax</category><category>benchmark</category><category>claude-code</category><author>hmziqrs</author><enclosure url="https://claude-multi.hmziq.xyz/audio/https://raw.githubusercontent.com/hmziqrs/claude-multi/master/audio/minimax-m3-one-million-context-frontier-coding.mp3" length="0" type="audio/mpeg"/></item><item><title>v0.6.5: Action Buttons, Health Screen Fix, Hardened Migrations</title><link>https://claude-multi.hmziq.xyz/blog/v065-instance-actions-health-fix/</link><guid isPermaLink="true">https://claude-multi.hmziq.xyz/blog/v065-instance-actions-health-fix/</guid><description>Instance details now has action buttons to update settings templates and regenerate alias wrappers. The health screen got a rewrite to fix broken dismiss actions. Migrations are more resilient with stale lock detection and atomic health status writes.</description><pubDate>Sat, 30 May 2026 00:00:00 GMT</pubDate><content:encoded># v0.6.5: Action Buttons, Health Screen Fix, Hardened Migrations

Instance details grew action buttons. The health screen had broken dismiss actions, now fixed. And migrations got more resilient to crashes and race conditions.

## Action buttons in instance details

The instance details screen was a read-only wall of text. You could see your instance&apos;s config, version, plugins, MCP servers. But to do anything, you had to go back to the main menu and find the right screen.

Now when you open an instance&apos;s details and press Enter, you get an actions menu:

- Update settings template: re-syncs the provider template env vars (model names, thinking limits, base URL) to whatever the latest template defines. Your API key and custom tunable vars are preserved.
- Update alias wrapper: regenerates the wrapper script at `~/.local/bin/claude-&lt;name&gt;` to match the current standard.
- Override alias to standard: only shows up when the wrapper is mismatched or missing. Force-regenerates it.

Each action shows a live status indicator next to it. ✓ up to date, ⚠ mismatch detected, ✗ wrapper missing. You can tell whether an instance needs attention before picking an action.

This is the per-instance equivalent of `claude-multi doctor fix`. Targeted at a single instance, with visual feedback about what&apos;s wrong.

## Health screen: dismiss actually works now

The health screen (press `!` from the main menu) had a bug where pressing `d` to dismiss an issue did nothing. The handler checked a `selectedIssue` variable that was never set. There was no way to select an issue from the list. Dead code the whole time.

The fix replaces the old static issue cards with a Select menu. You pick an issue with arrow keys and Enter, see its full details (severity, instance name, resolution hint), then press `d` to dismiss it. `D` still dismisses all at once.

There was also a bug where the &quot;Instance migrations pending&quot; warning kept showing after running `doctor fix`. The migration saved the updated config to disk, but the app&apos;s in-memory state wasn&apos;t refreshed before health checks re-ran. The fix: call `reload()` after migration, so the health check sees the fresh config version and stops reporting a stale warning.

## Migration hardening

The migration system got three infrastructure fixes.

Stale lock detection. If claude-multi crashes mid-migration (SIGKILL, power loss, whatever), a lock file stays behind at `~/.claude-multi/.migration.lock`. The old code checked whether the PID in the lock was still running, but PIDs get recycled on long-running systems. Now there&apos;s a 30-minute staleness check: if the lock is older than half an hour, it gets removed regardless of what the PID check says.
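
A sketch of the combined check, assuming the lock records a PID and a creation time. `LockInfo`, `shouldRemoveLock`, and the injected `isPidAlive` callback are hypothetical names, and removing the lock when the PID is dead is an assumption about the pre-existing behavior.

```typescript
// Hypothetical sketch of the staleness rule; not the shipped migration code.
const STALE_LOCK_MS = 30 * 60 * 1000; // 30 minutes

interface LockInfo {
  pid: number;
  createdAtMs: number;
}

function shouldRemoveLock(lock: LockInfo, nowMs: number, isPidAlive: (pid: number) => boolean): boolean {
  // Age wins over the PID check: a recycled PID can look alive indefinitely.
  if (nowMs - lock.createdAtMs > STALE_LOCK_MS) return true;
  // Within the window, fall back to the PID check.
  return !isPidAlive(lock.pid);
}
```

Passing `isPidAlive` in as a function keeps the rule testable without spawning real processes.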

Atomic health status writes. The health status file at `~/.claude-multi/health-status.json` was written with plain `writeFileSync`. If the process crashed mid-write, or if two claude-multi instances wrote at the same time, the file could end up corrupted. Now it writes to a `.tmp` file first, verifies the JSON parses, then renames it into place.
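
The write, verify, rename sequence looks roughly like this with the Node `fs` module. `writeJsonAtomic` is a hypothetical name, and the usage lines write to a temporary directory instead of `~/.claude-multi/`.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical sketch of an atomic JSON write.
function writeJsonAtomic(filePath: string, data: unknown): void {
  const tmpPath = `${filePath}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(data, null, 2));
  // Read the temp file back and parse it; a bad write throws here, before the rename.
  JSON.parse(fs.readFileSync(tmpPath, "utf8"));
  // Swap the verified file into place. Readers see the old file or the new one.
  fs.renameSync(tmpPath, filePath);
}

// Usage against a throwaway directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "health-"));
const target = path.join(dir, "health-status.json");
writeJsonAtomic(target, { issues: [] });
```

The rename is only atomic when the temp file and the target are on the same filesystem, which is why the temp file sits next to the target and not in the system temp directory.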

Doctor fix double-fire guard. Pressing `f` on the health screen calls `handleDoctorFix`, which is async. If you hit `f` twice fast, the function would run twice concurrently. The second run was harmless (migrations are idempotent), but it caused an unnecessary reload and a misleading result screen. Now there&apos;s a `doctorRunning` flag that prevents concurrent invocations.
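
A sketch of the guard. `doctorRunning` and `handleDoctorFix` are named in this post. The `runFix` stand-in, the `fixRuns` counter, and the boolean return value are assumptions added so the behavior is observable.

```typescript
// Hypothetical sketch of a re-entrancy guard around an async handler.
let doctorRunning = false;
let fixRuns = 0;

// Stand-in for the real work (migrations, wrapper regeneration, reload).
async function runFix() {
  fixRuns += 1;
  await Promise.resolve();
}

async function handleDoctorFix() {
  if (doctorRunning) return false; // a run is already in flight; ignore this key press
  doctorRunning = true;
  try {
    await runFix();
    return true;
  } finally {
    doctorRunning = false; // always release, even if the fix throws
  }
}

// Two key presses in the same tick: only the first starts a run.
const firstPress = handleDoctorFix();
const secondPress = handleDoctorFix();
```

The flag is set before the first `await`, so the second call sees it even though the first call has not finished. Releasing it in `finally` keeps a thrown error from locking the handler permanently.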

## Under the hood

`syncProviderTemplateForInstance()` in `config.ts` is the on-demand version of the v0.6.3 migration logic. Takes an instance, detects its provider, re-applies the latest template while preserving API key and tunable env vars.
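
The preserve-while-syncing step can be sketched as a merge over env maps. This is a guess at the shape, not the body of `syncProviderTemplateForInstance()`: the `ANTHROPIC_AUTH_TOKEN` key name, the two-entry tunable set, and dropping keys that are neither in the template nor preserved are all assumptions.

```typescript
// Hypothetical sketch of a template sync that keeps user-owned values.
type Env = { [key: string]: string };

// Illustrative subset; the canonical set lives in constants/env.ts.
const TUNABLE_ENV_VARS = new Set([
  "CLAUDE_CODE_AUTO_COMPACT_WINDOW",
  "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE",
]);
const API_KEY_VAR = "ANTHROPIC_AUTH_TOKEN"; // assumed key name

function syncTemplateEnv(current: Env, template: Env): Env {
  // Start from the template so model names and limits are always fresh.
  const next: Env = { ...template };
  for (const key of Object.keys(current)) {
    const preserved = key === API_KEY_VAR || TUNABLE_ENV_VARS.has(key);
    // Carry over only what the user owns.
    if (preserved) next[key] = current[key];
  }
  return next;
}

const synced = syncTemplateEnv(
  { ANTHROPIC_MODEL: "MiniMax-M2.7", ANTHROPIC_AUTH_TOKEN: "sk-user", CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: "80" },
  { ANTHROPIC_MODEL: "MiniMax-M3", ANTHROPIC_AUTH_TOKEN: "" },
);
```

Sharing one `TUNABLE_ENV_VARS` set between this merge and the mismatch detection is what keeps a customized value from being flagged as drift and then overwritten.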

`detectTemplateMismatch()` and `detectWrapperMismatch()` in `instance-diagnostics.ts` are pure functions that compare an instance&apos;s current state against the expected template. The UI uses them to show mismatch indicators.

`TUNABLE_ENV_VARS` moved to `constants/env.ts`. It&apos;s shared between diagnostics (excluded from comparison) and sync (preserved during update) so they can&apos;t drift apart.

---

Full changelog: [CHANGELOG.md](https://github.com/hmziqrs/claude-multi/blob/master/CHANGELOG.md). Provider reference: [/docs/providers/](/docs/providers/).</content:encoded><category>release</category><category>health</category><category>migration</category><category>ui</category><category>hardening</category><author>hmziqrs</author><enclosure url="https://claude-multi.hmziq.xyz/audio/https://raw.githubusercontent.com/hmziqrs/claude-multi/master/audio/v065-instance-actions-health-fix.mp3" length="0" type="audio/mpeg"/></item><item><title>v0.7.0: No More Pinned Claude Binary</title><link>https://claude-multi.hmziq.xyz/blog/v070-no-more-pinned-claude/</link><guid isPermaLink="true">https://claude-multi.hmziq.xyz/blog/v070-no-more-pinned-claude/</guid><description>claude-multi no longer bundles its own copy of Claude Code. All instances now use your globally installed claude binary, which means they auto-update together. Doctor fix, health checks, and migrations were rewritten. Four pre-existing bugs were fixed along the way.</description><pubDate>Sat, 30 May 2026 00:00:00 GMT</pubDate><content:encoded># v0.7.0: No More Pinned Claude Binary

Here&apos;s what was happening. claude-multi installed its own private copy of `@anthropic-ai/claude-code` into `~/.claude-multi/bin/` via npm. Every wrapper script hardcoded the path to that copy. The idea was to pin a known-good version so third-party providers wouldn&apos;t break during the v2.1.154 incident.

Meanwhile your global `claude` kept auto-updating by rotating a symlink at `~/.local/share/claude/versions/`. But the wrappers stayed stuck on whatever `claude-multi` had pinned. You&apos;d be on 2.1.158 globally while every `claude-*` alias was still running 2.1.156. Two versions behind. No way to close the gap without changing code and bumping a constant.

That pinned copy is gone. Every wrapper now resolves to whatever `claude` you have in PATH. One binary, one version, auto-updates work.

## What changed

`getClaudePath()` used to have three priorities: env override, pinned binary, global PATH. We ripped out the pinned binary check. Now it goes straight from env override to `which claude` (or `where claude` on Windows).

`getGlobalClaudePath()` and `tryGetGlobalClaudePath()` became identical to `getClaudePath()` and `tryGetClaudePath()` after the removal, so we deleted them. All callers updated.

We removed four things from `version.ts`: `COMPATIBLE_CLAUDE_VERSION`, `isThirdPartyApiBroken()`, `getPinnedBinaryVersion()`, and `installPinnedClaude()`. The pinned path constants `PINNED_BIN_DIR` and `PINNED_CLAUDE_BIN` are gone from `paths.ts`.

53 references across 15 files. Zero left.

## Health checks and doctor fix rewritten

This one was embarrassing. The health check that validates wrapper binary paths used to compare against `PINNED_CLAUDE_BIN`. If a wrapper pointed to anything else, it flagged it as broken. The v0.6.2 migration started using the global path, which meant the health check would immediately flag every migrated wrapper as wrong. Then `doctor fix` would rewrite them back to the pinned binary. The migration and the health check were fighting each other. Every time you ran doctor, it undid what the migration just did.

Both now resolve the current global claude path and compare against that. The check also picked up a Windows `.cmd` regex pattern so it can parse wrapper scripts on both platforms.

`doctor fix` used to install the pinned binary as its first step. Now it just checks that `claude` is available in PATH, tells you which binary it found, and regenerates any stale wrappers.

## Migration fixes

The v0.6.2 migration already called `tryGetGlobalClaudePath()` to point wrappers at the global binary. Since we deleted that function, the migration now calls `tryGetClaudePath()`. Same result, different name.

We added a `console.warn` for the case where claude can&apos;t be found during migration. Previously the wrapper regeneration just silently skipped. Now you actually hear about it.

The v0.6.3 migration had a bug with `TUNABLE_ENV_VARS`. It had a local copy of the set that was missing two entries: `CLAUDE_CODE_AUTO_COMPACT_WINDOW` and `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE`. If you had customized auto-compaction settings, that migration would overwrite them with template defaults. We replaced the duplicate with an import from `constants/env.ts` so the canonical set is the only set.

## Three Windows bugs

Found during the audit, not caused by this change, but fixed in the same pass.

`where claude` output was split on `\n` but Windows uses `\r\n`. The PATH membership check in the `add` command used `lastIndexOf(&apos;/&apos;)` and `split(&apos;:&apos;)` instead of `path.dirname()` and `path.delimiter`. And the health check had no regex for the `.cmd` wrapper format.
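
The first two fixes can be sketched with the Node `path` module. `firstLine` and `isDirOnPath` are hypothetical helper names, not functions from the codebase.

```typescript
import * as path from "node:path";

// Split on LF or CRLF so Windows `where` output does not keep a trailing \r.
function firstLine(output: string): string | undefined {
  return output
    .split(/\r?\n/)
    .map((line) => line.trim())
    .find((line) => line.length > 0);
}

// Platform-aware PATH membership: dirname and delimiter follow the host OS.
function isDirOnPath(binaryPath: string, pathVar: string): boolean {
  const dir = path.dirname(binaryPath);
  return pathVar.split(path.delimiter).includes(dir);
}

const whereOutput = "C:\\bin\\claude.cmd\r\nC:\\other\\claude.cmd\r\n";
const resolved = firstLine(whereOutput);
```

`path.delimiter` is `;` on Windows and `:` elsewhere, so the same membership check works on both without hardcoding either character.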

## Upgrading from v0.6.5

Run `claude-multi doctor fix`. It regenerates all your wrapper scripts to point to your global `claude` binary.

Then delete the old npm install:

```bash
rm -rf ~/.claude-multi/bin/
```

That&apos;s about 100MB of duplicated Claude Code you don&apos;t need anymore.

Your instances follow whatever version your global `claude` is on now. When Claude Code auto-updates itself, all your aliases pick it up on the next launch.

## Numbers

15 files changed. 53 references removed. 4 bugs fixed. 242 tests passing.

---

Full changelog: [CHANGELOG.md](https://github.com/hmziqrs/claude-multi/blob/master/CHANGELOG.md). Provider reference: [/docs/providers/](/docs/providers/).</content:encoded><category>release</category><category>architecture</category><category>health</category><category>migration</category><category>bugfix</category><author>hmziqrs</author><enclosure url="https://claude-multi.hmziq.xyz/audio/https://raw.githubusercontent.com/hmziqrs/claude-multi/master/audio/v070-no-more-pinned-claude.mp3" length="0" type="audio/mpeg"/></item><item><title>Claude Code v2.1.156 fixes the third-party provider breakage</title><link>https://claude-multi.hmziq.xyz/blog/claude-code-v2156-fixes-third-party-providers/</link><guid isPermaLink="true">https://claude-multi.hmziq.xyz/blog/claude-code-v2156-fixes-third-party-providers/</guid><description>Anthropic shipped v2.1.156, which fixes the HTTP 422 error that broke every non-Anthropic API provider in v2.1.154 and v2.1.155. Here is what changed and what claude-multi users need to do.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><content:encoded># Claude Code v2.1.156 fixes the third-party provider breakage

Yesterday I wrote about how Claude Code v2.1.154 broke every third-party API provider. The bug was in the fallback logic: Claude Code&apos;s retry handler only caught HTTP 400, but providers return HTTP 422 for validation errors. The beta feature that sends `role: &quot;system&quot;` in the messages array would fail, the fallback would not trigger, and your conversation died.

Anthropic shipped v2.1.156 a few hours ago. It fixes this.

## What changed in v2.1.156

The fallback detection now catches both HTTP 400 and 422. When a provider rejects the `mid-conversation-system` beta with either status code, Claude Code removes the beta header and falls back to putting system instructions in `&lt;system-reminder&gt;` blocks inside user messages. This is the same fallback that already worked for first-party API calls that happened to return 400.
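
The shape of that change, as a sketch. This is not Claude Code source: the function names, the `Result` type, and the synchronous `send` callback are all invented here to show the control flow.

```typescript
// Hypothetical sketch of a status-based fallback; not the Claude Code implementation.
interface Result {
  status: number;
}

// Before the fix, only 400 would have been in this set.
const FALLBACK_STATUSES = new Set([400, 422]);

// send(useBeta) performs one request, with or without the beta header.
function sendWithFallback(send: (useBeta: boolean) => Result): Result {
  const first = send(true);
  if (!FALLBACK_STATUSES.has(first.status)) return first;
  // The provider rejected the beta: retry once without it.
  return send(false);
}

// A provider that answers 422 to the beta and 200 otherwise.
const strictProvider = (useBeta: boolean): Result => ({ status: useBeta ? 422 : 200 });
const outcome = sendWithFallback(strictProvider);
```

With only 400 in the set, the 422 response would have been returned to the caller as a hard failure, which matches the breakage described above.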

Versions 2.1.154 and 2.1.155 still have the bug. Upgrade to 2.1.156 or later.

## What claude-multi users need to do

If you are on claude-multi v0.6.0 or later:

```bash
claude-multi doctor fix
```

This reinstalls the pinned Claude binary at `~/.claude-multi/bin/` to v2.1.156 and updates any wrapper scripts that still point to the old version.

If you are on an older version of claude-multi, update first:

```bash
bun update -g claude-multi
claude-multi doctor fix
```

## Auto-updates are back on

When the breakage hit, we disabled auto-updates in claude-multi to prevent Claude Code from silently updating to a broken version. Now that v2.1.156 fixes the issue, auto-updates are re-enabled. New instances created with provider templates will not have `DISABLE_AUTOUPDATER=1` or `DISABLE_UPDATES=1` injected into their settings.

If you have existing instances with those env vars in their `settings.json`, you can remove them manually or re-apply the provider template. They are harmless to leave in place; they just prevent Claude Code from auto-updating.

## The safety infrastructure stays

We are keeping the version pinning and health check code in place, labeled with `[SAFE PARK]` comments throughout the codebase. If a future Claude Code update breaks third-party providers again, we can reactivate it by flipping a few constants:

- `isThirdPartyApiBroken()` in `src/version.ts` gets an updated version range
- `COMPATIBLE_CLAUDE_VERSION` gets the last working version number
- `PROVIDER_COMMON_ENV` in `src/templates.ts` gets `DISABLE_AUTOUPDATER` and `DISABLE_UPDATES` back
- `autoUpdates` in `src/config.ts` flips back to `false`

The `doctor fix` command and the TUI health screen will pick up the changes automatically. No new code needed.
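
As a rough illustration of the first bullet, a version-range check could look like the sketch below. This is not the actual `isThirdPartyApiBroken()` from `src/version.ts`: the helper names and the two range constants are assumptions, with the range taken from the two affected releases named in this post.

```typescript
// Hypothetical sketch; not the real src/version.ts.
// The affected range comes from this post: 2.1.154 and 2.1.155.
const BROKEN_FROM = "2.1.154";
const BROKEN_TO = "2.1.155";

// Split "2.1.154" into [2, 1, 154].
function parseVersion(version: string): number[] {
  return version.split(".").map((part) => Number.parseInt(part, 10));
}

// Negative if a sorts before b, positive if after, 0 if equal.
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (const i of [0, 1, 2]) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// True only for versions inside the inclusive broken range.
function isThirdPartyApiBroken(version: string): boolean {
  if (compareVersions(version, BROKEN_FROM) >= 0) {
    return compareVersions(BROKEN_TO, version) >= 0;
  }
  return false;
}
```

With the range expressed as two constants, reactivating the check for a future incident would mean editing those two values and nothing else.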

This is the second time in recent months that a Claude Code update has broken provider compatibility. Keeping the safety code parked and ready seems prudent.

## TL;DR

- v2.1.156 fixes the HTTP 422 bug that broke third-party providers in v2.1.154 and v2.1.155
- Run `claude-multi doctor fix` to update your pinned binary
- Auto-updates are re-enabled for new instances
- The pinning safety code stays parked, ready to reactivate if it happens again</content:encoded><category>Claude Code</category><category>third-party providers</category><category>compatibility</category><category>claude-multi</category><author>hmziqrs</author><enclosure url="https://claude-multi.hmziq.xyz/audio/https://raw.githubusercontent.com/hmziqrs/claude-multi/master/audio/claude-code-v2156-fixes-third-party-providers.mp3" length="0" type="audio/mpeg"/></item><item><title>v0.6.3: Drop the Pinned Binary, Update Every Provider Template</title><link>https://claude-multi.hmziq.xyz/blog/v063-migration-and-provider-updates/</link><guid isPermaLink="true">https://claude-multi.hmziq.xyz/blog/v063-migration-and-provider-updates/</guid><description>Instance migrations now point wrappers at your global claude install instead of a stale pinned binary. Every provider template got correct thinking and output token limits.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><content:encoded># v0.6.3: Drop the Pinned Binary, Update Every Provider Template

Two things in this release: moving away from the pinned Claude Code binary, and bringing every provider template up to date with correct thinking and output token limits.

## No more pinned binary

Every claude-multi wrapper script used to point at a pinned Claude Code binary living at `~/.claude-multi/bin/`. That was the right call during the v2.1.154 incident, when we needed to lock everyone to v2.1.153 to avoid the third-party provider breakage. But the incident is over. Claude Code v2.1.156 fixed it, and pinning a binary nobody remembers to update just means your instances fall behind.

I ran into this myself after the 0.6.2 release. The migration had run, the wrapper templates looked correct, but my instances were still executing v2.1.153 because the pinned binary had never been updated. The wrappers were pointing at the right file. The file was just the wrong version.

Starting with 0.6.3, instance migrations regenerate wrappers to point at your globally installed `claude`, the one `which claude` finds on your PATH. The migration checks whether the wrapper content actually differs before writing anything, so instances that are already correct don&apos;t get touched.

If you set `CLAUDE_MULTI_CLAUDE_PATH`, that still takes priority over everything else.

## Provider templates: thinking and output tokens

Every provider template now has explicit `MAX_THINKING_TOKENS`, `MAX_OUTPUT_TOKENS`, `ENABLE_THINKING`, and `REASONING_EFFORT` values matched to what each model actually supports. Before this, several templates left thinking disabled or used generic limits that didn&apos;t line up with the model&apos;s capabilities.

| Provider | Thinking tokens | Output tokens |
|---|---|---|
| MiniMax M2.7 | 32K | 64K |
| DeepSeek V4-Pro | 32K | 128K |
| MiMo V2.5-Pro (both plans) | enabled | 128K |
| Kimi K2.5 | 16K | 64K |
| Qwen3-Coder (both plans) | 16K | 64K |

## Model name fixes

Two model references changed:

- **MiMo** (pay-per-token and Token Plan): bumped from `mimo-v2.5-pro` to `mimo-v2.5-pro[1m]` and `mimo-v2.5` to `mimo-v2.5[1m]`. The `[1m]` variant gives you the full 1M-token context window instead of the default 128K.
- **Kimi**: opus model corrected from `kimi-k2.6` to `kimi-k2.5`. The K2.6 opus tier doesn&apos;t exist on the Anthropic-compatible endpoint. K2.5 is the right model.

If you created a MiMo or Kimi instance before this update, your `settings.json` still has the old model names. The instance migration in this release only regenerates the wrapper script and leaves `settings.json` alone, so edit the model fields by hand.

## Running the migration

```bash
claude-multi doctor check   # see what needs updating
claude-multi doctor fix     # apply fixes
```

Or from the TUI: press `!` to open the health screen, then `f` to fix.

## Under the hood

The migration system gained two new functions: `getGlobalClaudePath()` and `tryGetGlobalClaudePath()` in `wrapper.ts`. These resolve the claude binary from the env override or PATH, without checking the pinned binary. The instance migration uses these to generate wrappers that point at whatever `claude` you have installed.

The old `getClaudePath()` function still exists and still checks the pinned binary first. It&apos;s used when creating new instances. The health check&apos;s `fixWrapperVersions` also still targets the pinned binary for its &quot;safe park&quot; behavior. These are separate flows from the migration, and they work differently on purpose.
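
The lookup order used by the migration can be sketched roughly as follows. This is a simplified illustration, not the code in `wrapper.ts`: the function names, the injected `exists` callback, and the wrapper body (stripped down to a shebang and an exec line) are all assumptions.

```typescript
// Hypothetical sketch of the lookup order described above; the real
// functions live in wrapper.ts and may differ.
type Env = { [name: string]: string | undefined };

// Env override first, then the first `claude` found on PATH.
// The pinned binary is deliberately not consulted.
function tryResolveGlobalClaude(env: Env, exists: (path: string) => boolean): string | null {
  const override = env["CLAUDE_MULTI_CLAUDE_PATH"];
  if (override) return override;

  const dirs = (env["PATH"] ?? "").split(":").filter((dir) => dir.length > 0);
  for (const dir of dirs) {
    const candidate = dir + "/claude";
    if (exists(candidate)) return candidate;
  }
  return null;
}

// Assumed wrapper shape: a shell script that execs the resolved binary.
function renderWrapper(claudePath: string): string {
  return '#!/bin/sh\nexec "' + claudePath + '" "$@"\n';
}

// Only rewrite a wrapper when its content would actually change.
function needsRewrite(current: string, claudePath: string): boolean {
  return current !== renderWrapper(claudePath);
}
```

Passing `exists` in as a parameter keeps the sketch free of filesystem calls; a real implementation would check the disk directly.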

---

Full provider reference with model mappings, endpoint URLs, and plan notes is at [/docs/providers/](/docs/providers/). Changelog with all the details is at [CHANGELOG.md](https://github.com/hmziqrs/claude-multi/blob/master/CHANGELOG.md).</content:encoded><category>release</category><category>migration</category><category>providers</category><category>templates</category><author>hmziqrs</author><enclosure url="https://claude-multi.hmziq.xyz/audio/https://raw.githubusercontent.com/hmziqrs/claude-multi/master/audio/v063-migration-and-provider-updates.mp3" length="0" type="audio/mpeg"/></item><item><title>v0.6.4: Existing Instances Now Auto-Sync Provider Template Updates</title><link>https://claude-multi.hmziq.xyz/blog/v064-provider-template-sync/</link><guid isPermaLink="true">https://claude-multi.hmziq.xyz/blog/v064-provider-template-sync/</guid><description>A new instance migration detects which provider each instance uses and re-applies the latest template config (model names, thinking tokens, output limits) to settings.json. API keys and custom env vars are preserved.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><content:encoded># v0.6.4: Existing Instances Now Auto-Sync Provider Template Updates

Here&apos;s the problem. When we update a provider template in claude-multi, the new model names and token limits only affect instances created after the update. Your existing MiMo instance still has `mimo-v2.5-pro` without the `[1m]` context window. Your DeepSeek instance is missing `MAX_THINKING_TOKENS`. Nobody goes back and edits `settings.json` by hand.

v0.6.4 fixes this with a new instance migration that detects which provider each instance uses and re-applies the latest template.

## How it works

The migration reads each instance&apos;s `settings.json`, extracts `ANTHROPIC_BASE_URL`, and matches it against the nine known provider templates (including MiMo Token Plan region variants). If it finds a match, it re-applies the latest template env vars as a spread merge:

- Template values (model names, thinking/output limits) overwrite existing keys
- Your API key survives
- Any env vars you added yourself survive
- If your base URL doesn&apos;t match any template, the migration skips that instance

The first time this runs, it also writes a `providerTemplate` field to the instance metadata in `config.json`. Future migrations use that field instead of re-detecting from the base URL.
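
A minimal sketch of the detect-and-merge step, under stated assumptions: the single template entry, its `https://` base URL form, the `ANTHROPIC_DEFAULT_OPUS_MODEL` key, and the function names are illustrative and not copied from `src/templates.ts`.

```typescript
// Hypothetical sketch; the template contents and helper names are assumptions.
type EnvMap = { [name: string]: string };

type ProviderTemplate = { id: string; baseUrl: string; env: EnvMap };

const TEMPLATES: ProviderTemplate[] = [
  {
    id: "mimo",
    baseUrl: "https://api.xiaomimimo.com/anthropic",
    env: {
      ANTHROPIC_DEFAULT_OPUS_MODEL: "mimo-v2.5-pro[1m]",
      MAX_OUTPUT_TOKENS: "128000",
    },
  },
];

// Match an instance to a template by its ANTHROPIC_BASE_URL.
function detectTemplate(env: EnvMap): ProviderTemplate | null {
  const baseUrl = env["ANTHROPIC_BASE_URL"];
  return TEMPLATES.find((t) => t.baseUrl === baseUrl) ?? null;
}

// Spread merge: template keys overwrite, everything else is kept as-is.
function syncEnv(existing: EnvMap): EnvMap {
  const template = detectTemplate(existing);
  if (template === null) return existing;
  return { ...existing, ...template.env };
}
```

Because the template never defines the API key or any user-added variables, the spread order alone is enough to preserve them.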

## What this means for your instances

If you created an instance before the v0.6.3 template updates, running `claude-multi doctor fix` will now sync these changes:

- **MiMo**: `mimo-v2.5-pro` becomes `mimo-v2.5-pro[1m]` (1M context window)
- **DeepSeek**: gets `MAX_THINKING_TOKENS: &quot;32000&quot;` and `MAX_OUTPUT_TOKENS: &quot;128000&quot;`
- **Kimi**: opus corrected from `kimi-k2.6` to `kimi-k2.5`, gets thinking and output limits
- **MiniMax, Qwen**: get their respective thinking and output token limits

## Running it

```bash
claude-multi doctor check   # see what needs updating
claude-multi doctor fix     # apply fixes
```

Or from the TUI: press `!` to open the health screen, then `f` to fix.

---

Provider reference with model mappings, endpoints, and plan notes: [/docs/providers/](/docs/providers/). Full changelog: [CHANGELOG.md](https://github.com/hmziqrs/claude-multi/blob/master/CHANGELOG.md).</content:encoded><category>release</category><category>migration</category><category>providers</category><category>templates</category><author>hmziqrs</author><enclosure url="https://claude-multi.hmziq.xyz/audio/https://raw.githubusercontent.com/hmziqrs/claude-multi/master/audio/v064-provider-template-sync.mp3" length="0" type="audio/mpeg"/></item><item><title>Claude Code v2.1.154 Broke Every Third-Party Provider. Here&apos;s What Happened.</title><link>https://claude-multi.hmziq.xyz/blog/claude-code-v2154-broke-every-third-party-provider/</link><guid isPermaLink="true">https://claude-multi.hmziq.xyz/blog/claude-code-v2154-broke-every-third-party-provider/</guid><description>On May 28, 2026, Anthropic shipped a Claude Code update that broke every non-Anthropic API provider within hours. A post-mortem on what went wrong, the fallout, and the compatibility safeguards we added to claude-multi.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><content:encoded># Claude Code v2.1.154 Broke Every Third-Party Provider. Here&apos;s What Happened.

On May 28, 2026, Anthropic shipped Claude Code v2.1.154. Within hours, every non-Anthropic API provider stopped working. GLM, DeepSeek, MiniMax, Kimi, Qwen, MiMo -- all of them returned the same error:

```
API Error: 422 {&quot;detail&quot;:[{&quot;type&quot;:&quot;literal_error&quot;,&quot;loc&quot;:[&quot;body&quot;,&quot;messages&quot;,1,&quot;role&quot;],
&quot;msg&quot;:&quot;Input should be &apos;user&apos; or &apos;assistant&apos;&quot;,&quot;input&quot;:&quot;system&quot;,
&quot;ctx&quot;:{&quot;expected&quot;:&quot;&apos;user&apos; or &apos;assistant&apos;&quot;}}]}
```

If you&apos;re using `claude-multi` to run Claude Code against a third-party provider, you probably hit this.

## What changed

Claude Code v2.1.150 introduced a new Anthropic beta: `mid-conversation-system-2026-04-07`. It&apos;s been in every version since. This beta changes how Claude Code sends system instructions to the API.

Before, Claude Code used the top-level `system` parameter for system prompts. Every Anthropic-compatible API supports this:

```json
{
  &quot;model&quot;: &quot;claude-sonnet-4-20250514&quot;,
  &quot;system&quot;: &quot;You are a helpful assistant.&quot;,
  &quot;messages&quot;: [
    {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;Hello&quot;}
  ]
}
```

After the beta, Claude Code also sends system instructions as messages with `role: &quot;system&quot;` inside the `messages` array:

```json
{
  &quot;model&quot;: &quot;claude-sonnet-4-20250514&quot;,
  &quot;system&quot;: &quot;You are a helpful assistant.&quot;,
  &quot;messages&quot;: [
    {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;Hello&quot;},
    {&quot;role&quot;: &quot;system&quot;, &quot;content&quot;: &quot;Remember: be concise&quot;},
    {&quot;role&quot;: &quot;assistant&quot;, &quot;content&quot;: &quot;Hi! How can I help?&quot;}
  ]
}
```

The beta is opt-in via the `anthropic-beta` header. Claude Code enables it automatically for first-party API calls (api.anthropic.com). For third-party providers, it should be disabled. But the detection logic has a gap.

## Why providers reject it

The Anthropic Messages API spec allows `role: &quot;system&quot;` in the messages array when the beta header is present. Without the header, it&apos;s not valid. Most third-party providers implement the base spec without beta features. Their validators reject any role that isn&apos;t `&quot;user&quot;` or `&quot;assistant&quot;`, and return 422.

## Claude Code has a fallback -- but it&apos;s incomplete

Anthropic anticipated this. Claude Code has built-in retry logic that detects when a provider rejects the `mid-conversation-system` beta. When it catches the error, it removes the beta header and falls back to putting system instructions inside `&lt;system-reminder&gt;` blocks in user messages.

The detection function (minified, from the binary):

```javascript
function pP8(error) {
  if (!Yy) return false;
  if (!(error instanceof APIError) || error.status !== 400) return false;
  // ... pattern matching on error message
}
```

`error.status !== 400`. The fallback only triggers on HTTP 400. Third-party providers return HTTP 422 for validation errors. The retry logic never fires.

This is the bug. The beta feature itself is fine for the first-party API. The error handling checks for one specific status code, while third-party providers use a different one for the same error.

## What actually happens

1. User runs `claude-glm` (or any claude-multi instance)
2. Claude Code detects it&apos;s using a third-party API
3. The `mid-conversation-system` beta should be disabled, but the detection has edge cases
4. Claude Code sends a request with `role: &quot;system&quot;` in the messages array
5. Provider returns HTTP 422
6. Claude Code&apos;s fallback bails out on `error.status !== 400`, so the 422 never triggers it
7. Error surfaces to the user
8. Conversation is dead

If the provider had returned 400 instead of 422, the fallback would have kicked in.
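
The mismatch in steps 5 and 6 can be reproduced with two small stand-in functions. Neither is taken from Claude Code or from any provider; the names and the simplified role check are assumptions made for illustration.

```typescript
// Hypothetical stand-ins for the two sides of the failure.

// A strict validator in the style described above: only "user" and
// "assistant" roles pass, anything else is rejected with 422.
function validateRoles(roles: string[]): number {
  const allKnown = roles.every((role) => role === "user" || role === "assistant");
  return allKnown ? 200 : 422;
}

// The status check as described in this post: only 400 counts.
function fallbackFires(httpStatus: number): boolean {
  return httpStatus === 400;
}

const providerStatus = validateRoles(["user", "system", "assistant"]);
console.log(providerStatus, fallbackFires(providerStatus)); // 422 false
console.log(fallbackFires(400)); // true
```

The validator and the fallback each behave sensibly in isolation. The failure only appears when the status code one side emits is not the one the other side expects.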

## How we fixed it in claude-multi

Three layers.

### Layer 1: Pin the Claude Code version

We maintain a pinned Claude Code installation at `~/.claude-multi/bin/`. All claude-multi instances use this instead of the global `claude` binary. Currently pinned to v2.1.153, the last version before the breakage.

```bash
npm install --prefix ~/.claude-multi/bin @anthropic-ai/claude-code@2.1.153
```

The `getClaudePath()` function checks this path first, before falling back to `which claude`.

### Layer 2: Disable auto-updates

We added `DISABLE_AUTOUPDATER=1` and `DISABLE_UPDATES=1` to every provider template&apos;s environment variables. Claude Code won&apos;t auto-update to a broken version.

From `src/templates.ts`:

```typescript
const PROVIDER_COMMON_ENV: Record&lt;string, string&gt; = {
  DISABLE_AUTOUPDATER: &quot;1&quot;,
  DISABLE_UPDATES: &quot;1&quot;,
};
```

These get merged into every instance&apos;s `settings.json` when created via a provider template.

### Layer 3: Doctor fix for existing instances

For instances created before the fix, we added a health check and repair system.

The health check detects two wrapper formats:

1. **Shell format** (current): `exec &quot;/path/to/claude&quot; &quot;$@&quot;`
2. **Node.js format** (legacy): `spawn(&quot;/path/to/claude&quot;, ...)`

Both get flagged as version issues. The fix regenerates wrappers as clean shell scripts pointing to the pinned binary.

```bash
# CLI
claude-multi doctor check   # show issues
claude-multi doctor fix     # auto-fix

# TUI
claude-multi  # press ! for health screen, then f to fix
```

The TUI shows a banner when version issues are detected:

```
⚠ 1 error, 2 warnings — press ! to review
  ⚠ Some instances use a Claude version incompatible with 3rd-party APIs. Press ! to auto-fix.
```
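
For the detection step of the health check, a rough sketch might look like this. The two patterns mirror the wrapper formats listed above, but the regexes, the function names, and the rule that a wrapper is flagged when its target differs from the pinned path are assumptions, not the shipped implementation.

```typescript
// Hypothetical sketch of the detection step; patterns are assumptions
// based on the two wrapper formats listed above.
const SHELL_EXEC = /exec "([^"]+)" "\$@"/;
const NODE_SPAWN = /spawn\("([^"]+)"/;

// Returns the binary path a wrapper launches, or null if neither format matches.
function wrapperTarget(content: string): string | null {
  const match = SHELL_EXEC.exec(content) ?? NODE_SPAWN.exec(content);
  if (match === null) return null;
  return match[1] ?? null;
}

// Assumption: a wrapper is flagged when its target is not the pinned binary.
// Unrecognized wrappers are left alone in this sketch.
function hasVersionIssue(content: string, pinnedPath: string): boolean {
  const target = wrapperTarget(content);
  if (target === null) return false;
  return target !== pinnedPath;
}
```

Extracting the target path first keeps the comparison identical for both formats, so the legacy Node.js wrappers need no special case beyond their own pattern.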

## What providers should do

Return HTTP 400, not 422, for message validation errors. Claude Code&apos;s existing fallback handles 400. This is the fastest path to compatibility without waiting for a Claude Code patch.

## What Anthropic should do

Two things:

1. Broaden the status code check in the fallback logic: accept any 4xx (`error.status &gt;= 400 &amp;&amp; error.status &lt; 500`) instead of requiring exactly 400.
2. Provide an env var to disable specific betas. `CLAUDE_CODE_DISABLE_MID_CONVERSATION_SYSTEM=1` would let users opt out without downgrading.

## The broader issue

This is the second time in recent months that a Claude Code update has broken third-party provider compatibility. The first was a change in how model names are validated. Anthropic tests against their own API, and third-party compatibility is incidental.

If you rely on third-party providers, pin your Claude Code version and disable auto-updates.

## TL;DR

- Claude Code v2.1.150+ sends `role: &quot;system&quot;` in the messages array (new beta feature)
- Third-party providers reject it with HTTP 422
- Claude Code&apos;s fallback only handles HTTP 400
- Fix: pin to v2.1.153, disable auto-updates, run `claude-multi doctor fix`</content:encoded><category>Claude Code</category><category>post-mortem</category><category>third-party providers</category><category>compatibility</category><category>claude-multi</category><author>hmziqrs</author><enclosure url="https://claude-multi.hmziq.xyz/audio/https://raw.githubusercontent.com/hmziqrs/claude-multi/master/audio/claude-code-v2154-broke-every-third-party-provider.mp3" length="0" type="audio/mpeg"/></item><item><title>Claude Code is doing more of the job now, and claude-multi makes the rest of it cheaper</title><link>https://claude-multi.hmziq.xyz/blog/claude-code-co-engineer-and-claude-multi/</link><guid isPermaLink="true">https://claude-multi.hmziq.xyz/blog/claude-code-co-engineer-and-claude-multi/</guid><description>Claude Code has moved past autocomplete. With MCP and a million-token context, it handles real workflows end to end. claude-multi is how you point it at different providers without burning your config to the ground.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><content:encoded>The thing that has actually changed about AI coding assistants in the last year is not the suggestion quality. It is the scope of what you can hand off in one shot.

Two years ago you were tab-completing functions. Now you can say &quot;fix the race condition in the auth service&quot; and walk away. Claude Code reads the files, plans the change, writes the code, runs the tests, and comes back when it&apos;s done or when it&apos;s stuck. Sometimes the answer is wrong. But it&apos;s wrong about a real attempt at the whole problem, which is a different kind of wrong than &quot;I generated a function that compiles.&quot;

`claude-multi` does not change any of that. What it does is let you point that same workflow at a different provider, without rewriting your config every time you do it.

### What&apos;s actually different about Claude Code in 2026

Three things, mostly.

**The agent loop**. Claude Code does not just write code. It runs the build, reads the failure, edits the file, runs the build again. Most of the value lands in this loop, because most of what makes code work is not the first attempt.

**MCP**. Model Context Protocol is the open standard from Anthropic that lets the model talk to your tools. Jira, GitHub, Slack, Sentry, your database, Figma. Once a server is configured, you can say &quot;implement the fix, open the PR, update the ticket&quot; and the model coordinates across those systems in one conversation. The integration is what makes the agent loop useful past the file you&apos;re editing.

**A 1M-token context window**. Most competitors are still at 200K. This sounds like a spec-sheet number until you watch the model fail at a multi-file refactor because half the project fell out of context. With 1M you can fit the surrounding code, the ticket, the design doc, and the prior PRs, and the model can actually reason about the whole thing.

The combined effect is a real shift in what one engineer can ship per day. The senior engineer is not writing less code. They are writing less coordination boilerplate.

### Where claude-multi fits

The provider landscape is messier than Anthropic alone. There are cheap models that handle most tasks fine, premium models you want for the hard ones, and a few specialized ones that are weirdly good at a specific thing. You probably want access to several of them without your `~/.claude` directory turning into a graveyard.

A few specifics on what claude-multi does for that.

**Switching without editing settings.json**. Each provider gets its own alias and its own config directory. `claude-glm` for GLM, `claude-deepseek` for DeepSeek, `claude-mimo` for MiMo. You pick from a template, paste a key, that&apos;s it.

**The plan-split problem**. Some providers run their pay-per-token API on a different base URL from their subscription coding plan. MiMo does this. Qwen does this. claude-multi has separate templates for each (`mimo` vs `mimo-token`, `qwen` vs `qwen-coding`) so the right key hits the right endpoint.

**Routing**. If you wire in an MCP server like `claude-code-llm-router`, claude-multi instances become the substrate it routes across. Cheap models for small edits and lookups, premium models for the parts that need them. The rough number people quote is 60 to 80 percent cost reduction versus running everything on the top-tier model. Your mileage will vary, but the direction is real.

**TUI or CLI**. Both work. The TUI is faster the first time. The CLI is faster once you know what you want.

### Putting it together

Claude Code is doing more of the job. claude-multi is how you do that job across whichever provider is the right call for the task in front of you, without spending half your week on config plumbing.

That&apos;s the whole pitch.</content:encoded><category>Claude Code</category><category>AI engineering</category><category>LLM integration</category><category>claude-multi</category><category>developer tools</category><category>AI coding</category><category>workflow automation</category><category>MCP</category><author>hmziqrs</author><enclosure url="https://claude-multi.hmziq.xyz/audio/https://raw.githubusercontent.com/hmziqrs/claude-multi/master/audio/claude-code-co-engineer-and-claude-multi.mp3" length="0" type="audio/mpeg"/></item><item><title>How MCP lets Claude Code actually do the rest of your job</title><link>https://claude-multi.hmziq.xyz/blog/claude-code-mcp-workflow-automation/</link><guid isPermaLink="true">https://claude-multi.hmziq.xyz/blog/claude-code-mcp-workflow-automation/</guid><description>MCP gives Claude Code a way to talk to the tools you already use: Jira, GitHub, Slack, your databases. Here is what that buys you and where it breaks down.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><content:encoded>Most of a developer&apos;s day is not coding. It is reading a ticket, finding the branch, running the tests, opening the PR, pasting the link into Slack, going back to the ticket to update the status. Each step is small. The total is not.

Claude Code&apos;s Model Context Protocol (MCP) is the part that lets one prompt do all of that, instead of you doing it.

### What MCP actually is

MCP is an open standard from Anthropic for letting a model talk to external systems through a single interface. Each MCP server is a small bridge that exposes one tool&apos;s capabilities in a format the model can call. You point Claude Code at the servers you want, and from then on it can read Jira issues, open GitHub PRs, post to Slack, run SQL queries against your warehouse, and so on, inside the same conversation.

The point is not that any one of these is hard to wire up. It is that you wire it up once and stop wiring it up.

### What it looks like in practice

A real example. You say:

&gt; Fix the auth bug from JIRA-1234, open a PR, ping the team channel.

If you have the relevant MCP servers configured, Claude Code can read the ticket, pull the affected files, write the fix, run the tests, push a branch, open the PR with the ticket linked, update the ticket status, and post a Slack message. Some of those steps it does well. Some of them you will want to review before approving. Either way it is one conversation, not seven tools.

A few other things it is good at once MCP is in place:

* Code review with context. Pulling the original ticket, the design doc, and the diff into the same review pass changes what the model can spot. Most &quot;missed it in review&quot; bugs are missed because the reviewer did not have the surrounding context, not because they could not read the diff.
* Triage. Reading open issues, grouping them by label or area, suggesting which ones look like duplicates. You still own the call, but the first pass is free.
* Reacting to events. An MCP server can push messages into a session, so the model can act on a webhook, a Telegram message, a Discord ping, without you re-prompting.

### Where it falls down

A few honest caveats.

* MCP is only as good as the servers you connect. A flaky Jira server gives you flaky Jira behavior. Pick servers you trust, or write your own.
* The model still hallucinates calls sometimes. Tool definitions help, but it can still try to call something that does not exist or pass a malformed argument. Tests and reviews are not optional.
* Permissions are a real problem. An agent with write access to your repo, your tracker, and your team chat is an agent that can do real damage if you point it at the wrong thing. Start read-only.

### Why it matters anyway

A 1M-token context window plus MCP changes what a single conversation can hold. Instead of &quot;here is one file, write a function,&quot; you get &quot;here is the ticket, the surrounding code, the last three related PRs, the deploy logs, fix it.&quot; That is a different kind of help than autocomplete.

It is not magic. You still review the diff. But the part where you tab through five browser windows to figure out what to do next, that part shrinks.

---

### References

*   OrbilonTech: Claude Code as Co-Engineer 2026: Powerful Reasons It Wins: [https://orbilontech.com/claude-code-as-co-engineer-2026/](https://orbilontech.com/claude-code-as-co-engineer-2026/)
*   Claude Help Center: Use Claude for Microsoft 365 with third-party platforms: [https://support.claude.com/en/articles/13945233-use-claude-for-microsoft-365-with-third-party-platforms](https://support.claude.com/en/articles/13945233-use-claude-for-microsoft-365-with-third-party-platforms)
*   PRABHAT.DEV: Claude Code: Zero to Hero - The Complete 2026 Field Guide: [https://prabhat.dev/claude-code-zero-to-hero-the-complete-2026-field-guide/](https://prabhat.dev/claude-code-zero-to-hero-the-complete-2026-field-guide/)</content:encoded><category>Claude Code</category><category>Model Context Protocol</category><category>MCP</category><category>workflow automation</category><category>AI co-engineer</category><category>developer productivity</category><category>tool integration</category><category>AI in software development</category><author>hmziqrs</author><enclosure url="https://claude-multi.hmziq.xyz/audio/https://raw.githubusercontent.com/hmziqrs/claude-multi/master/audio/claude-code-mcp-workflow-automation.mp3" length="0" type="audio/mpeg"/></item><item><title>Five New Provider Templates: MiMo, Kimi, Qwen, and More</title><link>https://claude-multi.hmziq.xyz/blog/five-new-provider-templates/</link><guid isPermaLink="true">https://claude-multi.hmziq.xyz/blog/five-new-provider-templates/</guid><description>claude-multi now ships templates for Xiaomi MiMo, Moonshot Kimi, and Alibaba Qwen, including separate templates for providers that split their API across pay-per-token and subscription coding plans.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><content:encoded>When I started claude-multi, the goal was simple: stop editing `settings.json` by hand every time you want to switch providers. GLM, MiniMax, DeepSeek. Those three covered the most common alternatives to Anthropic, and that felt like enough.

Then the past few months happened. Xiaomi shipped MiMo-V2.5-Pro with a 1-trillion parameter MoE at a fraction of Claude&apos;s per-token cost. Moonshot dropped Kimi K2.6 and matched frontier benchmarks on agentic coding. Alibaba&apos;s Qwen3-Coder-Next quietly became the go-to model for a lot of the open-source crowd. The provider landscape got busy fast, and the template list needed to catch up.

So here&apos;s what&apos;s new.

## The five new templates

### Xiaomi MiMo (`mimo`)

MiMo-V2.5-Pro is a 1T MoE model with 42B active parameters and a 1M-token context window. It has a native Anthropic-compatible endpoint, which means zero friction with Claude Code. Swap the URL, paste your key, done.

- **Opus/Sonnet**: `mimo-v2.5-pro`
- **Haiku/fast**: `mimo-v2.5` (310B, 15B active, meaningfully cheaper for background tasks)
- **Endpoint**: `api.xiaomimimo.com/anthropic`

### Xiaomi MiMo Token Plan (`mimo-token`)

MiMo also offers a subscription model called Token Plan: a monthly credit pool rather than pay-per-token billing. The catch: it runs on a different domain. Xiaomi exposes regional endpoints (CN, SG, EU), and the right one comes from your subscription console.

The template ships with the CN endpoint as a placeholder. After setup, edit `~/.claude-&lt;name&gt;/settings.json` and swap `ANTHROPIC_BASE_URL` for whichever regional URL your console shows.

### Moonshot Kimi (`kimi`)

Kimi K2.6 is Moonshot&apos;s open-weight 1T MoE, 32B active, 256K context. Released April 2026. It leads most agentic coding benchmarks while staying well below Claude Opus pricing.

One thing worth knowing: the `kimi-k2-turbo-preview` model was EOL&apos;d on May 25, 2026. There&apos;s no K2.6-turbo yet. So the template uses:

- **Opus**: `kimi-k2.6`
- **Sonnet/Haiku**: `kimi-k2.5` (same model family, ~37% cheaper per token, still active)

Moonshot is strictly pay-per-token. No separate subscription plan, no different URL for different billing tiers.

### Alibaba Qwen (`qwen`)

Qwen3-Coder is the coding-specialized branch of the Qwen3 family. The three-tier model lineup maps cleanly onto Claude Code&apos;s internal model roles:

- **Opus**: `qwen3-coder-next`
- **Sonnet**: `qwen3-coder-plus`
- **Haiku**: `qwen3-coder-flash`

Endpoint: `dashscope-intl.aliyuncs.com/apps/anthropic` (the international DashScope instance).

### Alibaba Qwen Coding Plan (`qwen-coding`)

Alibaba offers a subscription coding plan with dedicated infrastructure: different subdomain, separate quota, subscription-based pricing. If you&apos;re on the coding plan rather than pay-per-token, use this template instead:

- Same models as `qwen`
- Endpoint: `coding-intl.dashscope.aliyuncs.com/apps/anthropic`

---

## The plan-split problem

Adding MiMo and Qwen surfaced something worth explaining: some providers run their pay-per-token API and their coding plan subscription on **completely different base URLs**. This isn&apos;t a minor detail. If you use the wrong URL for your account type, your API key won&apos;t authenticate.

Here&apos;s the full picture across all providers claude-multi supports:

| Provider | Has plan split? | How |
|---|---|---|
| GLM (Z.ai) | Yes | Anthropic endpoint exists **only** for Coding Plan, standard API has no Anthropic URL |
| Xiaomi MiMo | Yes | Different domain per plan (`api.xiaomimimo.com` vs `token-plan-*.xiaomimimo.com`) |
| Alibaba Qwen | Yes | Different subdomain (`dashscope-intl` vs `coding-intl.dashscope`) |
| MiniMax | Partial | Same URL for both; different key type determines which quota is consumed |
| Moonshot Kimi | No | Pay-per-token only, single endpoint |
| DeepSeek | No | Pay-per-token only, single endpoint |

The GLM situation is the most surprising: the Anthropic-compatible URL at `api.z.ai/api/anthropic` **only works if you have a Coding Plan subscription**. Regular pay-per-token GLM users get an OpenAI-compatible API only. That&apos;s why the template is now called &quot;GLM Coding Plan&quot; rather than just &quot;GLM.&quot;
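
One way to catch a plan mismatch early is to compare the configured base URL against the host expected for the chosen template. The sketch below is hypothetical: the hosts are the ones listed in this post, but the map, the `glm` key, and the helper are illustrative. `mimo-token` is left out because its regional host comes from your subscription console.

```typescript
// Hosts are the ones listed in this post; the map shape and helper are hypothetical.
const ENDPOINT_HOSTS: { [template: string]: string } = {
  "glm": "api.z.ai",
  "mimo": "api.xiaomimimo.com",
  "qwen": "dashscope-intl.aliyuncs.com",
  "qwen-coding": "coding-intl.dashscope.aliyuncs.com",
};

// True when the base URL points at the host expected for the template.
function matchesPlan(template: string, baseUrl: string): boolean {
  const expected = ENDPOINT_HOSTS[template];
  if (expected === undefined) return false;
  return baseUrl.startsWith("https://" + expected + "/");
}
```

A check like this would flag a Coding Plan key pointed at the pay-per-token host before the first request fails to authenticate.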

---

## Getting started

**Using the TUI:**

When creating a new instance in the `claude-multi` TUI, pick your template from the provider selection menu. For example, select `kimi` to add Kimi. If you&apos;re on a Token Plan or Coding Plan, select the corresponding `*-token` or `*-coding` variant.

**Using the CLI:**

Pass the template with `--provider`:

```bash
claude-multi add kimi --provider kimi --api-key sk-...
claude-multi add qwen --provider qwen --api-key sk-...
claude-multi add mimo --provider mimo --api-key sk-...
```

For Token Plans or Coding Plans, use the subscription variant:

```bash
claude-multi add mimo --provider mimo-token --api-key tp-...
claude-multi add qwen --provider qwen-coding --api-key sk-...
```

Then run `claude-kimi`, `claude-qwen`, or `claude-mimo` from any terminal. The provider-specific env vars are already wired into the instance&apos;s `settings.json`, so no manual editing is required.

---

The full provider reference with model mappings, endpoint URLs, and plan notes is at [/docs/providers/](/docs/providers/).</content:encoded><category>providers</category><category>templates</category><category>announcement</category><author>hmziqrs</author><enclosure url="https://claude-multi.hmziq.xyz/audio/https://raw.githubusercontent.com/hmziqrs/claude-multi/master/audio/five-new-provider-templates.mp3" length="0" type="audio/mpeg"/></item><item><title>Stop paying Opus prices for a git status</title><link>https://claude-multi.hmziq.xyz/blog/llm-cost-optimization-routing/</link><guid isPermaLink="true">https://claude-multi.hmziq.xyz/blog/llm-cost-optimization-routing/</guid><description>If you&apos;re sending every request to a flagship model, you&apos;re overpaying by a lot. A look at how LLM routing actually works, what kind of savings to expect, and how to set it up.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><content:encoded>Running every Claude Code call through Opus is fine if you&apos;re not paying the bill. If you are, it&apos;s the most expensive way to do something a model 50x cheaper would have done correctly.

The fix is routing: classify the request, send it to the cheapest model that can handle it. Boring solution, real savings. The numbers people are quoting from production are 60 to 80 percent off versus all-Opus, with no measurable hit to output quality on the easy stuff.

### The cost shape, in concrete terms

A normal day with Claude Code looks something like:

- Lots of small calls. &quot;Explain this `git` command.&quot; &quot;Rename this variable.&quot; &quot;Add a JSDoc to this function.&quot; Cheap to generate, cheap to verify.
- A medium pile of scoped work. &quot;Write unit tests for this function.&quot; &quot;Refactor this module to use the new logger.&quot; Clear spec, contained blast radius.
- A small number of hard calls. &quot;Why is the auth service deadlocking?&quot; &quot;Design the migration plan for splitting this monolith.&quot; These actually need the smart model.

If you bill all three tiers at flagship rates, you pay the hard-problem price for the first two as well. The first tier alone is usually 70% of your call volume and almost none of your hard-problem load.

### How a router fits in

A router sits between Claude Code and the providers. Every prompt goes through a tiny classifier first (usually itself a cheap LLM, sometimes a heuristic), which decides what tier the request belongs to. Then it forwards to a model you&apos;ve designated for that tier.

A reasonable default classification is roughly:

- `simple`: typos, renames, formatting, one-liners.
- `medium`: clearly scoped features, tests, single-module refactors.
- `complex`: architecture, cross-subsystem debugging, anything that touches more than a couple of services.

Routers such as `claude-code-smart-router` and `claude-code-llm-router` ship with sensible defaults. You will almost always want to tweak them after watching a few days of traffic.
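
To make the three tiers concrete, here is a minimal keyword heuristic. It is a sketch under stated assumptions, not how any particular router classifies: the keyword lists and the model labels are made up for illustration, and real routers often use a cheap LLM for this step instead.

```typescript
// Illustrative keyword heuristic; real routers often use a cheap LLM instead.
type Tier = "simple" | "medium" | "complex";

// Hypothetical keyword lists, loosely based on the tier descriptions above.
const complexHints = ["architecture", "deadlock", "migration", "design"];
const simpleHints = ["typo", "rename", "format", "jsdoc"];

function classify(prompt: string): Tier {
  const text = prompt.toLowerCase();
  // Check the expensive tier first so a hard request is never under-routed.
  if (complexHints.some((hint) => text.includes(hint))) {
    return "complex";
  }
  if (simpleHints.some((hint) => text.includes(hint))) {
    return "simple";
  }
  return "medium"; // anything unrecognized gets the middle tier
}

// Tier-to-model mapping; the model labels are placeholders.
const modelForTier: { [tier in Tier]: string } = {
  simple: "local-ollama",
  medium: "cheap-remote",
  complex: "flagship",
};

console.log(modelForTier[classify("Rename this variable")]); // prints "local-ollama"
```

Defaulting to `medium` is a deliberate choice in this sketch: an unrecognized prompt costs a little more than it might need to, but it never lands on a model too weak for it.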

### What the routing chain looks like in practice

Most setups walk through something like:

1. Local model (Ollama with whatever fits your GPU) for the trivial stuff.
2. A cheap remote model (Gemini Flash, Groq Llama, DeepSeek) for the medium tier.
3. A mid-tier model (Sonnet, GPT mid-tier) when the cheap tier isn&apos;t confident.
4. Opus, GPT-5, or whatever your flagship is, only when the classifier picks `complex`.

The &quot;free first&quot; routing is the part that produces the savings. If your Ollama box can handle 30% of your calls for free, those calls drop off your bill before you spend anything on the rest.
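
One way to picture the chain is as confidence-based escalation: try the cheapest step, and move up only when it is not confident. The sketch below simplifies the list above (it escalates on confidence alone rather than also consulting the classifier), and the `confident` flag, the step names, and the stub behavior are all assumptions for illustration.

```typescript
// Sketch of confidence-based escalation. The `confident` flag and the step
// names are assumptions; real routers decide this in their own ways.
type Attempt = { answer: string; confident: boolean };
type Step = { name: string; run: (prompt: string) => Attempt };

function routeThroughChain(chain: Step[], prompt: string): { model: string; answer: string } {
  if (chain.length === 0) {
    throw new Error("routing chain is empty");
  }
  let result = { model: "", answer: "" };
  for (const step of chain) {
    const attempt = step.run(prompt);
    result = { model: step.name, answer: attempt.answer };
    if (attempt.confident) {
      break; // stop at the cheapest step that is confident
    }
  }
  return result; // if nobody was confident, this is the last, most capable step
}

// Stub steps: the local model only trusts itself on short prompts.
const chain: Step[] = [
  { name: "local", run: (p) => ({ answer: "local answer", confident: 40 > p.length }) },
  { name: "cheap-remote", run: () => ({ answer: "remote answer", confident: false }) },
  { name: "flagship", run: () => ({ answer: "flagship answer", confident: true }) },
];

console.log(routeThroughChain(chain, "Rename this variable").model); // prints "local"
```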

### The features that matter once you&apos;re past the demo

A few things to look for in a router beyond the basic tier mapping:

- **Budget caps and downgrades.** When a tier hits its quota, the router falls back to a cheaper model instead of failing the request or silently busting your budget.
- **Caching.** Identical inputs hit the same classification result and, ideally, the same response.
- **Effort knobs.** Some routers can call a model at reduced &quot;thinking&quot; effort for medium work, which cuts the reasoning-token cost meaningfully without dropping output quality on most tasks.
- **Output caps.** Hard limits on response length per tier. Stops a &quot;summarize&quot; call from accidentally generating a novella.
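
The budget-cap behavior in the first bullet can be sketched as a walk down an ordered list of tiers. Everything here is hypothetical (the `TierBudget` shape, the cent amounts, the model labels); it only illustrates the idea of downgrading instead of failing.

```typescript
// Hypothetical per-tier budget tracker, in cents to avoid float rounding.
type TierBudget = { model: string; capCents: number; spentCents: number };

// `tiers` is ordered from most to least expensive; `start` is the tier the
// classifier asked for. Returns the first tier at or below it with room left.
function pickWithinBudget(tiers: TierBudget[], start: number, costCents: number): string {
  for (const tier of tiers.slice(start)) {
    if (tier.capCents - tier.spentCents >= costCents) {
      tier.spentCents += costCents; // record the spend against this tier
      return tier.model;
    }
  }
  throw new Error("every remaining tier is over budget");
}

const tiers: TierBudget[] = [
  { model: "flagship", capCents: 500, spentCents: 480 },
  { model: "mid-tier", capCents: 300, spentCents: 0 },
];

// The flagship tier has only 20 cents left, so a 50-cent call is downgraded.
console.log(pickWithinBudget(tiers, 0, 50)); // prints "mid-tier"
```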

### Where claude-multi fits in

The router needs somewhere to route to. claude-multi gives each provider its own configured instance, so when the router decides &quot;send this to Qwen Flash&quot; or &quot;send this to Kimi&quot; or &quot;send this to local Ollama,&quot; the right base URL, model name, and key are already wired up. You don&apos;t write a multi-provider config from scratch; you point the router at your existing instances.

### How to start

Pick one router (`claude-code-llm-router` is the most-used right now). Drop in a config that defines your providers and a routing strategy per Claude Code task class: `default`, `background`, `think`, `longContext`, `webSearch`. Send `background` tasks to your cheapest model. Send `think` to the smart one. Watch a week of traffic and adjust.
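
The shape of that mapping, written as a plain lookup, looks roughly like this. It is not the config format of any specific router; the task class names come from the paragraph above, and the instance names on the right are placeholders for whatever you have set up.

```typescript
// Task classes named above; the mapping itself is an illustrative assumption.
type TaskClass = "default" | "background" | "think" | "longContext" | "webSearch";

// Instance names are placeholders for whatever you have configured.
const routes: { [task in TaskClass]: string } = {
  default: "claude-qwen",
  background: "claude-deepseek", // cheapest model for background work
  think: "claude", // the smart one
  longContext: "claude-mimo",
  webSearch: "claude-kimi",
};

const knownTasks = Object.keys(routes);

// Unknown task classes fall back to the default route.
function instanceFor(task: string): string {
  return knownTasks.includes(task) ? routes[task as TaskClass] : routes.default;
}

console.log(instanceFor("background")); // prints "claude-deepseek"
```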

You won&apos;t get the routing right on day one. You&apos;ll get it 80% right, save most of the money, and refine the rest as you go.

---

### References

*   [1] TokenMix Blog: Claude Code Router: Configuration + Troubleshooting 2026: [https://tokenmix.ai/blog/claude-code-router-guide-2026](https://tokenmix.ai/blog/claude-code-router-guide-2026)
*   [2] MostafaGalal1/claude-code-smart-router GitHub: [https://github.com/MostafaGalal1/claude-code-smart-router](https://github.com/MostafaGalal1/claude-code-smart-router)
*   [3] rmb/maestro-router GitHub: [https://github.com/rmb/maestro-router](https://github.com/rmb/maestro-router)</content:encoded><category>LLM cost optimization</category><category>AI development</category><category>LLM routing</category><category>claude-code-llm-router</category><category>tiered models</category><category>cost savings</category><category>AI engineering</category><category>developer tools</category><author>hmziqrs</author><enclosure url="https://claude-multi.hmziq.xyz/audio/https://raw.githubusercontent.com/hmziqrs/claude-multi/master/audio/llm-cost-optimization-routing.mp3" length="0" type="audio/mpeg"/></item><item><title>Three coding models worth paying attention to: MiMo, Kimi, and Qwen</title><link>https://claude-multi.hmziq.xyz/blog/new-llm-frontier-for-coding/</link><guid isPermaLink="true">https://claude-multi.hmziq.xyz/blog/new-llm-frontier-for-coding/</guid><description>Xiaomi MiMo-V2.5-Pro, Moonshot Kimi K2.6, and Alibaba Qwen3-Coder-Next are doing real work now. A look at what each is actually good at, and how to wire them up with claude-multi.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><content:encoded>For a while the choice for serious coding work was Claude or GPT, and everything else was a benchmark chart. That&apos;s not quite true anymore. Three models in particular have crossed the line from &quot;interesting on paper&quot; to &quot;I&apos;d actually use this for work&quot;: Xiaomi&apos;s MiMo-V2.5-Pro, Moonshot&apos;s Kimi K2.6, and Alibaba&apos;s Qwen3-Coder-Next.

None of them replaces Claude Opus across the board. But each of them is genuinely better than the top-tier models at something specific, and they&apos;re cheaper. That makes them worth knowing.

### Xiaomi MiMo-V2.5-Pro

A 1T-parameter Mixture-of-Experts model with 42B active. The headline numbers are:

- 1M-token context window. Big enough that &quot;the whole project&quot; stops being a problem.
- 40-60% fewer tokens than Claude Opus 4.6 or Gemini 3.1 Pro for comparable tasks, on the benchmarks Xiaomi published. Token efficiency translates almost directly to cost.
- Open weights, so you can run it locally if you have the hardware.

Xiaomi&apos;s internal demo had MiMo writing a working compiler in under five hours unattended. A run like that won&apos;t survive a flaky network, but it&apos;s a real number to put against the older &quot;AI tools are autocomplete with extra steps&quot; framing.

Use case: long-context refactors where you want the whole repo in scope and you care about token cost.

### Moonshot Kimi K2.6

Engineered for agentic work specifically. The relevant numbers:

- 12-hour autonomous sessions. Not a marketing number, an architectural one: the runtime is built to keep state coherent across that duration.
- Native primitives for spawning, scheduling, and reconciling up to 300 sub-agents in a swarm. If you&apos;re trying to parallelize work across many sub-tasks, this is the model with the explicit support for it.
- 262K context with auto-compression. Smaller window than MiMo, but compression handles the overflow cleanly.

It leads most of the current agentic benchmarks: SWE-Bench Pro, Terminal-Bench 2.0. If you&apos;re running long-horizon agents over a real codebase, this is the one to try first.

Use case: anything where the agent has to run for hours, manage many sub-tasks, and not lose the thread.

### Alibaba Qwen3-Coder-Next

The dedicated coding branch of the Qwen3 family. What stands out:

- Fine-tuned specifically for coding, which shows up most clearly on small, focused tasks where the general-purpose models still occasionally hallucinate API signatures.
- A tiered lineup that maps cleanly to Anthropic&apos;s: `qwen3-coder-next` for hard problems, `qwen3-coder-plus` for the middle, `qwen3-coder-flash` for cheap small calls.
- Strong open-source adoption, which means more community tooling, more shared prompts, more debuggable behavior.

Use case: high-volume coding work where you want a tier that matches the difficulty of the task.

### What this means in practice

You don&apos;t pick one of these and replace everything. You pick a default and use the others where they&apos;re better. Most of the cost savings people are seeing come from routing: Opus for the hard reasoning, Qwen Flash or DeepSeek for the small edits and lookups, Kimi when you actually need an agent that runs for an afternoon.

The hard part is the plumbing. Each provider has its own base URL, its own model identifiers, sometimes split between pay-per-token and subscription endpoints. That is the part that gets old fast.

### Using them with claude-multi

This is the problem claude-multi exists to solve. It gives each provider its own alias (`claude-mimo`, `claude-kimi`, `claude-qwen`) with its own config directory. You pick a template, paste a key, and you&apos;re done.

A few specifics:

- MiMo and Qwen both have split endpoints for API vs subscription plans. claude-multi has separate templates for each (`mimo` and `mimo-token`, `qwen` and `qwen-coding`) so the right key hits the right endpoint without you reading three docs sites.
- If you wire in a router (the `claude-code-llm-router` MCP server, for example), claude-multi instances become the layer it routes across. Cheap models for small calls, premium models for hard ones, all under the same Claude Code surface.

None of this is exotic. It is just the config plumbing you&apos;d write yourself if you had the time, with the rough edges already filed off.

---

### References

*   Xiaomi MiMo-V2.5-Pro Official Page: [https://mimo.xiaomi.com/mimo-v2-5-pro/](https://mimo.xiaomi.com/mimo-v2-5-pro/)
*   Moonshot Kimi K2.6 - Agentic Coding AI: [https://kimi-k2.org/kimi-k26](https://kimi-k2.org/kimi-k26)
*   The Decoder: Xiaomi&apos;s open-weight MiMo-V2.5-Pro takes aim at Claude Opus with hours-long autonomous coding: [https://the-decoder.com/xiaomis-open-weight-mimo-v2-5-pro-takes-aim-at-claude-opus-with-hours-long-autonomous-coding/](https://the-decoder.com/xiaomis-open-weight-mimo-v2-5-pro-takes-aim-at-claude-opus-with-hours-long-autonomous-coding/)</content:encoded><category>AI coding</category><category>LLM comparison</category><category>Xiaomi MiMo</category><category>Moonshot Kimi</category><category>Alibaba Qwen</category><category>agentic AI</category><category>open-source LLMs</category><category>developer tools</category><category>claude-multi</category><category>workflow automation</category><author>hmziqrs</author><enclosure url="https://claude-multi.hmziq.xyz/audio/https://raw.githubusercontent.com/hmziqrs/claude-multi/master/audio/new-llm-frontier-for-coding.mp3" length="0" type="audio/mpeg"/></item><item><title>Inside claude-multi: a tour of every menu</title><link>https://claude-multi.hmziq.xyz/blog/inside-claude-multi-every-menu/</link><guid isPermaLink="true">https://claude-multi.hmziq.xyz/blog/inside-claude-multi-every-menu/</guid><description>A walkthrough of every screen in the claude-multi TUI. What each option does, when to use it, and the design calls I made along the way.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><content:encoded>I&apos;ve been running Claude Code against several providers for a while. Anthropic for the work I care about, GLM when I want to burn fewer tokens, DeepSeek when I&apos;m curious. For a while this was fine. Then it wasn&apos;t.

The problem was never any one provider. It was the switching. Overwriting `~/.claude/settings.json` between sessions. Forgetting which plugins were installed where. Once I committed a `settings.json` with the wrong base URL and didn&apos;t notice for two days.

That is the problem claude-multi solves. One environment, one workflow, each provider in its own folder. You launch the TUI, pick what you need, and you&apos;re done. No subcommands, no flags.

This post is a full walk through every menu item. If you&apos;re trying to figure out whether the thing covers what you need, this should answer that.

## Launching the TUI

You type `claude-multi`. That&apos;s it.

```
🤖 Claude Multi  -  Interactive Mode

2 instance(s): glm, deepseek

  ▸ ➕ Add new instance
    📋 List all instances
    ℹ️  Instance details
    🔌 Manage plugins
    🔄 Toggle auto-sync
    🔗 Re-sync symlinks
    🗑️  Remove instance
    ⚙️  MCP servers
    🚪 Exit
```

Arrow keys to move. `Enter` to select. `ESC` to go back. `q` to quit. Those four keys control everything.

## Adding a new instance

Pick **➕ Add new instance**. The wizard has eight steps. A couple of them have implications worth flagging.

### Step 1: instance name

Something short. `glm`, `deepseek`, `work`, `explore`. Letters, numbers, hyphens, underscores. This name becomes the command on your `PATH`, so `glm` gives you `claude-glm`.

Provider names are the obvious choice. Purpose names like `work` or `cheap` also work and are sometimes easier to remember three months later.
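
The naming rule is simple enough to sketch. The pattern below is inferred from the description above (letters, numbers, hyphens, underscores); the actual validation inside claude-multi may differ, and `commandFor` is a hypothetical helper.

```typescript
// Assumed rule from the text: letters, numbers, hyphens, underscores.
const NAME_PATTERN = /^[A-Za-z0-9_-]+$/;

// Hypothetical helper: turns an instance name into the command on your PATH.
function commandFor(instanceName: string): string {
  if (!NAME_PATTERN.test(instanceName)) {
    throw new Error(`invalid instance name: ${instanceName}`);
  }
  return `claude-${instanceName}`;
}

console.log(commandFor("glm")); // prints "claude-glm"
```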

### Step 2: provider template

Pick a template, or `None / Custom` if you want to wire things up yourself.

The templates are not magic. They just write the right env vars (`ANTHROPIC_BASE_URL`, `ANTHROPIC_MODEL`, a few others) into the instance&apos;s `settings.json`. The value is that you don&apos;t have to find the right URLs and model names across four different docs sites.

Current templates: GLM (Z.ai), MiniMax, DeepSeek. More on the way.
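
As a sketch of what a template boils down to, the function below renders a `settings.json` body with an `env` block. Treat the details as assumptions: the post only names `ANTHROPIC_BASE_URL` and `ANTHROPIC_MODEL`, so `ANTHROPIC_AUTH_TOKEN`, the `https://` scheme, the model id, and the `Template` type are illustrative rather than a copy of what claude-multi writes.

```typescript
// Sketch of the shape a template might write. ANTHROPIC_AUTH_TOKEN and the
// model id are assumptions; the post only names ANTHROPIC_BASE_URL and ANTHROPIC_MODEL.
type Template = { baseUrl: string; model: string };

function renderSettings(template: Template, apiKey: string): string {
  const settings = {
    env: {
      ANTHROPIC_BASE_URL: template.baseUrl,
      ANTHROPIC_MODEL: template.model,
      ANTHROPIC_AUTH_TOKEN: apiKey,
    },
  };
  return JSON.stringify(settings, null, 2);
}

// Placeholder values; substitute the real model id and key for your provider.
const glmTemplate: Template = { baseUrl: "https://api.z.ai/api/anthropic", model: "example-model" };
console.log(renderSettings(glmTemplate, "sk-example"));
```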

### Step 3: API key (if you picked a provider)

Paste the key. It&apos;s masked while you type and written into the instance&apos;s `settings.json`. It is not exported into your shell environment and it does not end up in your shell history.

If you don&apos;t have the key in front of you, hit `ESC`. You can re-run the wizard later with `None / Custom` and add the key by hand.

### Step 4: confirm paths

Two defaults:

- **Config:** `~/.claude-&lt;name&gt;/`
- **Binary:** `~/.local/bin/claude-&lt;name&gt;`

`y` to accept. The config layout mirrors Claude Code&apos;s own. `~/.local/bin` is the conventional per-user binary location on Linux and works the same way on macOS. If you want different paths, you can edit the registry by hand later.

### Step 5: copy options

This is the step that matters most if you already have a `~/.claude` setup.

- **Nothing.** Start clean. No settings, no plugins, no skills. Good for testing.
- **Only `settings.json`.** Carry over your base settings but leave plugins and skills out. Useful when you want the same preferences with a different model.
- **Select plugins.** Cherry-pick. Goes to Step 6.
- **All files.** Settings, `CLAUDE.md`, plugins, skills, the whole thing. Goes to Step 7.

I almost always pick &quot;All files.&quot; Setting up plugins twice is annoying. But if I&apos;m testing something weird, &quot;Nothing&quot; is the right call.

### Step 6: select plugins (if &quot;Select plugins&quot;)

A multi-select list of every plugin in your default `~/.claude`. `space` toggles, `enter` confirms. Only the checked ones get copied.

Useful when you have a few plugins you trust and a bunch you&apos;re still evaluating.

### Step 7: auto-sync (if &quot;All files&quot;)

The wizard asks whether to symlink `plugins/` and `skills/` back to your default `~/.claude/`.

- **`y`.** Those folders become symlinks. Install a plugin once in your default Claude, every instance with auto-sync sees it. One source of truth.
- **`n`.** Those folders are copies. The instance can drift from your default without affecting it. Good for isolated experiments.

You can flip this later. It is not a permanent decision.
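
The difference between the two answers is just symlink versus copy. The sketch below shows that with Node&apos;s `fs` module in a throwaway temp directory; `populate` and its parameters are hypothetical names, and the real implementation may handle more cases (existing folders, broken links, and so on).

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Sketch only: links or copies plugins/ and skills/ from a source config dir
// into an instance dir. Function and parameter names are hypothetical.
function populate(sourceDir: string, instanceDir: string, autoSync: boolean): void {
  fs.mkdirSync(instanceDir, { recursive: true });
  for (const folder of ["plugins", "skills"]) {
    const from = path.join(sourceDir, folder);
    const to = path.join(instanceDir, folder);
    if (!fs.existsSync(from)) {
      continue; // nothing to share for this folder
    }
    if (autoSync) {
      fs.symlinkSync(from, to, "dir"); // one source of truth
    } else {
      fs.cpSync(from, to, { recursive: true }); // independent copy, free to drift
    }
  }
}

// Try it in a temp directory rather than a real ~/.claude.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "autosync-"));
fs.mkdirSync(path.join(root, "default", "plugins"), { recursive: true });
populate(path.join(root, "default"), path.join(root, "synced"), true);
populate(path.join(root, "default"), path.join(root, "copied"), false);

console.log(fs.lstatSync(path.join(root, "synced", "plugins")).isSymbolicLink()); // true
console.log(fs.lstatSync(path.join(root, "copied", "plugins")).isSymbolicLink()); // false
```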

### Step 8: done

```
✓ Instance &apos;glm&apos; created successfully!
  ├─ Binary: /Users/you/.local/bin/claude-glm
  └─ Config: /Users/you/.claude-glm
```

Run `claude-glm`. You get a full Claude Code session pointed at the provider you picked, with its own config, history, and plugins. Same `claude` binary, different brain.

## List all instances

**📋 List all instances** shows everything you&apos;ve set up: name, provider, paths, sync status. Useful for remembering what you have. Especially the ones you created six weeks ago and forgot about.

## Instance details

**ℹ️ Instance details** is the deeper view of a single instance: where its files live, which plugins are active, a snapshot of `settings.json`, the auto-sync state. This is the screen I open when something is misbehaving and I want to confirm the state before I touch anything.

## Manage plugins

**🔌 Manage plugins** is where I spend most of my time. Pick an instance, then you get a list of every plugin in its config dir with toggle controls.

From there:

- Enable or disable a plugin without uninstalling it.
- Install a new plugin into this specific instance.
- Copy a plugin from your default `~/.claude/` into this instance by hand.
- Remove a plugin completely.

Auto-sync is the thing to keep in mind. If it&apos;s on, the plugin list is really a view onto your default `~/.claude/` install, so changes propagate. If it&apos;s off, the instance&apos;s plugin set is its own.

claude-multi also runs a collision check on install. If a new plugin would clash with one already there (same name, version mismatch, that kind of thing), the TUI flags it before writing anything.

## Toggle auto-sync

**🔄 Toggle auto-sync** does what it says. Pick an instance, flip the switch. Turning it on rebuilds the symlinks. Turning it off converts them back to plain folders.

I use this when I&apos;ve created an instance without sync and decide later that I want it after all.

## Re-sync symlinks

**🔗 Re-sync symlinks** is the repair tool. If you moved `~/.claude/`, renamed something, or deleted a plugin another instance was pointing at, this rebuilds the broken links. You can run it for one instance or for all of them.

You won&apos;t reach for it often. When something looks wrong, this is the first thing to try.

## Remove instance

**🗑️ Remove instance** cleanly deletes an instance and its wrapper. It asks whether you also want to delete the config directory, with a confirmation either way, so your history isn&apos;t wiped without you noticing.

Also useful if you fat-fingered the name during setup and just want to start over.

## MCP servers

**⚙️ MCP servers** lets you view MCP server configs per instance and copy them between instances. If you put effort into wiring up MCP for your main Claude Code setup, you don&apos;t have to redo that work for every new provider. Pick a source, pick a destination, done.

## Health warnings

claude-multi watches the setup in the background. If something is wrong, the main menu shows a banner: yellow for warnings, red for the things you should fix now. Press `!` to open the health screen, which lists every detected issue (missing binary, broken symlink, stale registry entry, missing config dir) with a suggested fix.

Most of the time it can fix the issue for you automatically.

## A few things that happen without you doing anything

**Path setup.** The first time you create an instance, claude-multi drops the wrapper into `~/.local/bin/`. If that&apos;s not on your `PATH`, the new `claude-&lt;name&gt;` won&apos;t run. The fix is a one-line addition to your shell config:

```bash
export PATH=&quot;$HOME/.local/bin:$PATH&quot;
```

You do this once, not per instance.

**The wrapper script.** Every `claude-&lt;name&gt;` is a tiny script. It sets `CLAUDE_CONFIG_DIR` to the instance&apos;s config directory and then runs the regular `claude` binary. No forking, no patching. When Claude Code updates, every instance gets the update at the same time because they all share the same binary.
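
For a sense of how small that wrapper is, here is a sketch that renders one. The exact script claude-multi writes may differ; `renderWrapper` is a hypothetical helper, and the only detail taken from the description above is that the wrapper sets `CLAUDE_CONFIG_DIR` and then runs `claude`.

```typescript
// Sketch of the wrapper described above; the exact script claude-multi
// writes may differ. CLAUDE_CONFIG_DIR is the variable named in the post.
function renderWrapper(configDir: string): string {
  return [
    "#!/usr/bin/env bash",
    `export CLAUDE_CONFIG_DIR="${configDir}"`,
    'exec claude "$@"', // hand every argument through to the shared binary
    "",
  ].join("\n");
}

console.log(renderWrapper("/Users/you/.claude-glm"));
```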

**Fallback UI.** If the Ink TUI doesn&apos;t render on your terminal (some SSH setups, some older terminals), there&apos;s a prompt-based UI:

```bash
CLAUDE_MULTI_INK=false claude-multi
```

Same flows, simpler rendering.

## Why the menu, instead of flags

I tried two other versions of this before landing on the menu.

The flag-based one worked but I could never remember the flags. I&apos;d write a command, get a usage error, look up the flags, type it again. Every time.

The guided prompts UI was better but it asked the same questions in the same order whether you were creating a new instance or toggling one plugin. Faster than reading docs, slower than it should have been.

The menu handles both cases. New users can poke around and find every feature without reading anything. Repeat users jump straight to the screen they want. Nobody has to memorize anything.

If you take one thing from this post: spend five minutes inside `claude-multi` and click around. The whole tool is the menu. There are no hidden flags, no secret configs. What you see is what you get.

That simplicity is the whole pitch.</content:encoded><category>claude-code</category><category>tui</category><category>deep-dive</category><author>hmziqrs</author><enclosure url="https://claude-multi.hmziq.xyz/audio/https://raw.githubusercontent.com/hmziqrs/claude-multi/master/audio/inside-claude-multi-every-menu.mp3" length="0" type="audio/mpeg"/></item></channel></rss>