GLM Coding Plan Provider for Claude Code
Frontier reasoning models on a fixed monthly plan
claude-multi add glm Use cases
GLM-5.1 is a frontier-class reasoning model accessible through z.ai’s Coding Plan. It exposes a native Anthropic-compatible endpoint, so Claude Code talks to it without adapters or middleware.
Model specs
| Role | Model | Context |
|---|---|---|
| Primary (Opus) | GLM-5.1 | 128K |
| Fast (Sonnet/Haiku) | GLM-5-Turbo | 128K |
Thinking mode is enabled by default. The template sets REASONING_EFFORT to high and allocates 8,000 thinking tokens, which is enough for most code tasks without burning through your context window.
Auto-compaction is tuned for the 128K context. Without it, Claude Code assumes a 200K window for unknown models and never triggers compaction, eventually crashing when the real context fills up. The template sets CLAUDE_CODE_AUTO_COMPACT_WINDOW to 131,072 and compacts at 75% usage.
Setup
- Get a Coding Plan subscription at z.ai
- Copy your API key from the dashboard
- Run the setup command:
claude-multi add glm- Paste your API key when prompted
That is the whole process. The template configures the base URL, model mappings, context limits, and thinking parameters. Your instance is ready immediately.
When to pick GLM
GLM-5.1 is a good fit when you want a fixed monthly cost instead of per-token billing. The Coding Plan gives you a generous allocation of requests, and GLM-5-Turbo handles lighter tasks (quick edits, shell commands, subagent work) at higher speed.
If your workload is bursty and you prefer paying only for what you use, look at the DeepSeek or MiMo pay-per-token templates instead.
Pricing details
GLM uses a Coding Plan subscription model. You pay a flat monthly fee and get an allocation of requests. Check z.ai for current pricing tiers.
Related providers
Coding Plan subscription via z.ai