GLM Coding Plan Provider for Claude Code

Frontier reasoning models on a fixed monthly plan

Quick start

claude-multi add glm

Use cases

Agentic coding with extended reasoning chains

Day-to-day development on a predictable subscription budget

Multi-file refactoring and architecture work

Code review and test generation

GLM-5.1 is a frontier-class reasoning model accessible through z.ai’s Coding Plan. It exposes a native Anthropic-compatible endpoint, so Claude Code talks to it without adapters or middleware.

Model specs

Role	Model	Context
Primary (Opus)	GLM-5.1	128K
Fast (Sonnet/Haiku)	GLM-5-Turbo	128K

Thinking mode is enabled by default. The template sets REASONING_EFFORT to high and allocates 8,000 thinking tokens, which is enough for most code tasks without burning through your context window.
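To put the thinking budget in perspective, the short calculation below compares the 8,000 thinking tokens with the 128K context from the model specs. It is only illustrative arithmetic: the constant and function names are made up for this sketch and are not settings the template reads, and it assumes the budget acts as an upper bound on thinking tokens for a single response.

```python
# Illustrative arithmetic only; these names are hypothetical, not template settings.
CONTEXT_WINDOW_TOKENS = 131_072   # 128K context, from the model specs
THINKING_BUDGET_TOKENS = 8_000    # thinking allocation set by the template


def thinking_share(budget: int, window: int) -> float:
    """Return the fraction of the context window a thinking budget can occupy."""
    return budget / window


share = thinking_share(THINKING_BUDGET_TOKENS, CONTEXT_WINDOW_TOKENS)
remaining = CONTEXT_WINDOW_TOKENS - THINKING_BUDGET_TOKENS

print(f"thinking share: {share:.1%}")          # thinking share: 6.1%
print(f"tokens left for the rest: {remaining}")  # tokens left for the rest: 123072
```

Under these assumptions, a fully used thinking budget takes up roughly 6% of the window, which leaves most of the context for code and conversation history.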

Auto-compaction is tuned for the 128K context. Without this tuning, Claude Code assumes a 200K window for unknown models, so compaction never triggers before the real 128K context fills up, and the session eventually crashes. The template sets CLAUDE_CODE_AUTO_COMPACT_WINDOW to 131,072 and compacts at 75% usage.
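The numbers behind this can be checked with a few lines of arithmetic. The sketch below is not how Claude Code computes its threshold internally. It only applies the 75% figure to both window sizes, and it assumes the same ratio would apply to the 200K default.

```python
# Illustrative arithmetic; function and constant names are hypothetical.
REAL_WINDOW = 131_072      # value the template sets for CLAUDE_CODE_AUTO_COMPACT_WINDOW
ASSUMED_WINDOW = 200_000   # window Claude Code assumes for unknown models, per the text
COMPACT_RATIO = 0.75       # compaction point used by the template


def compaction_threshold(window: int, ratio: float) -> int:
    """Token count at which compaction would start for a given window size."""
    return int(window * ratio)


tuned = compaction_threshold(REAL_WINDOW, COMPACT_RATIO)
untuned = compaction_threshold(ASSUMED_WINDOW, COMPACT_RATIO)  # assumption: same ratio

print(tuned)                  # 98304
print(untuned)                # 150000
print(untuned > REAL_WINDOW)  # True: the untuned threshold lies beyond the real window
```

With the tuned window, compaction would start at 98,304 tokens. Against an assumed 200K window, the same ratio gives 150,000 tokens, which the real 131,072-token context can never reach.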

Setup

Get a Coding Plan subscription at z.ai
Copy your API key from the dashboard
Run the setup command:

claude-multi add glm

Paste your API key when prompted

That is the whole process. The template configures the base URL, model mappings, context limits, and thinking parameters. Your instance is ready immediately.
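As a rough picture of what such a template could produce, the sketch below assembles an environment for Claude Code from the values on this page. Treat it as an assumption-heavy illustration, not the template's actual output: only CLAUDE_CODE_AUTO_COMPACT_WINDOW and REASONING_EFFORT are named in the text, the ANTHROPIC_* variable names are assumptions about how the wiring might look, the model strings are the display names from the specs table and not confirmed API identifiers, and build_env is a hypothetical helper.

```python
from typing import Dict

# Role-to-model mapping taken from the model specs table (display names, not verified API IDs).
MODEL_BY_ROLE = {
    "opus": "GLM-5.1",
    "sonnet": "GLM-5-Turbo",
    "haiku": "GLM-5-Turbo",
}


def build_env(api_key: str, base_url: str) -> Dict[str, str]:
    """Hypothetical helper: collect the settings this page describes into one env mapping."""
    env = {
        "ANTHROPIC_BASE_URL": base_url,               # assumed variable name
        "ANTHROPIC_AUTH_TOKEN": api_key,              # assumed variable name
        "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "131072",  # named in the text
        "REASONING_EFFORT": "high",                   # named in the text
    }
    for role, model in MODEL_BY_ROLE.items():
        # Assumed naming scheme for the per-role model overrides.
        env[f"ANTHROPIC_DEFAULT_{role.upper()}_MODEL"] = model
    return env


# Placeholder values only; the real base URL and key come from your z.ai account.
example_env = build_env(api_key="YOUR_API_KEY", base_url="https://example.invalid/anthropic")
for name in sorted(example_env):
    print(name)
```

Keeping the role-to-model mapping in a single table makes it easy to see that both the Sonnet and Haiku roles resolve to the fast model.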

When to pick GLM

GLM-5.1 is a good fit when you want a fixed monthly cost instead of per-token billing. The Coding Plan gives you a generous allocation of requests, and GLM-5-Turbo handles lighter tasks (quick edits, shell commands, subagent work) at higher speed.

If your workload is bursty and you prefer paying only for what you use, look at the DeepSeek or MiMo pay-per-token templates instead.

Pricing details

GLM uses a Coding Plan subscription model. You pay a flat monthly fee and get an allocation of requests. Check z.ai for current pricing tiers.

Alternatives with other pricing models:

DeepSeek - pay-per-token, also frontier coding
MiniMax - 1M context window, subscription
Qwen - Alibaba’s coder models, pay-per-token

Related providers

minimax - 1M context window with 512K output tokens
deepseek - Frontier coding at per-token pricing with a 1M context
mimo - 1T MoE model at a fraction of frontier pricing