Moonshot Kimi Provider for Claude Code

Agentic coding with strong tool-use and reasoning

Quick start

claude-multi add kimi

Use cases

Agentic coding with complex tool chains

Multi-step debugging and investigation

Interactive pair programming

API integration and glue code

Kimi K2.5 is Moonshot AI’s coding-focused model with strong performance on agentic benchmarks. It excels at multi-step tool use, which is the core of how Claude Code operates. The Anthropic-compatible endpoint at moonshot.ai connects directly to Claude Code without any adapters.

Model specs

Role	Model	Context (tokens)	Max output (tokens)
All roles	Kimi K2.5	128K	65,536

The template maps K2.5 to every role, since it handles both heavy reasoning and fast tasks well.

Thinking mode is enabled with REASONING_EFFORT: high and a budget of 16,000 thinking tokens. Auto-compaction is tuned for the 128K context window. Without these settings, Claude Code assumes a 200K window for unrecognized models, so it never compacts before the real 128K limit is reached, and the session crashes with a context overflow.
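The overflow described above can be illustrated with a little arithmetic. The sketch below is not Claude Code's actual compaction logic: the 80% trigger point and the function names are assumptions chosen for illustration, and 128K is taken as 128,000 tokens.

```python
# Hypothetical sketch of why an untuned window overflows; not Claude Code's real logic.
CONTEXT_WINDOW_ACTUAL = 128_000   # Kimi K2.5 window from the table (128K taken as 128,000)
CONTEXT_WINDOW_ASSUMED = 200_000  # window assumed for unrecognized models, per the text
COMPACT_PERCENT = 80              # assumption: compaction triggers at 80% of the configured window


def compaction_threshold(configured_window: int, percent: int = COMPACT_PERCENT) -> int:
    """Token count at which compaction would be triggered."""
    return configured_window * percent // 100


def overflows_before_compacting(configured_window: int, actual_window: int) -> bool:
    """True when the real limit is reached before the compaction threshold."""
    return compaction_threshold(configured_window) > actual_window


# Untuned: the threshold sits above the real 128K limit, so compaction comes too late.
untuned = overflows_before_compacting(CONTEXT_WINDOW_ASSUMED, CONTEXT_WINDOW_ACTUAL)
# Tuned: the threshold is derived from the real window, so compaction runs first.
tuned = overflows_before_compacting(CONTEXT_WINDOW_ACTUAL, CONTEXT_WINDOW_ACTUAL)
print(untuned, tuned)
```

Under these assumptions the untuned threshold is 160,000 tokens, which is beyond what the model accepts, while the tuned threshold of 102,400 tokens leaves headroom.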

Setup

1. Create an account at moonshot.ai and generate an API key.
2. Run the setup command:

   claude-multi add kimi

3. Paste your API key when prompted.

The template configures the base URL, model mapping, thinking parameters, context limits, and compaction thresholds.
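As a rough picture of what such a template might produce, the sketch below builds a set of environment variables for Claude Code. The template's real format is not shown here, so everything in it is an assumption: the variable names are ones Claude Code is known to read, but whether claude-multi uses them is a guess. The endpoint path and the model identifier are placeholders to check against Moonshot's documentation.

```python
def kimi_env(api_key: str) -> dict[str, str]:
    """Build a hypothetical environment for pointing Claude Code at Kimi."""
    model = "kimi-k2.5"  # hypothetical model identifier; confirm with Moonshot
    env = {
        # Assumed endpoint path for the Anthropic-compatible API.
        "ANTHROPIC_BASE_URL": "https://api.moonshot.ai/anthropic",
        "ANTHROPIC_AUTH_TOKEN": api_key,
        # Thinking budget of 16,000 tokens, as described in the model specs.
        "MAX_THINKING_TOKENS": "16000",
    }
    # Map the same model to every role, mirroring the "All roles" row.
    for role in ("OPUS", "SONNET", "HAIKU"):
        env[f"ANTHROPIC_DEFAULT_{role}_MODEL"] = model
    return env


if __name__ == "__main__":
    # Print shell export lines using a dummy key.
    for key, value in kimi_env("sk-your-key-here").items():
        print(f"export {key}={value}")
```

Running claude-multi add kimi is still the supported path. The sketch only shows the kind of settings involved.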

When to pick Kimi

Kimi is a strong choice for interactive, tool-heavy workflows. If you spend most of your Claude Code time in agentic mode (reading files, running commands, editing code in sequence), K2.5 handles that loop well. It is also competitive on price.

The 128K context window is sufficient for most day-to-day development. If you regularly work with codebases larger than 100K tokens, consider MiniMax or DeepSeek for their 1M windows.
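To check whether a codebase is near the 100K-token mark, a quick estimate is usually enough. The sketch below assumes about four characters per token, which is a common rule of thumb and not Kimi's actual tokenizer. The function names and the extension list are illustrative.

```python
import os

CHARS_PER_TOKEN = 4  # rough heuristic, not Kimi's tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def estimate_repo_tokens(root: str, extensions=(".py", ".ts", ".go", ".rs", ".md")) -> int:
    """Sum the estimated tokens of source files under root."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git in place.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    total += estimate_tokens(fh.read())
    return total


if __name__ == "__main__":
    tokens = estimate_repo_tokens(".")
    print(f"Estimated tokens: {tokens}")
    if tokens > 100_000:
        print("Consider a provider with a larger context window.")
```

Claude Code rarely loads a whole repository at once, so treat the figure as an upper bound on what a session might pull in.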

Kimi is pay-per-token only. There is no subscription plan.

Pricing details

Moonshot charges per token with no minimums. Check moonshot.ai for current pricing.

Alternatives to consider:

DeepSeek - 1M context, also pay-per-token
MiMo - lower cost per token, 1M context
GLM - subscription alternative

Related providers

glm - Frontier reasoning models on a fixed monthly plan
minimax - 1M context window with 512K output tokens
deepseek - Frontier coding at per-token pricing with a 1M context