
DeepSeek Provider for Claude Code

Frontier coding at per-token pricing with a 1M context

Quick start: claude-multi add deepseek

Use cases

General-purpose coding and debugging
Complex algorithm implementation
Codebase exploration and understanding
Cost-effective development workflows

DeepSeek-V4-Pro is a frontier coding model with a 1M token context window and strong performance across benchmarks. DeepSeek-V4-Flash handles fast tasks at a fraction of the cost. Both are served through a native Anthropic-compatible endpoint, so Claude Code can talk to them directly without a translation layer.

Model specs

Role                   Model              Context  Max Output
Primary (Opus/Sonnet)  DeepSeek-V4-Pro    1M       128K
Fast (Haiku)           DeepSeek-V4-Flash  1M       128K

The template maps V4-Pro to Opus and Sonnet roles for heavy lifting, and V4-Flash to Haiku and small/fast roles for quick tasks. Subagent work also uses V4-Flash, keeping background operations cheap.
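The mapping above can be pictured as a small lookup table. The sketch below is illustrative only: the role keys (`opus`, `small_fast`, `subagent`, and so on) and the helper `model_for_role` are hypothetical names chosen for the example, not the template's actual internals.

```python
# Hypothetical role-to-model table mirroring the mapping described above.
# The role keys are illustrative assumptions, not the template's real keys.
ROLE_TO_MODEL = {
    "opus": "DeepSeek-V4-Pro",
    "sonnet": "DeepSeek-V4-Pro",
    "haiku": "DeepSeek-V4-Flash",
    "small_fast": "DeepSeek-V4-Flash",
    "subagent": "DeepSeek-V4-Flash",
}


def model_for_role(role: str) -> str:
    """Return the model assigned to a role, ignoring case and padding."""
    key = role.strip().lower()
    if key not in ROLE_TO_MODEL:
        raise KeyError(f"unknown role: {role!r}")
    return ROLE_TO_MODEL[key]
```

Keeping the mapping in one table makes it easy to see that only the Opus and Sonnet roles reach the more expensive model.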

Thinking mode is enabled with REASONING_EFFORT: high and 32,000 thinking tokens. Effort level is set to max.

Setup

  1. Create an account at deepseek.com and generate an API key
  2. Run the setup command:

       claude-multi add deepseek

  3. Paste your API key when prompted

The template handles the base URL, model mappings, thinking parameters, and output limits.
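For a rough idea of what that involves, here is a sketch of the kind of environment such a template could assemble before launching Claude Code. Everything in it is an assumption made for illustration: the function `build_env` is hypothetical, the base URL and lowercase model IDs are guesses, the output limit is 128K written as 128000, and the variable names are believed to match Claude Code's environment settings but may not be what the template actually writes.

```python
def build_env(api_key: str) -> dict[str, str]:
    """Assemble a hypothetical Claude Code environment for DeepSeek.

    All values below are illustrative assumptions, not the real template.
    """
    if not api_key.strip():
        raise ValueError("API key must not be empty")

    pro = "deepseek-v4-pro"      # assumed model ID for DeepSeek-V4-Pro
    flash = "deepseek-v4-flash"  # assumed model ID for DeepSeek-V4-Flash

    return {
        # Assumed Anthropic-compatible base URL.
        "ANTHROPIC_BASE_URL": "https://api.deepseek.com/anthropic",
        "ANTHROPIC_AUTH_TOKEN": api_key.strip(),
        # Heavy roles go to V4-Pro, fast and background roles to V4-Flash.
        "ANTHROPIC_DEFAULT_OPUS_MODEL": pro,
        "ANTHROPIC_DEFAULT_SONNET_MODEL": pro,
        "ANTHROPIC_DEFAULT_HAIKU_MODEL": flash,
        "CLAUDE_CODE_SUBAGENT_MODEL": flash,
        # Thinking budget of 32,000 tokens, as stated above.
        "MAX_THINKING_TOKENS": "32000",
        # 128K max output, written here as 128000 (assumed rounding).
        "CLAUDE_CODE_MAX_OUTPUT_TOKENS": "128000",
    }
```

In practice you should not need to set any of this by hand; the sketch only shows which knobs are involved.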

When to pick DeepSeek

DeepSeek is the default recommendation for developers who want frontier performance without a subscription. The pay-per-token model means you only pay for what you use, and the pricing is competitive across the board.

V4-Pro handles complex coding tasks at the same quality tier as much more expensive models. V4-Flash is fast enough for interactive autocomplete, shell commands, and subagent work where latency matters more than depth.

If you prefer a fixed monthly cost over variable billing, the GLM Coding Plan is the subscription alternative.

Pricing details

DeepSeek charges per token with no minimum commitment. Rates are among the lowest for frontier-class models. Check deepseek.com for current pricing.

  • GLM - similar quality on a subscription plan
  • MiniMax - 1M context, 512K output
  • Qwen - Alibaba’s alternative, pay-per-token