DeepSeek Provider for Claude Code

Frontier coding at per-token pricing with a 1M-token context window

Quick start

claude-multi add deepseek

Use cases

General-purpose coding and debugging

Complex algorithm implementation

Codebase exploration and understanding

Cost-effective development workflows

DeepSeek-V4-Pro is a frontier coding model with a 1M-token context window and strong benchmark performance. DeepSeek-V4-Flash handles fast tasks at a fraction of the cost. Both are served through a native Anthropic-compatible endpoint, so Claude Code can use them without a translation layer in between.

Model specs

Role	Model	Context	Max Output
Primary (Opus/Sonnet)	DeepSeek-V4-Pro	1M	128K
Fast (Haiku)	DeepSeek-V4-Flash	1M	128K

The template maps V4-Pro to Opus and Sonnet roles for heavy lifting, and V4-Flash to Haiku and small/fast roles for quick tasks. Subagent work also uses V4-Flash, keeping background operations cheap.
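The role mapping described above can be sketched as a simple lookup. This is an illustration only, not the template's actual format: the role keys and the `model_for` helper are hypothetical, and the model names are copied from the specs table, so the identifiers the API expects may be spelled differently.

```python
# Hypothetical role-to-model table mirroring the description above.
# Model names come from the specs table; real API identifiers may differ.
ROLE_TO_MODEL = {
    "opus": "DeepSeek-V4-Pro",         # heavy lifting
    "sonnet": "DeepSeek-V4-Pro",       # heavy lifting
    "haiku": "DeepSeek-V4-Flash",      # quick tasks
    "small_fast": "DeepSeek-V4-Flash", # small/fast role
    "subagent": "DeepSeek-V4-Flash",   # background work stays cheap
}


def model_for(role: str) -> str:
    """Return the model assigned to a role (hypothetical helper)."""
    try:
        return ROLE_TO_MODEL[role.lower()]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None


if __name__ == "__main__":
    for role in ("opus", "haiku", "subagent"):
        print(role, "->", model_for(role))
```

Only the Opus and Sonnet roles resolve to V4-Pro in this sketch; everything else falls to V4-Flash, which is what keeps subagent traffic inexpensive.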

Thinking mode is enabled with REASONING_EFFORT: high and a budget of 32,000 thinking tokens. The effort level is set to max.

Setup

1. Create an account at deepseek.com and generate an API key.
2. Run the setup command:

   claude-multi add deepseek

3. Paste your API key when prompted.

The template handles the base URL, model mappings, thinking parameters, and output limits.
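For a rough idea of what such a template might set, here is a minimal sketch that builds a set of environment variables. Everything in it is an assumption rather than the template's real contents: the variable names follow Claude Code's commonly documented settings, the base URL is assumed to be DeepSeek's Anthropic-compatible endpoint, the model identifiers are the display names from the specs table, and 128K is taken to mean 128,000 tokens.

```python
PRO = "DeepSeek-V4-Pro"      # display name from the specs table (assumed id)
FLASH = "DeepSeek-V4-Flash"  # display name from the specs table (assumed id)


def build_env(api_key: str) -> dict[str, str]:
    """Build an illustrative environment for Claude Code (hypothetical helper)."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        # Base URL (assumed endpoint path)
        "ANTHROPIC_BASE_URL": "https://api.deepseek.com/anthropic",
        "ANTHROPIC_AUTH_TOKEN": api_key,
        # Model mappings (assumed variable names)
        "ANTHROPIC_DEFAULT_OPUS_MODEL": PRO,
        "ANTHROPIC_DEFAULT_SONNET_MODEL": PRO,
        "ANTHROPIC_DEFAULT_HAIKU_MODEL": FLASH,
        "CLAUDE_CODE_SUBAGENT_MODEL": FLASH,
        # Thinking parameters, using the values stated in this document
        "REASONING_EFFORT": "high",
        "MAX_THINKING_TOKENS": "32000",
        # Output limit (assumes 128K means 128,000 tokens)
        "CLAUDE_CODE_MAX_OUTPUT_TOKENS": "128000",
    }


if __name__ == "__main__":
    # Print the settings with the key masked, one per line.
    for name, value in build_env("sk-placeholder").items():
        shown = "***" if name == "ANTHROPIC_AUTH_TOKEN" else value
        print(f"{name}={shown}")
```

In practice `claude-multi add deepseek` takes care of this, so the sketch is useful mainly for understanding which knobs exist.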

When to pick DeepSeek

DeepSeek is the default recommendation for developers who want frontier performance without a subscription. The pay-per-token model means you only pay for what you use, and the pricing is competitive across the board.

V4-Pro handles complex coding tasks at the same quality tier as much more expensive models. V4-Flash is fast enough for interactive autocomplete, shell commands, and subagent work where latency matters more than depth.

If you prefer a fixed monthly cost over variable billing, the GLM Coding Plan is the subscription alternative.

Pricing details

DeepSeek charges per token with no minimum commitment. Rates are among the lowest for frontier-class models. Check deepseek.com for current pricing.
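Per-token billing is easy to estimate once you know the current rates. The sketch below shows the arithmetic only; the `estimate_cost` helper is hypothetical, and the rates in the usage example are placeholders chosen for round numbers, not DeepSeek's prices.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_million: float,
    output_rate_per_million: float,
) -> float:
    """Estimate spend for a workload billed per million tokens.

    Rates are in your billing currency per 1,000,000 tokens and must be
    looked up on the provider's pricing page; none are built in here.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * input_rate_per_million
    output_cost = output_tokens / 1_000_000 * output_rate_per_million
    return input_cost + output_cost


if __name__ == "__main__":
    # Placeholder rates (1.0 and 2.0 per million), NOT real pricing.
    cost = estimate_cost(2_000_000, 500_000, 1.0, 2.0)
    print(f"estimated cost: {cost:.2f}")
```

Because there is no minimum commitment, an idle month costs nothing under this model, which is the main difference from a subscription plan.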

Alternatives

GLM - similar quality on a subscription plan
MiniMax - 1M context, 512K output
Qwen - Alibaba’s alternative, pay-per-token

Pricing

Pay-per-token via deepseek.com

Related providers

glm - Frontier reasoning models on a fixed monthly plan
minimax - 1M context window with 512K output tokens
mimo - 1T MoE model at a fraction of frontier pricing