Engineering lead · GLMDeepSeekMiniMax

Compare Providers on the Same Task

The problem

Choosing a provider is guesswork. Marketing pages claim everything is great. You want to run your actual codebase against multiple providers and see real results before you commit.

The fix

Set up instances for every provider you are evaluating, run the same task on each, and compare speed, accuracy, and cost on your real workload.

Stop guessing, start measuring

Provider comparison pages tell you about benchmarks. You care about your codebase. The only way to know which provider handles your code, your style, and your tasks best is to test them yourself.

claude-multi makes this a terminal operation, not a multi-tool ordeal.

Setting up the comparison

Create one instance per provider you want to test:

claude-multi
# Add instance: test-glm    (template: glm, your GLM key)
# Add instance: test-deepseek (template: deepseek, your DeepSeek key)
# Add instance: test-minimax  (template: minimax, your MiniMax key)

Enable auto-sync on all three so they share your plugins and MCP servers. Then open three terminals and run the same prompt:

# Terminal 1
claude-test-glm "find the memory leak in src/cache.ts"

# Terminal 2
claude-test-deepseek "find the memory leak in src/cache.ts"

# Terminal 3
claude-test-minimax "find the memory leak in src/cache.ts"

All three sessions see the same files. All three use the same Claude Code interface. The only variable is the provider.

What to evaluate

Criterion	What to look for
Correctness	Does the answer work? Does it compile?
Depth	Does it find the root cause or just patch the symptom?
Speed	How fast does it respond? Check wall-clock time.
Cost	Compare token counts and pricing for your usage pattern
Context handling	Does it understand your project structure?
Follow-up quality	Ask a clarifying question. Is the follow-up useful?

Running a structured test

For a more rigorous comparison, create a file of test prompts:

# prompts.txt
# 1. "refactor the auth module to use JWTs"
# 2. "write integration tests for the payment flow"
# 3. "find the race condition in worker.ts"

Run each prompt through each provider. Score the results. The data speaks for itself.

Cleaning up

After you pick a winner, remove the test instances:

claude-multi
# Pick: Remove instance
# Select each test instance
# Delete config directories

Or keep them around for spot checks next quarter.