Back to use cases
Engineering lead · GLMDeepSeekMiniMax

Compare Providers on the Same Task

The problem

Choosing a provider is guesswork. Marketing pages claim everything is great. You want to run your actual codebase against multiple providers and see real results before you commit.

The fix

Set up instances for every provider you are evaluating, run the same task on each, and compare speed, accuracy, and cost on your real workload.

Stop guessing, start measuring

Provider comparison pages tell you about benchmarks. You care about your codebase. The only way to know which provider handles your code, your style, and your tasks best is to test them yourself.

claude-multi makes this a terminal operation, not a multi-tool ordeal.

Setting up the comparison

Create one instance per provider you want to test:

Terminal window
claude-multi
# Add instance: test-glm (template: glm, your GLM key)
# Add instance: test-deepseek (template: deepseek, your DeepSeek key)
# Add instance: test-minimax (template: minimax, your MiniMax key)

Enable auto-sync on all three so they share your plugins and MCP servers. Then open three terminals and run the same prompt:

Terminal window
# Terminal 1
claude-test-glm "find the memory leak in src/cache.ts"
# Terminal 2
claude-test-deepseek "find the memory leak in src/cache.ts"
# Terminal 3
claude-test-minimax "find the memory leak in src/cache.ts"

All three sessions see the same files. All three use the same Claude Code interface. The only variable is the provider.

What to evaluate

CriterionWhat to look for
CorrectnessDoes the answer work? Does it compile?
DepthDoes it find the root cause or just patch the symptom?
SpeedHow fast does it respond? Check wall-clock time.
CostCompare token counts and pricing for your usage pattern
Context handlingDoes it understand your project structure?
Follow-up qualityAsk a clarifying question. Is the follow-up useful?

Running a structured test

For a more rigorous comparison, create a file of test prompts:

Terminal window
# prompts.txt
# 1. "refactor the auth module to use JWTs"
# 2. "write integration tests for the payment flow"
# 3. "find the race condition in worker.ts"

Run each prompt through each provider. Score the results. The data speaks for itself.

Cleaning up

After you pick a winner, remove the test instances:

Terminal window
claude-multi
# Pick: Remove instance
# Select each test instance
# Delete config directories

Or keep them around for spot checks next quarter.