Compare Providers on the Same Task
The problem
Choosing a provider is guesswork. Marketing pages claim everything is great. You want to run your actual codebase against multiple providers and see real results before you commit.
The fix
Set up instances for every provider you are evaluating, run the same task on each, and compare speed, accuracy, and cost on your real workload.
Stop guessing, start measuring
Provider comparison pages tell you about benchmarks. You care about your codebase. The only way to know which provider handles your code, your style, and your tasks best is to test them yourself.
claude-multi makes this a terminal operation, not a multi-tool ordeal.
Setting up the comparison
Create one instance per provider you want to test:
    claude-multi
    # Add instance: test-glm (template: glm, your GLM key)
    # Add instance: test-deepseek (template: deepseek, your DeepSeek key)
    # Add instance: test-minimax (template: minimax, your MiniMax key)

Enable auto-sync on all three so they share your plugins and MCP servers. Then open three terminals and run the same prompt:
    # Terminal 1
    claude-test-glm "find the memory leak in src/cache.ts"

    # Terminal 2
    claude-test-deepseek "find the memory leak in src/cache.ts"

    # Terminal 3
    claude-test-minimax "find the memory leak in src/cache.ts"

All three sessions see the same files. All three use the same Claude Code interface. The only variable is the provider.
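If you would rather not juggle three terminals, a small wrapper can run each provider in turn, save its output, and record wall-clock time. This is a minimal sketch, not part of claude-multi: `time_run` and `RESULTS_DIR` are hypothetical names, and timing is only accurate to whole seconds.

```shell
#!/usr/bin/env bash
# time_run LABEL CMD [ARGS...]
# Runs CMD with ARGS, saves its combined output to $RESULTS_DIR/LABEL.out
# (default directory: results), and prints one tab-separated line:
#   LABEL <TAB> elapsed_seconds <TAB> exit_status
# time_run and RESULTS_DIR are illustrative names, not claude-multi features.
time_run() {
  local label=$1
  shift
  local outdir=${RESULTS_DIR:-results}
  mkdir -p "$outdir"

  local start end status=0
  start=$(date +%s)
  # Capture a failing exit status instead of aborting the caller.
  "$@" > "$outdir/$label.out" 2>&1 || status=$?
  end=$(date +%s)

  printf '%s\t%s\t%s\n' "$label" "$((end - start))" "$status"
}
```

A call might look like `time_run glm claude-test-glm -p "find the memory leak in src/cache.ts"`. Claude Code's `-p` flag runs a single prompt non-interactively; that the claude-multi launchers pass it through unchanged is an assumption worth checking on your setup.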
What to evaluate
| Criterion | What to look for |
|---|---|
| Correctness | Does the answer work? Does it compile? |
| Depth | Does it find the root cause or just patch the symptom? |
| Speed | How fast does it respond? Check wall-clock time. |
| Cost | Compare token counts and per-token pricing for your usage pattern. |
| Context handling | Does it understand your project structure? |
| Follow-up quality | Ask a clarifying question. Is the follow-up useful? |
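For the cost row, the arithmetic is token counts multiplied by each provider's per-token price. A minimal sketch, assuming prices are quoted per million tokens with separate input and output rates; `estimate_cost` is a hypothetical helper, and the token counts and prices come from whatever your sessions and each provider's pricing page report.

```shell
#!/usr/bin/env bash
# estimate_cost IN_TOKENS OUT_TOKENS IN_PRICE_PER_M OUT_PRICE_PER_M
# Prints the estimated cost of one run, to four decimal places.
# Assumes prices are quoted per one million tokens (an assumption; check
# how each provider actually bills).
estimate_cost() {
  awk -v in_tok="$1" -v out_tok="$2" -v in_price="$3" -v out_price="$4" \
    'BEGIN { printf "%.4f\n", (in_tok * in_price + out_tok * out_price) / 1000000 }'
}
```

With made-up numbers, `estimate_cost 120000 8000 0.50 2.00` prints `0.0760`: 120,000 input tokens at 0.50 per million plus 8,000 output tokens at 2.00 per million.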
Running a structured test
For a more rigorous comparison, create a file of test prompts:
    # prompts.txt
    # 1. "refactor the auth module to use JWTs"
    # 2. "write integration tests for the payment flow"
    # 3. "find the race condition in worker.ts"

Run each prompt through each provider. Score the results. The data speaks for itself.
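Running every prompt through every provider can be scripted as a simple nested loop. The sketch below makes a few assumptions: the prompts file holds one bare prompt per line (no numbering or quotes; blank lines and lines starting with `#` are skipped), each provider command accepts Claude Code's non-interactive `-p` flag, and `run_matrix` and `RESULTS_DIR` are hypothetical names rather than claude-multi features.

```shell
#!/usr/bin/env bash
# run_matrix PROMPTS_FILE CMD...
# Sends every prompt in PROMPTS_FILE to every CMD and saves each answer to
# $RESULTS_DIR/<prompt number>-<command>.out (default directory: results).
# Prints the path of each output file as it is written.
run_matrix() {
  local prompts_file=$1
  shift
  local outdir=${RESULTS_DIR:-results}
  mkdir -p "$outdir"

  local n=0 prompt cmd
  # The "|| [ -n ... ]" clause handles a final line with no trailing newline.
  while IFS= read -r prompt || [ -n "$prompt" ]; do
    # Skip blank lines and comment lines.
    case $prompt in
      '' | '#'*) continue ;;
    esac
    n=$((n + 1))
    for cmd in "$@"; do
      # </dev/null stops the command from consuming the rest of the prompts file.
      "$cmd" -p "$prompt" > "$outdir/$n-$cmd.out" 2>&1 < /dev/null ||
        echo "prompt $n: $cmd exited non-zero" >&2
      echo "$outdir/$n-$cmd.out"
    done
  done < "$prompts_file"
}
```

For example, `run_matrix prompts.txt claude-test-glm claude-test-deepseek claude-test-minimax` would leave one file per prompt and provider, which you can then read side by side and score against the criteria in the table.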
Cleaning up
After you pick a winner, remove the test instances:
    claude-multi
    # Pick: Remove instance
    # Select each test instance
    # Delete config directories

Or keep them around for spot checks next quarter.