Two hours after DeepSeek V4 shipped, every Twitter feed looked the same: "Claude is dead." That's an overclaim. Opus 4.6 is still the best model at several things, including some that matter a lot. But the fact that we have to write a real comparison — instead of "DeepSeek is 80% as good for 10% the price" — is the actual story.
Here's the side-by-side nobody is doing well.
## Raw spec sheet
| | DeepSeek V4-Pro | Claude Opus 4.6 |
|---|---|---|
| Params (total / active) | 1.6T / 49B | undisclosed (dense) |
| Context window | 1M | 200k |
| SWE-Bench Verified | 80.6% | 80.8% |
| Terminal-Bench 2.0 | 67.9% | 65.4% |
| LiveCodeBench | 93.5% | 88.8% |
| MMLU-Pro | 88.1% | 87.2% |
| Long-context retrieval (1M) | 94% | N/A (200k cap) |
| Function calling accuracy | 98.1% | 99.2% |
| Input price / 1M | $1.74 | $15.00 |
| Output price / 1M | $3.48 | $75.00 |
| License | MIT | Closed API |
## Where Claude Opus still wins
Long-form writing quality. For blog posts, essays, executive communications, Opus output reads more polished with less editing. V4-Pro is capable but defaults to a more utilitarian style. If you're writing something a human will read start to finish, Opus is still worth the roughly 9x input-price premium.
Function calling reliability. 99.2% vs 98.1% sounds marginal, but in a long agent loop those fractions compound. After 20 tool calls, Opus is at ~85% "zero errors", V4-Pro at ~68%. For production agent systems where a single schema error wastes a whole run, this matters more than the benchmarks suggest.
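The compounding is easy to verify: if you treat each tool call as an independent event with a fixed per-call accuracy (a simplification, since real failures correlate), the zero-error probability over a chain is just that accuracy raised to the chain length.

```python
def chain_success(per_call_accuracy: float, n_calls: int) -> float:
    """Probability of zero tool-call errors over n independent calls."""
    return per_call_accuracy ** n_calls

# Per-call accuracies from the spec sheet above, over a 20-call chain.
opus = chain_success(0.992, 20)   # ~0.85
v4 = chain_success(0.981, 20)     # ~0.68
print(f"Opus 4.6: {opus:.0%}, V4-Pro: {v4:.0%}")
```

The gap widens with chain length: at 50 calls the same per-call accuracies put Opus near 67% and V4-Pro near 38%.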
Ambiguity handling. When instructions are underspecified, Opus asks clarifying questions more often. V4-Pro has a bias toward attempting an answer even when it shouldn't. For customer-facing agents this can be a real problem.
Ecosystem. Claude has a mature Messages API, built-in prompt caching, computer use, MCP integration. V4-Pro just dropped today. Tooling will catch up but it's not there yet.
## Where DeepSeek V4 wins
Price. Not marginally — roughly 9x on input ($15.00 vs $1.74), 22x on output ($75.00 vs $3.48). If you're cost-sensitive or scaling, this is not close.
Coding benchmarks. Terminal-Bench, LiveCodeBench, Codeforces — V4-Pro leads cleanly. For devs using the model as a coding assistant, measurable improvement.
Context window. 1M vs 200k. If you're feeding entire codebases, research paper collections, or long session histories, V4-Pro fits 5x more in one call.
Open license. Nothing to compare here — Opus is closed-API, V4-Pro is MIT. For regulated industries or anyone who needs audit/control, this is binary.
Self-hostability. Run it on your own GPUs or through any provider. No vendor lock-in, no rate limits you didn't choose.
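Self-hosting in practice usually means serving the weights behind an OpenAI-compatible endpoint (vLLM and similar servers expose this route). A minimal sketch, where the base URL and model name are placeholders for your own deployment:

```python
import json
from urllib import request

# Placeholder for wherever you deploy the weights.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(prompt: str, model: str = "deepseek-v4-pro") -> dict:
    """Assemble an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }

def complete(prompt: str) -> str:
    """POST the payload to the self-hosted server and return the reply text."""
    payload = build_chat_request(prompt)
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the wire format matches the hosted APIs, swapping providers (or moving between your own GPUs and a third-party host) is a one-line URL change.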
## The price math that actually matters
A realistic agent task — 10k input tokens (system + tools + history), 2k output tokens, 70% cache hit on input:
| | V4-Pro (no cache yet) | Opus 4.6 (with cache) |
|---|---|---|
| Input cost | $0.0174 | $0.0555 |
| Output cost | $0.00696 | $0.15 |
| Total / task | $0.024 | $0.206 |

The Opus input figure assumes cache reads at 10% of base input price ($1.50/1M, Anthropic's usual caching discount): 3k fresh tokens at $15/1M plus 7k cached at $1.50/1M. On those numbers V4-Pro is ~8.4x cheaper per agent task even without prompt caching (which V4 is expected to support shortly). At 1,000 tasks/day, that's about $24 vs $206. At 100 users running 10 tasks each per day, roughly $730 vs $6,170 monthly.
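The per-task math generalizes to any token mix. A minimal cost calculator using the spec-sheet prices; the $1.50/1M Opus cache-read rate is an assumption (10% of base input), not a published number:

```python
def task_cost(in_tokens, out_tokens, in_price, out_price,
              cache_hit=0.0, cache_read_price=None):
    """Dollar cost of one agent task. Prices are per 1M tokens;
    cache_hit is the fraction of input served from the prompt cache."""
    cached = in_tokens * cache_hit
    fresh = in_tokens - cached
    read_price = cache_read_price if cache_read_price is not None else in_price
    input_cost = (fresh * in_price + cached * read_price) / 1e6
    output_cost = out_tokens * out_price / 1e6
    return input_cost + output_cost

v4 = task_cost(10_000, 2_000, 1.74, 3.48)            # no caching yet
opus = task_cost(10_000, 2_000, 15.00, 75.00,
                 cache_hit=0.7, cache_read_price=1.50)  # assumed read rate
print(f"V4-Pro: ${v4:.4f}  Opus: ${opus:.4f}  ratio: {opus / v4:.1f}x")
```

Plugging in your own token counts and cache-hit rate is the fastest way to see whether the gap holds for your workload; output-heavy tasks skew further toward V4-Pro because the output gap (22x) is larger than the input gap.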
## When to use each
Use Claude Opus when:
- Writing quality matters more than infra cost
- You're in a regulated industry that already has SOC2 with Anthropic
- You need maximum tool-call reliability in multi-step agent chains
- You need prompt caching at scale today (V4 pricing without caching is still cheaper, but caching is coming)
Use DeepSeek V4 when:
- You're cost-sensitive or trying to scale
- Coding is your primary workload
- You need >200k context
- You want MIT license for compliance/control
- You're building an agent product where output quality is "good enough" but margins matter
Use both: Route by task complexity. Simple/routine → V4-Flash. Main agent → V4-Pro. Fallback for the hardest 5% that V4 struggles on → Opus. This is how most serious agent teams will ship by summer.
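The routing idea can be sketched in a few lines. The thresholds and model names here are illustrative; in practice you'd derive the complexity score from signals like prompt length, tool count, or a cheap classifier:

```python
def route(task_complexity: float) -> str:
    """Pick a model tier from a 0-1 complexity score (thresholds illustrative)."""
    if task_complexity < 0.3:
        return "deepseek-v4-flash"   # simple/routine: cheapest tier
    if task_complexity < 0.95:
        return "deepseek-v4-pro"     # main agent workload
    return "claude-opus-4-6"         # hardest ~5%: reliability fallback
```

A common refinement is to make the top tier a retry path instead of a first choice: run V4-Pro, and only escalate to Opus when the attempt fails validation.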
## My honest take
V4-Pro isn't "catch-up" — on coding specifically it's ahead. On writing and some agent reliability it's behind. The roughly 9x input price gap makes the comparison asymmetric in V4's favor for 80% of workloads.
If you're cost-blind and writing user-facing content, stay on Opus. If you're building anything where cost × scale matters, V4-Pro is your new default, with Opus kept in reserve for the hardest tasks.
For the full launch context, see DeepSeek V4 is out. For the agent-specific angle, see DeepSeek V4 for AI agents.