
DeepSeek V4 is out: 1.6T params, 1M context, and an MIT license

DeepSeek shipped V4-Pro and V4-Flash today after four months of delays. The numbers are serious — within 0.2 points of Claude Opus 4.6 on SWE-bench — but the license and pricing are what actually rewrite the market.

April 24, 2026

DeepSeek released V4-Pro and V4-Flash as preview models on April 24, 2026, under the MIT License. Both models ship on the API today, the weights are up on Hugging Face, and the vendor lock-in that's defined the frontier for two years just got a lot weaker.

Short version: V4-Pro is 0.2 points behind Claude Opus 4.6 on SWE-bench Verified (80.6% vs 80.8%), 3x cheaper than GPT-5.5, and you can run it on your own GPUs if you want to. That combination didn't exist yesterday.
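If you want to poke at it today, DeepSeek's API has been OpenAI-compatible for earlier models; assuming V4 keeps that endpoint, a minimal call looks like the sketch below (the model id is my guess, not a confirmed name):

    from openai import OpenAI

    # DeepSeek's API has been OpenAI-compatible; base_url is the existing endpoint.
    # The V4 model id below is a placeholder guess; check the docs for the real one.
    client = OpenAI(api_key="YOUR_DEEPSEEK_KEY", base_url="https://api.deepseek.com")

    resp = client.chat.completions.create(
        model="deepseek-v4-pro",  # hypothetical id
        messages=[{"role": "user", "content": "One-line summary of the V4 release?"}],
    )
    print(resp.choices[0].message.content)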

What actually shipped

Two models, both with 1M-token context:

  • V4-Pro: 1.6T total params, 49B active per token, pre-trained on 33T tokens. The flagship.
  • V4-Flash: 284B total, 13B active, 32T tokens. The efficiency tier.

Both use the same new hybrid attention architecture — Compressed Sparse Attention (CSA) stacked with Heavily Compressed Attention (HCA). V4-Pro needs only 27% of the per-token inference FLOPs of V3.2 and 10% of the KV cache. That's the real unlock — the model is an order of magnitude easier to serve than its predecessor for equivalent quality.
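To make those ratios concrete, here's a back-of-envelope sketch in Python. The 2x-active-params FLOPs rule is a standard rough estimate, and the baseline attention shape is an assumption of mine, since DeepSeek hasn't published V4's attention dimensions:

    # Back-of-envelope serving math (rules of thumb, not published V4 specs)

    # Decode FLOPs per token ~ 2 x active params (standard MoE estimate)
    for name, active in [("V4-Pro", 49e9), ("V4-Flash", 13e9)]:
        print(f"{name}: ~{2 * active / 1e9:.0f} GFLOPs per generated token")

    # KV cache at 1M context, assuming a dense-attention baseline shape
    ctx, layers, kv_heads, head_dim, nbytes = 1_000_000, 60, 128, 128, 2
    baseline_gb = 2 * ctx * layers * kv_heads * head_dim * nbytes / 1e9  # K + V
    print(f"dense baseline at 1M ctx: ~{baseline_gb:,.0f} GB per sequence")
    print(f"at the claimed 10%:       ~{0.1 * baseline_gb:,.0f} GB per sequence")

Even at the claimed 10%, a 1M-token sequence carries hundreds of GB of KV state on these assumed dims, which is why the cache, not the parameter count, is the serving bottleneck at this context length.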

The benchmarks that matter

Benchmark                 V4-Pro     Claude Opus 4.6
SWE-bench Verified        80.6%      80.8%
Terminal-Bench 2.0        67.9%      65.4%
LiveCodeBench             93.5%      88.8%
Codeforces rating         3206       n/a
Putnam-2025 (frontier)    120/120    n/a

Opus still edges out V4-Pro on SWE-bench Verified, by two tenths of a point. DeepSeek wins cleanly on Terminal-Bench, LiveCodeBench, and Codeforces. For competitive coding, V4-Pro is genuinely the new state of the art.

The pricing that actually matters more

  • V4-Flash: $0.14 input / $0.28 output per million tokens
  • V4-Pro: $1.74 input / $3.48 output per million tokens

Claude Opus 4.6 is $15 in / $75 out. GPT-5.5 xhigh is $5 / $4.80. At near-identical benchmark performance, V4-Pro is roughly 8.6x cheaper than Claude on input (over 20x on output) and ~1.4x cheaper than GPT-5.5 on output.

Flash is in a different universe: at 13B active and $0.14/M input, it's the new floor for "cheap but good". Gemini 3 Flash is $0.50/M input; Flash undercuts that by 3.5x while matching it on most tasks.
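Here's what those per-token prices do to a monthly bill, in a quick Python sketch. The prices are the ones quoted above; the 300M-in / 100M-out workload is an arbitrary assumption:

    # Monthly API bill for an assumed workload: 300M input / 100M output tokens
    prices = {                        # $ per 1M tokens (input, output), from this post
        "V4-Flash":      (0.14, 0.28),
        "V4-Pro":        (1.74, 3.48),
        "GPT-5.5 xhigh": (5.00, 4.80),
        "Claude Opus":   (15.00, 75.00),
    }
    in_m, out_m = 300, 100            # millions of tokens per month (assumption)

    for model, (p_in, p_out) in prices.items():
        print(f"{model:14s} ${in_m * p_in + out_m * p_out:>9,.2f}/mo")

On that mix: roughly $70/month for V4-Flash, $870 for V4-Pro, $1,980 for GPT-5.5, and $12,000 for Claude Opus.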

Why MIT license matters more than anyone is saying

For two years, frontier-level models have been API-only (OpenAI, Anthropic, Google) or "research-available but not commercial-ready" open-weight (Llama 4's license has usage restrictions). MIT has none of that. You can:

  • Fine-tune on your data and deploy commercially, no restrictions
  • Self-host on vLLM, llama.cpp, or any inference stack (see the sketch after this list)
  • Fork, redistribute, ship inside a closed-source product
  • Evaluate, audit, red-team without vendor NDAs
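
As a concrete sketch of the self-hosting path, here's what serving the Flash weights with vLLM could look like. The Hugging Face repo id is my guess at where the weights will land, and the tensor-parallel size assumes an 8-GPU node:

    from vllm import LLM, SamplingParams

    # Repo id is hypothetical; check DeepSeek's Hugging Face org for the real one.
    llm = LLM(
        model="deepseek-ai/DeepSeek-V4-Flash",
        tensor_parallel_size=8,       # assumes an 8-GPU node
        trust_remote_code=True,       # DeepSeek models ship custom model code
    )

    params = SamplingParams(temperature=0.7, max_tokens=256)
    out = llm.generate(["Explain the MIT license in two sentences."], params)
    print(out[0].outputs[0].text)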

For enterprise AI, this is the first time "frontier quality" and "full control" have appeared in the same sentence. Expect banks, hospitals, and defense contractors to start asking hard questions about why they're paying Claude's prices when V4-Pro runs on-prem.

The efficiency story is underrated

27% of V3.2's inference FLOPs. 10% of the KV cache. At 1M context. That's not an incremental optimization — it's a re-architecture.

What this means in practice: a single 8xH200 node can serve V4-Pro at interactive latency for multi-tenant workloads that previously needed a cluster. Serving costs drop even more steeply than the API prices suggest. Inference providers — Together, Fireworks, vLLM self-hosters — get to compete with OpenAI on infra efficiency without Anthropic's or OpenAI's cap-ex.
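A quick sanity check on the single-node claim, assuming 141 GB of HBM per H200 and a couple of common quantization levels (my assumptions, not vendor guidance):

    # Does V4-Pro fit on one 8xH200 node? (assumptions, not vendor guidance)
    total_params = 1.6e12
    node_hbm_gb = 8 * 141             # 8x H200 at 141 GB HBM3e each

    for name, bytes_per_param in [("FP8", 1.0), ("INT4", 0.5)]:
        weights_gb = total_params * bytes_per_param / 1e9
        fits = "fits" if weights_gb < node_hbm_gb else "does not fit"
        print(f"{name}: ~{weights_gb:.0f} GB of {node_hbm_gb} GB HBM -> {fits}")

On these numbers, single-node serving implies roughly 4-bit weights, which leaves about 300 GB of headroom for KV cache and activations; that's exactly where the 10x cache reduction pays off.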

What to watch next

  1. Provider race on Flash. Within a week, expect three to five providers hosting V4-Flash with their own pricing below $0.14/M input. Fireworks, Together, and DeepInfra all have an incentive.
  2. Fine-tune ecosystem. MIT license + strong base = Hugging Face will fill up with domain-tuned V4 variants by the end of May.
  3. Frontier labs react. OpenAI and Anthropic have to choose: drop prices (unlikely in the short term) or ship something that pulls away on benchmarks again.

The take

The story isn't "DeepSeek caught up". It's "the frontier moved from closed-API-only to MIT-licensed in one day". Claude and GPT-5.5 still win specific battles — Opus on long-form writing polish, GPT-5.5 on some agent work — but the floor for "acceptable frontier performance" is now an open model at a fraction of the incumbents' price.

For how this specifically changes the economics of AI agents, see DeepSeek V4 for AI agents. For the head-to-head on whether to actually switch from Claude Opus, see the honest comparison.
