
DeepSeek V4 is out: 1.6T params, 1M context, and an MIT license

DeepSeek shipped V4-Pro and V4-Flash today after four months of delays. The numbers are serious — within 0.2 points of Claude Opus 4.6 on SWE-bench — but the license and pricing are what actually rewrite the market.

April 24, 2026

DeepSeek released V4-Pro and V4-Flash as preview models on April 24, 2026, under the MIT License. Both models ship on the API today, the weights are up on Hugging Face, and the vendor lock-in that's defined the frontier for two years just got a lot weaker.

Short version: V4-Pro is 0.2 points behind Claude Opus 4.6 on SWE-bench Verified (80.6% vs 80.8%), 3x cheaper than GPT-5.5, and you can run it on your own GPUs if you want to. That combination didn't exist yesterday.
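If you want to poke at it today, DeepSeek's API has been OpenAI-compatible for earlier models; assuming V4 keeps that endpoint, a minimal call looks like the sketch below (the model id is my guess, not a confirmed name):

    from openai import OpenAI

    # DeepSeek's API has been OpenAI-compatible; base_url is the existing endpoint.
    # The V4 model id below is a placeholder guess; check the docs for the real one.
    client = OpenAI(api_key="YOUR_DEEPSEEK_KEY", base_url="https://api.deepseek.com")

    resp = client.chat.completions.create(
        model="deepseek-v4-pro",  # hypothetical id
        messages=[{"role": "user", "content": "One-line summary of the V4 release?"}],
    )
    print(resp.choices[0].message.content)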

What actually shipped

Two models, both with 1M-token context:

  • V4-Pro: 1.6T total params, 49B active per token, pre-trained on 33T tokens. The flagship.
  • V4-Flash: 284B total, 13B active, 32T tokens. The efficiency tier.

Both use the same new hybrid attention architecture — Compressed Sparse Attention (CSA) stacked with Heavily Compressed Attention (HCA). V4-Pro needs only 27% of the per-token inference FLOPs of V3.2 and 10% of the KV cache. That's the real unlock — the model is an order of magnitude easier to serve than its predecessor for equivalent quality.
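To make those ratios concrete, here's a back-of-envelope sketch in Python. The 2x-active-params FLOPs rule is a standard rough estimate, and the baseline attention shape is an assumption of mine, since DeepSeek hasn't published V4's attention dimensions:

    # Back-of-envelope serving math (rules of thumb, not published V4 specs)

    # Decode FLOPs per token ~ 2 x active params (standard MoE estimate)
    for name, active in [("V4-Pro", 49e9), ("V4-Flash", 13e9)]:
        print(f"{name}: ~{2 * active / 1e9:.0f} GFLOPs per generated token")

    # KV cache at 1M context, assuming a dense-attention baseline shape
    ctx, layers, kv_heads, head_dim, nbytes = 1_000_000, 60, 128, 128, 2
    baseline_gb = 2 * ctx * layers * kv_heads * head_dim * nbytes / 1e9  # K + V
    print(f"dense baseline at 1M ctx: ~{baseline_gb:,.0f} GB per sequence")
    print(f"at the claimed 10%:       ~{0.1 * baseline_gb:,.0f} GB per sequence")

Even at the claimed 10%, a 1M-token sequence carries hundreds of GB of KV state on these assumed dims, which is why the cache, not the parameter count, is the serving bottleneck at this context length.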

The benchmarks that matter

Benchmark                 V4-Pro     Claude Opus 4.6
SWE-bench Verified        80.6%      80.8%
Terminal-Bench 2.0        67.9%      65.4%
LiveCodeBench             93.5%      88.8%
Codeforces rating         3206       n/a
Putnam-2025 (frontier)    120/120    n/a

Opus still edges out V4-Pro on SWE-bench Verified, by two tenths of a point. DeepSeek wins cleanly on Terminal-Bench, LiveCodeBench, and Codeforces. For competitive coding, V4-Pro is genuinely the new state of the art.

The pricing that actually matters more

  • V4-Flash: $0.14 input / $0.28 output per million tokens
  • V4-Pro: $1.74 input / $3.48 output per million tokens

Claude Opus 4.6 is $15 in / $75 out. GPT-5.5 xhigh is $5 / $4.80. At near-identical benchmark performance, V4-Pro is roughly 8.6x cheaper than Claude on input (over 20x on output) and ~1.4x cheaper than GPT-5.5 on output.

Flash is in a different universe: at 13B active and $0.14/M input, it's the new floor for "cheap but good". Gemini 3 Flash is $0.50/M input; Flash undercuts that by 3.5x while matching it on most tasks.
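Here's what those per-token prices do to a monthly bill, in a quick Python sketch. The prices are the ones quoted above; the 300M-in / 100M-out workload is an arbitrary assumption:

    # Monthly API bill for an assumed workload: 300M input / 100M output tokens
    prices = {                        # $ per 1M tokens (input, output), from this post
        "V4-Flash":      (0.14, 0.28),
        "V4-Pro":        (1.74, 3.48),
        "GPT-5.5 xhigh": (5.00, 4.80),
        "Claude Opus":   (15.00, 75.00),
    }
    in_m, out_m = 300, 100            # millions of tokens per month (assumption)

    for model, (p_in, p_out) in prices.items():
        print(f"{model:14s} ${in_m * p_in + out_m * p_out:>9,.2f}/mo")

On that mix: roughly $70/month for V4-Flash, $870 for V4-Pro, $1,980 for GPT-5.5, and $12,000 for Claude Opus.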

Why MIT license matters more than anyone is saying

For two years, frontier-level models have been API-only (OpenAI, Anthropic, Google) or "research-available but not commercial-ready" open-weight (Llama 4's license has usage restrictions). MIT has none of that. You can:

  • Fine-tune on your data and deploy commercially, no restrictions
  • Self-host on vLLM, llama.cpp, or any inference stack (see the sketch after this list)
  • Fork, redistribute, ship inside a closed-source product
  • Evaluate, audit, red-team without vendor NDAs
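
As a concrete sketch of the self-hosting path, here's what serving the Flash weights with vLLM could look like. The Hugging Face repo id is my guess at where the weights will land, and the tensor-parallel size assumes an 8-GPU node:

    from vllm import LLM, SamplingParams

    # Repo id is hypothetical; check DeepSeek's Hugging Face org for the real one.
    llm = LLM(
        model="deepseek-ai/DeepSeek-V4-Flash",
        tensor_parallel_size=8,       # assumes an 8-GPU node
        trust_remote_code=True,       # DeepSeek models ship custom model code
    )

    params = SamplingParams(temperature=0.7, max_tokens=256)
    out = llm.generate(["Explain the MIT license in two sentences."], params)
    print(out[0].outputs[0].text)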

For enterprise AI, this is the first time "frontier quality" and "full control" have appeared in the same sentence. Expect banks, hospitals, and defense contractors to start asking hard questions about why they're paying Claude's prices when V4-Pro runs on-prem.

The efficiency story is underrated

27% of V3.2's inference FLOPs. 10% of the KV cache. At 1M context. That's not an incremental optimization — it's a re-architecture.

What this means in practice: a single 8xH200 node can serve V4-Pro at interactive latency for multi-tenant workloads that previously needed a cluster. Serving costs drop even more steeply than the API prices suggest. Inference providers — Together, Fireworks, vLLM self-hosters — get to compete with OpenAI on infra efficiency without Anthropic's or OpenAI's cap-ex.
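A quick sanity check on the single-node claim, assuming 141 GB of HBM per H200 and a couple of common quantization levels (my assumptions, not vendor guidance):

    # Does V4-Pro fit on one 8xH200 node? (assumptions, not vendor guidance)
    total_params = 1.6e12
    node_hbm_gb = 8 * 141             # 8x H200 at 141 GB HBM3e each

    for name, bytes_per_param in [("FP8", 1.0), ("INT4", 0.5)]:
        weights_gb = total_params * bytes_per_param / 1e9
        fits = "fits" if weights_gb < node_hbm_gb else "does not fit"
        print(f"{name}: ~{weights_gb:.0f} GB of {node_hbm_gb} GB HBM -> {fits}")

On these numbers, single-node serving implies roughly 4-bit weights, which leaves about 300 GB of headroom for KV cache and activations; that's exactly where the 10x cache reduction pays off.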

What to watch next

  1. Provider race on Flash. Within a week, expect three to five providers hosting V4-Flash with their own pricing below $0.14/M input. Fireworks, Together, and DeepInfra all have an incentive.
  2. Fine-tune ecosystem. MIT license + strong base = Hugging Face will fill up with domain-tuned V4 variants by the end of May.
  3. Frontier labs react. OpenAI and Anthropic have to choose: drop prices (unlikely in the short term) or ship something that pulls away on benchmarks again.

The take

The story isn't "DeepSeek caught up". It's "the frontier moved from closed-API-only to MIT-licensed in one day". Claude and GPT-5.5 still win specific battles — Opus on long-form writing polish, GPT-5.5 on some agent work — but the floor for "acceptable frontier performance" is now an open model at a fraction of the incumbents' price.

For how this specifically changes the economics of AI agents, see DeepSeek V4 for AI agents. For the head-to-head on whether to actually switch from Claude Opus, see the honest comparison.
