
DeepSeek-V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5 | VentureBeat


Giant whale breaching in rainbow core with money and code. Credit: VentureBeat made with OpenAI ChatGPT Images 2.0


DeepSeek, the Chinese AI startup spun out of quantitative trading firm High-Flyer Capital Management, became a near-overnight sensation globally in January 2025 with the release of its open-source R1 model, which matched the proprietary models of U.S. giants.

It's been an epoch in AI since then, and while DeepSeek has released several updates to that model and its other V3-series models, the international AI and business community has largely been waiting with bated breath for the follow-up to the R1 moment.

Now it's arrived with last night's release of DeepSeek-V4, a 1.6-trillion-parameter Mixture-of-Experts (MoE) model available free under the commercially friendly open-source MIT License, which nears — and on some benchmarks, surpasses — the performance of the world's most advanced closed-source systems at approximately 1/6th the cost over the application programming interface (API).

This release—which DeepSeek AI researcher Deli Chen described on X as a "labor of love" 484 days after the launch of V3—is being hailed as the "second DeepSeek moment."

As Chen noted in his post, "AGI belongs to everyone." It's available now on AI code-sharing community Hugging Face and through DeepSeek's API.

Frontier-class AI gets pushed into a lower price band

The most immediate impact of the DeepSeek-V4 launch is economic. The corrected pricing table shows DeepSeek is not pricing its new Pro model at near-zero levels, but it is still pushing high-end model access into a far lower cost tier than the leading U.S. frontier models.

DeepSeek-V4-Pro is priced through its API at $1.74 per 1 million input tokens on a cache miss and $3.48 per 1 million output tokens.

That puts a simple one-million-input, one-million-output comparison at $5.22. With cached input, the input price drops to $0.145 per million tokens, bringing that same blended comparison down to $3.625.

That is dramatically cheaper than the current premium pricing from OpenAI and Anthropic. GPT-5.5 is priced at $5.00 per million input tokens and $30.00 per million output tokens, for a combined $35.00 in the same simple comparison.

Claude Opus 4.7 is priced at $5.00 input and $25.00 output, for a combined $30.00.

On standard, cache-miss pricing, DeepSeek-V4-Pro comes in at roughly one-seventh the cost of GPT-5.5 and about one-sixth the cost of Claude Opus 4.7.

With cached input, the gap widens: DeepSeek-V4-Pro costs about one-tenth as much as GPT-5.5 and about one-eighth as much as Claude Opus 4.7.

The more extreme near-zero story belongs to DeepSeek-V4-Flash, not the Pro model. Flash is priced at $0.14 per million input tokens on a cache miss and $0.28 per million output tokens, for a combined $0.42.

With cached input, that drops to $0.308. In that case, DeepSeek's cheaper model is more than 98% below GPT-5.5 and Claude Opus 4.7 in a simple input-plus-output comparison, or nearly 1/100th the cost — though the performance dips significantly.

DeepSeek is compressing advanced model economics into a much lower band, forcing developers and enterprises to revisit the cost-benefit calculation around premium closed models.

For companies running large inference workloads, that price gap can change what is worth automating. Tasks that look too expensive on GPT-5.5 or Claude Opus 4.7 may become economically viable on DeepSeek-V4-Pro, and even more so on DeepSeek-V4-Flash. The launch does not make intelligence free, but it does make the market harder for premium providers to defend on performance alone.

Benchmarking the frontier: DeepSeek-V4-Pro gets close, but GPT-5.5 and Opus 4.7 still lead on most shared tests

DeepSeek-V4-Pro-Max is best understood as a major open-weight leap, not a clean across-the-board defeat of the newest closed frontier systems.

The model's strongest benchmark claims come from DeepSeek's own comparison tables, where it is shown against GPT-5.4 x High, Claude Opus 4.6 Max and Gemini 3.1 Pro High and bests them on several tests, including Codeforces and Apex Shortlist.

But that is not the same as a head-to-head against OpenAI's newer GPT-5.5 or Anthropic's newer Claude Opus 4.7.

Looking only at DeepSeek-V4 versus the latest proprietary models, the picture is more restrained.

On this shared set, GPT-5.5 and Claude Opus 4.7 still lead most categories.

DeepSeek-V4-Pro-Max's best showing is on BrowseComp, the benchmark measuring agentic AI web-browsing prowess on hard-to-find information, where it scores 83.4%, narrowly behind GPT-5.5 at 84.4% and ahead of Claude Opus 4.7 at 79.3%.

On Terminal-Bench 2.0, DeepSeek scores 67.9%, close to Claude Opus 4.7's 69.4%, but far behind GPT-5.5's 82.7%.

The shared academic-reasoning results favor the closed models: On GPQA Diamond, DeepSeek-V4-Pro-Max scores 90.1%, while GPT-5.5 reaches 93.6% and Claude Opus 4.7 reaches 94.2%.

On Humanity's Last Exam without tools, DeepSeek scores 37.7%, behind GPT-5.5 at 41.4%, GPT-5.5 Pro at 43.1% and Claude Opus 4.7 at 46.9%. With tools enabled, DeepSeek rises to 48.2%, but still trails GPT-5.5 at 52.2%, GPT-5.5 Pro at 57.2% and Claude Opus 4.7 at 54.7%.

The agentic and software-engineering results are more mixed, but they still show DeepSeek-V4-Pro-Max trailing GPT-5.5 and Opus 4.7.

On Terminal-Bench 2.0, DeepSeek's 67.9% is competitive with Claude Opus 4.7's 69.4%, but GPT-5.5 is much higher at 82.7%.

On SWE-Bench Pro, DeepSeek's 55.4% trails GPT-5.5 at 58.6% and Claude Opus 4.7 at 64.3%. On MCP Atlas, DeepSeek's 73.6% is slightly behind GPT-5.5 at 75.3% and Claude Opus 4.7 at 79.1%.

BrowseComp is the standout: DeepSeek's 83.4% beats Claude Opus 4.7's 79.3% and nearly matches GPT-5.5's 84.4%, though GPT-5.5 Pro's 90.1% remains well ahead.

So ultimately, DeepSeek-V4-Pro-Max does not appear to dethrone GPT-5.5 or Claude Opus 4.7 on the benchmarks that can be directly compared across the companies' published tables. But it gets close enough on several of them — especially BrowseComp, Terminal-Bench 2.0 and MCP Atlas — that its much lower API pricing becomes the headline.

In practical terms, DeepSeek does not need to win every leaderboard row to matter. If it can deliver near-frontier performance on many enterprise-relevant agent and reasoning tasks at roughly one-sixth to one-seventh the standard API cost of GPT-5.5 or Claude Opus 4.7, it still forces a major rethink of the economics of advanced AI deployment.

DeepSeek-V4-Pro-Max is clearly the strongest open-weight model in the field right now, and it is unusually close to frontier closed systems on several practical benchmarks.

While GPT-5.5 and Claude Opus 4.7 still retain the lead in most direct head-to-head comparisons across the companies' published benchmark charts, DeepSeek-V4-Pro gets close while being dramatically cheaper and openly available.

To understand the magnitude of this release, one must look at the performance gains of the base models. DeepSeek-V4-Pro-Base represents a significant advancement over the previous generation, DeepSeek-V3.2-Base. In world knowledge, V4-Pro-Base achieved 90.1 on MMLU (5-shot) compared to V3.2's 87.8, and made a massive jump on MMLU-Pro from 65.5 to 73.5.

The improvement in high-level reasoning and verified facts is even more pronounced: on SuperGPQA, V4-Pro-Base reached 53.9 compared to V3.2's 45.0, and on the FACTS Parametric benchmark, it more than doubled its predecessor's performance, jumping from 27.1 to 62.6. SimpleQA Verified scores also saw a dramatic rise from 28.3 to 55.2.

Long-context capabilities have also been refined. On LongBench-V2, V4-Pro-Base scored 51.5, significantly outpacing the 40.2 achieved by V3.2-Base. In code and math, V4-Pro-Base reached 76.8 on HumanEval (Pass@1), up from 62.8 on V3.2-Base.

These numbers underscore that DeepSeek has not just optimized for inference cost but has fundamentally improved the intelligence density of its base architecture. The efficiency story is equally compelling for the Flash variant. DeepSeek-V4-Flash-Base, despite utilizing substantially fewer parameters, outperforms the larger V3.2-Base across a wide range of benchmarks, particularly in long-context scenarios.

A new information 'traffic controller,' Manifold-Constrained Hyper-Connections (mHC)

DeepSeek's ability to offer these prices and performance figures is rooted in radical architectural innovations detailed in its technical report, also released today, "Towards Highly Efficient Million-Token Context Intelligence."

The standout technical achievement of V4 is its native one-million-token context window. Historically, maintaining such a large context required massive memory (the key-value or KV cache).

DeepSeek solved this by introducing a Hybrid Attention Architecture that combines Compressed Sparse Attention (CSA) to reduce initial token dimensionality and Heavily Compressed Attention (HCA) to aggressively compress the memory footprint for long-range dependencies.

In practice, the V4-Pro model requires only 10% of the KV cache and 27% of the single-token inference FLOPs of its predecessor, DeepSeek-V3.2, even when operating at a 1M-token context.
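To make the memory claim concrete, here is a rough back-of-the-envelope KV-cache estimate for a 1M-token context. The layer count, head count, and head dimension below are illustrative assumptions for a large attention model, not DeepSeek-V4's actual configuration:

```python
# Back-of-the-envelope KV-cache size. The config values are illustrative
# assumptions, not DeepSeek-V4's real architecture.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2x for keys plus values; fp16/bf16 elements are 2 bytes each.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

full = kv_cache_bytes(n_layers=64, n_kv_heads=8, head_dim=128, seq_len=1_000_000)
print(f"uncompressed: {full / 2**30:.0f} GiB")           # 244 GiB under these assumptions
print(f"at 10% (claimed ratio): {0.10 * full / 2**30:.0f} GiB")  # 24 GiB
```

Whatever the exact numbers for V4, the point is that a 90% KV-cache reduction is the difference between a cache that spills across many accelerators and one that fits comfortably on a single node.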

To stabilize a network of 1.6 trillion parameters, DeepSeek moved beyond traditional residual connections. The company's researchers incorporated Manifold-Constrained Hyper-Connections (mHC) to strengthen signal propagation across layers while preserving the model's expressivity.

mHC allows an AI to have a much wider flow of information (so it can learn more complex things) without the risk of the model becoming unstable or "breaking" during training. It's like giving a city a 10-lane highway but adding a perfect AI traffic controller to ensure no one ever hits the brakes.

This is paired with the Muon optimizer, which allowed the team to achieve faster convergence and greater training stability during pre-training on more than 32T diverse, high-quality tokens.

This pre-training data was filtered to remove auto-generated content, mitigating the risk of model collapse and prioritizing unique academic sources. The model's 1.6T parameters use a Mixture-of-Experts (MoE) design in which only 49B parameters are activated per token, further driving down compute requirements.
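The sparsity arithmetic behind that last claim is simple: with 49B of 1.6T parameters active, each token touches only about 3% of the network.

```python
# MoE sparsity: fraction of DeepSeek-V4's weights active for any single token.
TOTAL_PARAMS = 1.6e12   # 1.6T total parameters
ACTIVE_PARAMS = 49e9    # 49B activated per token

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active per token: {active_fraction:.2%}")   # 3.06%
```

Per-token compute therefore scales with the 49B active parameters, not the full 1.6T, which is what makes the API pricing above possible.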

Training the Mixture-of-Experts (MoE) to work as a whole

DeepSeek-V4 was not simply trained; it was "cultivated" through a unique two-stage paradigm.

First, through Independent Expert Cultivation, domain-specific experts were trained via Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) using the GRPO (Group Relative Policy Optimization) algorithm. This allowed each expert to master specialized skills like mathematical reasoning or codebase analysis.

Second, Unified Model Consolidation integrated these distinct proficiencies into a single model via on-policy distillation, where the unified model acts as the student, learning to optimize a reverse KL loss against the teacher models. This distillation process ensures that the model preserves the specialized capabilities of each expert while operating as a cohesive whole.
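The reverse-KL objective at the heart of that consolidation step can be sketched in a few lines. The toy next-token distributions below are illustrative; a real implementation would operate on full-vocabulary logits from the student and teacher models:

```python
import math

# Minimal sketch of the reverse-KL term used in on-policy distillation:
# KL(student || teacher) = sum_i p_s[i] * (log p_s[i] - log p_t[i]).
# Reverse KL penalizes the student for putting mass where the teacher has
# little, encouraging mode-seeking behavior. Assumes the teacher assigns
# nonzero probability wherever the student does.
def reverse_kl(p_student, p_teacher):
    return sum(ps * (math.log(ps) - math.log(pt))
               for ps, pt in zip(p_student, p_teacher) if ps > 0)

p_s = [0.7, 0.2, 0.1]   # student next-token distribution (toy)
p_t = [0.5, 0.3, 0.2]   # teacher next-token distribution (toy)
print(f"reverse KL: {reverse_kl(p_s, p_t):.4f}")  # 0.0851
```

Minimizing this term pulls the student toward whichever teacher expert is on duty for the current domain, which is how the specialized skills survive the merge.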

The model’s reasoning capabilities are further segmented into three increasing "effort" modes.

The "Non-think" mode provides fast, intuitive responses for routine tasks.


"Think High" provides conscious logical analysis for complex problem-solving.


Finally, "Think Max" pushes the boundaries of model reasoning, bridging the gap with frontier models on complex reasoning and agentic tasks. This flexibility allows users to match the compute effort to the difficulty of the task, further enhancing cost-efficiency.

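In practice, matching compute effort to task difficulty would be a per-request switch. The sketch below is a hypothetical illustration — the "thinking" field and its values are assumptions, not DeepSeek's documented API parameters:

```python
# Hypothetical request builder for the three effort modes described above.
# The "thinking" field name and its values are illustrative assumptions.
EFFORT_MODES = {"non-think", "think-high", "think-max"}

def make_request(prompt: str, effort: str) -> dict:
    if effort not in EFFORT_MODES:
        raise ValueError(f"unknown effort mode: {effort}")
    return {
        "model": "deepseek-v4-pro",
        "messages": [{"role": "user", "content": prompt}],
        "thinking": effort,
    }

# Route cheap, routine work to the fast path and hard problems to "think-max".
quick = make_request("Summarize this email thread.", "non-think")
hard = make_request("Find a counterexample to this conjecture.", "think-max")
```

The cost-efficiency claim follows directly: callers pay for deep reasoning only on the requests that need it.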

Breaking the Nvidia GPU stranglehold with local Chinese Huawei Ascend NPUs

While the model weights are the headline, the software stack released alongside them is arguably more important for the future of "Sovereign AI."

Analyst Rui Ma highlighted a single sentence from the release as the most critical: DeepSeek validated its fine-grained Expert Parallelism (EP) scheme on Huawei Ascend NPUs (neural processing units).

By achieving a 1.50x to 1.73x speedup on non-Nvidia platforms, DeepSeek has provided a blueprint for high-performance AI deployment that is resilient to Western GPU supply chains and export controls.

However, DeepSeek says it still used officially licensed Nvidia GPUs for DeepSeek-V4's training, in addition to the Huawei NPUs.

DeepSeek has also open-sourced the MegaMoE mega-kernel as a component of its DeepGEMM library. This CUDA-based implementation delivers up to a 1.96x speedup for latency-sensitive tasks like RL rollouts and high-speed agent serving.

This move ensures that developers can run these massive models with extreme efficiency on existing hardware, further cementing DeepSeek's role as the primary driver of open-source AI infrastructure.

The technical report emphasizes that these optimizations are crucial for supporting a standard 1M context across all official services.

DeepSeek-V4 is released under the MIT License, the most permissive framework in the industry. This allows developers to use, copy, modify, and distribute the weights for commercial purposes without royalties—a stark contrast to the "restricted" open-weight licenses favored by other companies.

For local deployment, DeepSeek recommends setting sampling parameters to temperature = 1.0 and top_p = 1.0. For those using the "Think Max" reasoning mode, the team suggests setting the context window to at least 384K tokens to avoid truncating the model's internal reasoning chains.
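Those recommendations map directly onto an OpenAI-compatible chat payload. A minimal sketch, assuming a locally served endpoint; the model name here is a placeholder:

```python
import json

# OpenAI-compatible chat payload using DeepSeek's recommended local-deployment
# sampling settings (temperature=1.0, top_p=1.0). The model name is a placeholder.
payload = {
    "model": "deepseek-v4",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain KV-cache compression in two sentences."},
    ],
    "temperature": 1.0,
    "top_p": 1.0,
}
body = json.dumps(payload)  # POST to any OpenAI-compatible /chat/completions endpoint
print(json.loads(body)["temperature"])  # 1.0
```

Because the format is OpenAI-compatible, existing client libraries and tooling should work against a local V4 deployment with only a base-URL change.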

The release includes a dedicated encoding folder with Python scripts demonstrating how to encode messages in OpenAI-compatible format and parse the model's output, including reasoning content.

DeepSeek-V4 is also seamlessly integrated with leading AI agents like Claude Code, OpenClaw, and OpenCode. This native integration underscores its role as a bedrock for developer tools, providing an open-source alternative to the proprietary ecosystems of major cloud providers.

The community reaction has been one of shock and validation. Hugging Face officially welcomed the "whale" back, stating that the era of cost-effective 1M context length has arrived.

Industry experts noted that the "second DeepSeek moment" has effectively reset the developmental trajectory of the entire field, placing massive pressure on closed-source providers like OpenAI and Anthropic to justify their premiums.

AI evaluation firm Vals AI noted that DeepSeek-V4 is now the "#1 open-weight model on our Vibe Code Benchmark, and it's not close."

DeepSeek is moving quickly to retire its older architectures. The company announced that the legacy deepseek-chat and deepseek-reasoner endpoints will be fully retired on July 24, 2026. All traffic is currently being rerouted to the V4-Flash architecture, signifying a total transition to the million-token standard.

DeepSeek-V4 is more than just a new model; it is a challenge to the status quo. By proving that architectural innovation can substitute for raw compute maximalism, DeepSeek has made the highest levels of AI intelligence accessible to the global developer community at a far lower cost — something that could benefit the globe, even at a time when lawmakers and leaders in Washington, D.C. are raising concerns about Chinese labs "distilling" from U.S. proprietary giants to train open-source models, and fears of said open-source or jailbroken proprietary models being used to create weapons and commit terror.

The truth is, while all of these are potential risks — as they were and have been with prior technologies that broadened information access, like search and the internet itself — the benefits seem to far outweigh them, and DeepSeek's quest to keep frontier AI models open benefits the entire planet of potential AI users, especially enterprises looking to adopt the cutting edge at the lowest possible cost.

