DeepSeek Launches V4 Preview With 1M Context and Agentic Coding Focus

Agentic Coding
LLM
Benchmark
Performance

DeepSeek, a Chinese research lab known for high-efficiency models, released the preview and open weights for DeepSeek-V4. The family includes DeepSeek-V4-Pro, a 1.6-trillion-parameter Mixture-of-Experts (MoE) model (an architecture that activates only a subset of parameters per token), and a smaller DeepSeek-V4-Flash variant. Both models support a 1M-token context window as standard.

This release mirrors the pattern of preview MoE models from major labs but pushes context efficiency further through DeepSeek Sparse Attention. By compressing tokens and reducing memory overhead, DeepSeek is commoditizing the 1M-token context window, which was previously restricted to closed-source frontier models. The release also introduces a configurable thinking mode for deep reasoning.

Access both models via the DeepSeek API, which supports OpenAI and Anthropic-compatible formats. The models are optimized for agentic coding and integrated with tools like Claude Code. Note that legacy deepseek-chat and deepseek-reasoner endpoints will be retired on July 24, 2026, as the provider transitions all traffic to V4.
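Because the API speaks the OpenAI ChatCompletions format, a request is just a standard chat payload pointed at DeepSeek's endpoint. The sketch below builds such a payload with only the standard library; the endpoint URL is an assumption based on DeepSeek's existing OpenAI-compatible API, so verify it (and your model name) against the official docs before use.

```python
import json
import urllib.request

# Assumed OpenAI-compatible endpoint; confirm against the official docs.
API_URL = "https://api.deepseek.com/chat/completions"

def build_request(prompt: str, model: str = "deepseek-v4-pro") -> dict:
    """Build an OpenAI-ChatCompletions-style request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_request("Refactor this function to be iterative.")

# To actually send the request (requires a real API key):
# req = urllib.request.Request(
#     API_URL,
#     data=json.dumps(body).encode(),
#     headers={"Authorization": "Bearer <DEEPSEEK_API_KEY>",
#              "Content-Type": "application/json"},
# )
# resp = json.load(urllib.request.urlopen(req))

print(json.dumps(body, indent=2))
```

Keeping the payload in the OpenAI shape is what lets tools like Claude Code or existing OpenAI-client integrations switch over by changing only the base URL and model name.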

Read the full update →

Frequently asked questions

What is DeepSeek-V4?
DeepSeek-V4 is a family of open-weight Mixture-of-Experts models designed for high-performance reasoning and agentic tasks. It includes a 1.6-trillion-parameter Pro version and a 284-billion-parameter Flash version. Both models feature a 1-million-token context window by default, allowing the AI to process massive amounts of information in a single request.
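The reason a 1.6T-parameter MoE is affordable to run is that a gating network activates only a few experts per token. This toy sketch shows generic top-k gating with renormalized weights; it is an illustration of the MoE idea, not DeepSeek's actual routing rule.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(gate_logits, k=2):
    """Pick the top-k experts for a token and renormalize their weights.

    Only the selected experts' parameters run for this token, so per-token
    compute scales with k, not with the total parameter count.
    (Generic top-k gating; DeepSeek's real router may differ.)
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

# Four experts; the gate picks the two with the highest logits (1 and 3).
weights = route([0.1, 2.0, -1.0, 1.5], k=2)
```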
How does DeepSeek-V4 handle 1M context length efficiently?
The model uses a novel architectural innovation called DeepSeek Sparse Attention, or DSA, combined with token-wise compression. These techniques significantly reduce the computational and memory costs typically associated with long-context processing. This efficiency allows the 1-million-token window to be the default across all official services without the usual performance or cost penalties.
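The core intuition behind sparse attention is that each query scores all n keys in full attention (O(n²) overall), whereas keeping only the top-k keys makes the cost grow with k instead of n. The sketch below is a generic top-k sparsification of a single attention row, a stand-in for DSA whose actual selection mechanism is different and more sophisticated.

```python
import math

def sparse_attention_row(q, keys, values, k=2):
    """Attend a single query vector to only its k highest-scoring keys.

    Generic top-k sparse attention: score all keys, keep the best k,
    softmax over just those, and mix the corresponding values.
    (Illustrative only; not the published DSA algorithm.)
    """
    scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    z = sum(exps.values())
    dim = len(values[0])
    out = [0.0] * dim
    for i, e in exps.items():
        w = e / z
        for d in range(dim):
            out[d] += w * values[i][d]
    return out, top

# The query is closest to keys 0 and 1, so only those two are attended to.
out, kept = sparse_attention_row(
    q=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]],
    values=[[1.0], [2.0], [3.0], [4.0]],
    k=2,
)
```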
What are the dual reasoning modes in DeepSeek-V4?
The DeepSeek-V4 API supports both Thinking and Non-Thinking modes. Thinking mode allows the model to generate internal reasoning tokens to work through complex math, coding, or logic problems before providing a final answer. Non-Thinking mode provides faster response times for simpler tasks where deep deliberation is not required, giving users control over speed and accuracy.
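In practice, switching between the two modes amounts to flipping one field in the request body. The `"thinking"` parameter name below is an assumption made for illustration, since the announcement does not specify the exact toggle; check the official API reference for the real field.

```python
def chat_body(prompt: str, thinking: bool) -> dict:
    """Build a request body that toggles deliberate reasoning on or off.

    NOTE: the "thinking" field name is hypothetical; consult the official
    DeepSeek API docs for the actual mode toggle.
    """
    return {
        "model": "deepseek-v4-pro",
        "messages": [{"role": "user", "content": prompt}],
        "thinking": thinking,  # hypothetical parameter name
    }

# Deep deliberation for hard problems, fast replies for simple lookups.
careful = chat_body("Prove that sqrt(2) is irrational.", thinking=True)
fast = chat_body("What is the capital of France?", thinking=False)
```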
Is DeepSeek-V4 open source and how can I access it?
DeepSeek-V4 is an open-weight model, meaning its parameters are publicly available for download. Developers can also access it via the official API, which supports OpenAI ChatCompletions and Anthropic API formats. Users simply need to update their model name to deepseek-v4-pro or deepseek-v4-flash while keeping their existing base URL and integration settings.
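Since only the model string changes, migration can be handled by a one-line shim in existing client code. The legacy-to-V4 mapping below reflects the announced rerouting of both legacy endpoints onto V4-Flash; treat it as a sketch of the migration pattern, not official guidance.

```python
# Keep the base URL and client code; swap only the model name.
# Mapping assumes both legacy endpoints land on V4-Flash, per the
# provider's stated transition plan.
LEGACY_TO_V4 = {
    "deepseek-chat": "deepseek-v4-flash",      # non-thinking traffic
    "deepseek-reasoner": "deepseek-v4-flash",  # thinking traffic
}

def migrate_model(name: str) -> str:
    """Return the V4 replacement for a legacy model name, else pass through."""
    return LEGACY_TO_V4.get(name, name)
```

Pass-through behavior for unrecognized names lets the same shim run safely against code that already uses `deepseek-v4-pro` or `deepseek-v4-flash`.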
When will the old DeepSeek-Chat and DeepSeek-Reasoner models be retired?
DeepSeek will officially retire the deepseek-chat and deepseek-reasoner model endpoints on July 24, 2026, at 15:59 UTC. Until that date, requests to these legacy models are being automatically routed to the new DeepSeek-V4-Flash architecture in either non-thinking or thinking modes to ensure a smooth transition for existing API users.