
Simon Willison · Tech & AI

DeepSeek V4 - almost on the frontier, a fraction of the price

2026-04-24

<p>Chinese AI lab DeepSeek's last model release was V3.2 (and V3.2 Speciale) <a href="https://simonwillison.net/2025/Dec/1/deepseek-v32/">last December</a>. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, <a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro">DeepSeek-V4-Pro</a> and <a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash">DeepSeek-V4-Flash</a>.</p>

<p>Both are Mixture of Experts models with a 1 million token context. Pro is 1.6T total parameters, 49B active. Flash is 284B total, 13B active. Both use the standard MIT license.</p>

<p>I think this makes DeepSeek-V4-Pro the new largest open weights model. It's larger than Kimi K2.6 (1.1T) and GLM-5.1 (754B) and more than twice the size of DeepSeek V3.2 (685B).</p>

<p>Pro is 865GB on Hugging Face, Flash is 160GB. I'm hoping that a lightly quantized Flash will run on my 128GB M5 MacBook Pro. It's <em>possible</em> the Pro model could run on it too, if I can stream just the necessary active experts from disk.</p>
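<p>Here's a rough back-of-envelope sketch in Python of what those sizes imply. It assumes 1GB is 10<sup>9</sup> bytes and that each download is almost entirely model weights, neither of which is confirmed above. The function names are made up for illustration.</p>

```python
# Back-of-envelope arithmetic using the sizes and parameter counts quoted above.
# Assumptions (not confirmed by the source): 1 GB = 10**9 bytes, and the
# Hugging Face download is almost entirely weights.

def bits_per_param(size_gb: float, params_billions: float) -> float:
    """Average stored bits per parameter implied by a checkpoint size."""
    return size_gb * 1e9 * 8 / (params_billions * 1e9)


def max_bits_for_budget(budget_gb: float, params_billions: float) -> float:
    """Largest average bit width whose weights alone fit in budget_gb."""
    return budget_gb * 1e9 * 8 / (params_billions * 1e9)


flash_bits = bits_per_param(160, 284)    # DeepSeek-V4-Flash: 160GB, 284B params
pro_bits = bits_per_param(865, 1600)     # DeepSeek-V4-Pro: 865GB, 1.6T params
flash_ceiling = max_bits_for_budget(128, 284)  # weights only, no headroom

print(f"Flash as published: {flash_bits:.2f} bits/param")
print(f"Pro as published:   {pro_bits:.2f} bits/param")
print(f"Flash must average under {flash_ceiling:.2f} bits/param to fit in 128GB")
```

<p>On those assumptions both checkpoints already average under 5 bits per parameter. Treat 128GB as a hard ceiling rather than a usable budget, since the operating system and the KV cache need memory too.</p>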

<p>For the moment I tried the models out via <a href="https://openrouter.ai/">OpenRouter</a>, using <a href="https://github.com/simonw/llm-openrouter">llm-openrouter</a>:</p>

<pre><code>llm install llm-openrouter
llm openrouter refresh
llm -m openrouter/deepseek/deepseek-v4-pro 'Generate an SVG of a pelican riding a bicycle'
</code></pre>

<p>Here's the pelican <a href="https://gist.github.com/simonw/4a7a9e75b666a58a0cf81495acddf529">for DeepSeek-V4-Flash</a>:</p>

<p><img src="https://static.simonwillison.net/static/2026/deepseek-v4-flash.png" alt="Excellent bicycle - good frame shape, nice chain, even has a reflector on the front wheel. Pelican has a mean looking expression but has its wings on the handlebars and feet on the pedals. Pouch is a little sharp." style="max-width: 100%;" /></p>

<p>And <a href="https://gist.github.com/simonw/9e8dfed68933ab752c9cf27a03250a7c">for DeepSeek-V4-Pro</a>:</p>

<p><img src="https://static.simonwillison.net/static/2026/deepseek-v4-pro.png" alt="Another solid bicycle, albeit the spokes are a little jagged and the frame is compressed a bit. Pelican has gone a bit wrong - it has a VERY large body, only one wing, a weirdly hairy backside and generally looks like it was drawn by a different artist from the bicycle." style="max-width: 100%;" /></p>

<p>For comparison, take a look at the pelicans I got from <a href="https://simonwillison.net/2025/Dec/1/deepseek-v32/">DeepSeek V3.2 in December</a>, <a href="https://simonwillison.net/2025/Aug/22/deepseek-31/">V3.1 in August</a>, and <a href="https://simonwillison.net/2025/Mar/24/deepseek/">V3-0324 in March 2025</a>.</p>

<p>So the pelicans are pretty good, but what's really notable here is the <em>cost</em>. DeepSeek V4 is a very, very inexpensive model.</p>

<p>This is <a href="https://api-docs.deepseek.com/quick_start/pricing">DeepSeek's pricing page</a>. They're charging $0.14/million tokens input and $0.28/million tokens output for Flash, and $1.74/million input and $3.48/million output for Pro.</p>

<p>Here's a comparison table with the frontier models from Gemini, OpenAI and Anthropic:</p>

<center>

<table>

<thead>

<tr>

<th>Model</th>

<th>Input ($/M)</th>

<th>Output ($/M)</th>

</tr>

</thead>

<tbody>

<tr>

<td><strong>DeepSeek V4 Flash</strong></td>

<td>$0.14</td>

<td>$0.28</td>

</tr>

<tr>

<td>GPT-5.4 Nano</td>

<td>$0.20</td>

<td>$1.25</td>

</tr>

<tr>

<td>Gemini 3.1 Flash-Lite</td>

<td>$0.25</td>

<td>$1.50</td>

</tr>

<tr>

<td>Gemini 3 Flash Preview</td>

<td>$0.50</td>

<td>$3</td>

</tr>

<tr>

<td>GPT-5.4 Mini</td>

<td>$0.75</td>

<td>$4.50</td>

</tr>

<tr>

<td>Claude Haiku 4.5</td>

<td>$1</td>

<td>$5</td>

</tr>

<tr>

<td><strong>DeepSeek V4 Pro</strong></td>

<td>$1.74</td>

<td>$3.48</td>

</tr>

<tr>

<td>Gemini 3.1 Pro</td>

<td>$2</td>

<td>$12</td>

</tr>

<tr>

<td>GPT-5.4</td>

<td>$2.50</td>

<td>$15</td>

</tr>

<tr>

<td>Claude Sonnet 4.6</td>

<td>$3</td>

<td>$15</td>

</tr>

<tr>

<td>Claude Opus 4.7</td>

<td>$5</td>

<td>$25</td>

</tr>

<tr>

<td>GPT-5.5</td>

<td>$5</td>

<td>$30</td>

</tr>

</tbody>

</table>

</center>

<p>DeepSeek-V4-Flash is the cheapest of the small models, beating even OpenAI's GPT-5.4 Nano. DeepSeek-V4-Pro is the cheapest of the larger frontier models.</p>
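<p>To make the gap concrete, here's a minimal Python sketch that prices a single workload against the list prices in the table. The workload (10 million input tokens, 2 million output tokens) is a hypothetical figure picked for illustration, and the calculation ignores cache discounts, batch pricing and any long-context surcharges.</p>

```python
# List prices in USD per million tokens, copied from the table above.
# Each value is (input price, output price).
PRICES = {
    "DeepSeek V4 Flash": (0.14, 0.28),
    "GPT-5.4 Nano": (0.20, 1.25),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 3 Flash Preview": (0.50, 3.00),
    "GPT-5.4 Mini": (0.75, 4.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "DeepSeek V4 Pro": (1.74, 3.48),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price cost in USD for one workload, with no discounts applied."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload, chosen only for illustration.
INPUT_TOKENS = 10_000_000
OUTPUT_TOKENS = 2_000_000

costs = {model: cost_usd(model, INPUT_TOKENS, OUTPUT_TOKENS) for model in PRICES}

# Print the models from cheapest to most expensive for this workload.
for model, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{model:<24} ${cost:>8.2f}")
```

<p>Because DeepSeek charges only twice as much for output as for input, while most of the other models in the table charge five or six times as much, the gap widens as a workload becomes more output-heavy.</p>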

<p>This note from <a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/blob/main/DeepSeek_V4.pdf">the DeepSeek paper</a> helps explain why they can price these models so low - they've focused a great deal on efficiency with this release, especially for longer context prompts:</p>

<blockquote>

<p>In the scenario of 1M-token context, even DeepSeek-V4-Pro, which has a larger number of activated parameters, attains only 27% of the single-token FLOPs (measured in equivalent FP8 FLOPs) and 10% of the KV cache size relative to DeepSeek-V3.2. Furthermore, DeepSeek-V4-Flash, with its smaller number of activated parameters, pushes efficiency even further: in the 1M-token context setting, it achieves only 10% of the single-token FLOPs and 7% of the KV cache size compared with DeepSeek-V3.2.</p>

</blockquote>
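<p>Those percentages are easier to compare as reduction factors. This small Python sketch just inverts the ratios quoted from the paper. The dictionary layout and function name are my own, and the figures only apply to the 1M-token setting the quote describes.</p>

```python
# Ratios relative to DeepSeek-V3.2 at 1M-token context, as quoted above.
RELATIVE_TO_V32 = {
    "DeepSeek-V4-Pro": {"single-token FLOPs": 0.27, "KV cache size": 0.10},
    "DeepSeek-V4-Flash": {"single-token FLOPs": 0.10, "KV cache size": 0.07},
}


def reduction_factor(ratio: float) -> float:
    """Turn 'X% of the baseline' into 'N times smaller than the baseline'."""
    return 1 / ratio


for model, ratios in RELATIVE_TO_V32.items():
    for metric, ratio in ratios.items():
        factor = reduction_factor(ratio)
        print(f"{model}: {metric} is about {factor:.1f}x smaller than V3.2")
```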

<p>DeepSeek's self-reported benchmarks <a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/blob/main/DeepSeek_V4.pdf">in their paper</a> show their Pro model competitive with those other frontier models, albeit with this note:</p>

<blockquote>

<p>Through the expansion of reasoning tokens, DeepSeek-V4-Pro-Max demonstrates superior performance relative to GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks. Nevertheless, its performance falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months.</p>

</blockquote>

<p>I'm keeping an eye on <a href="https://huggingface.co/unsloth/models">huggingface.co/unsloth/models</a> as I expect the Unsloth team will have a set of quantized versions out pretty soon. It's going to be very interesting to see how well that Flash model runs on my own machine.</p>