Kimi K2 and when 'DeepSeek Moments' become normal

<p>The DeepSeek R1 release earlier this year was more of a prequel than a one-off fluke in the trajectory of AI. Last week, a Chinese startup named Moonshot AI dropped <a href="https://moonshotai.github.io/Kimi-K2/">Kimi K2</a>, an open model that is permissively licensed<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> and competitive with leading frontier models in the U.S. If you're interested in the geopolitics of AI and the rapid dissemination of the technology, this is going to represent another "DeepSeek moment" where much of the Western world, even those who consider themselves up to date with what is happening in AI, needs to change its expectations for the coming years.</p><p>In summary, Kimi K2 shows us that:</p><ul><li><p>High-Flyer, the organization that built DeepSeek, is far from a uniquely capable AI laboratory in China,</p></li><li><p>China is continuing to approach (or has reached) the absolute frontier of modeling performance, and</p></li><li><p>The West is falling even further behind on open models.</p></li></ul><p>Kimi K2, described as an "Open-Source Agentic Model," is a sparse mixture-of-experts (MoE) model<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> with 1T total parameters (~1.5x DeepSeek V3/R1's 671B) and 32B active parameters (similar to DeepSeek V3/R1's 37B). 
It is a "non-thinking" model, meaning it doesn't generate a long reasoning chain before answering, but it was still trained extensively with reinforcement learning. It posts leading performance numbers in coding and related agentic tasks (earning it many comparisons to Claude 3.5 Sonnet). It clearly outperforms DeepSeek V3 on a variety of benchmarks, including SWE-Bench, LiveCodeBench, AIME, and GPQA, and a base model was released alongside it. It is the new best-available open model by a clear margin.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZKa8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6bb3ef-ce00-4253-81c5-0c6c30f1f0a4_1920x1080.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZKa8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6bb3ef-ce00-4253-81c5-0c6c30f1f0a4_1920x1080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ZKa8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6bb3ef-ce00-4253-81c5-0c6c30f1f0a4_1920x1080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ZKa8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6bb3ef-ce00-4253-81c5-0c6c30f1f0a4_1920x1080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ZKa8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6bb3ef-ce00-4253-81c5-0c6c30f1f0a4_1920x1080.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ZKa8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6bb3ef-ce00-4253-81c5-0c6c30f1f0a4_1920x1080.jpeg" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b6bb3ef-ce00-4253-81c5-0c6c30f1f0a4_1920x1080.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:215783,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/168259687?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6bb3ef-ce00-4253-81c5-0c6c30f1f0a4_1920x1080.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZKa8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6bb3ef-ce00-4253-81c5-0c6c30f1f0a4_1920x1080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ZKa8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6bb3ef-ce00-4253-81c5-0c6c30f1f0a4_1920x1080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ZKa8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6bb3ef-ce00-4253-81c5-0c6c30f1f0a4_1920x1080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ZKa8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6bb3ef-ce00-4253-81c5-0c6c30f1f0a4_1920x1080.jpeg 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Taken together with the points above, these facts have useful implications for what comes next:</p><ul><li><p><strong>Controlling who can </strong><em><strong>train</strong></em><strong> cutting-edge models is extremely difficult</strong>. More organizations will join the list of OpenAI, Anthropic, Google, Meta, xAI, Qwen, DeepSeek, Moonshot AI, etc. Where there is a concentration of talent and sufficient compute, excellent models are very possible. 
This is easier to do somewhere such as China or Europe where there is existing talent, but is not restricted to these localities.</p></li><li><p>Kimi K2 was trained on 15.5T tokens and has a very similar number of active parameters to DeepSeek V3/R1, which was trained on 14.8T tokens. <strong>Better models are being trained without substantial increases in compute</strong>; these improvements are referred to as a mix of "algorithmic gains" and "efficiency gains" in training. Compute restrictions will certainly slow the pace of progress for Chinese companies, but they are clearly not a binary on/off bottleneck on training.</p></li><li><p><strong>The gap between the leading open models from Western research labs and their Chinese counterparts is only widening</strong>. The best open model from an American company is, maybe, Llama-4-Maverick? Three Chinese organizations have released obviously more useful models with more permissive licenses: DeepSeek, Moonshot AI, and Qwen. A few others such as <a href="https://huggingface.co/tencent/models?sort=likes">Tencent</a>, <a href="https://huggingface.co/MiniMaxAI">MiniMax</a>, and <a href="https://huggingface.co/THUDM">Z.ai/THUDM</a> may have Llama-4 beat too but are a half step behind the leading Chinese models on some combination of license and performance.<br><br>This comes at the same time that new inference-heavy products are coming online that will benefit from cheaper, lower-margin hosting options for open models relative to their API counterparts (which tend to have high profit margins).</p></li></ul><p>Kimi K2 is set up for a much slower style of "DeepSeek Moment" 
than the DeepSeek R1 model that came out in January of this year because it lacks two culturally salient factors:</p><ol><li><p>DeepSeek R1 was revelatory because it was the first model to expose the reasoning trace to users, causing massive adoption outside of the technical AI community, and</p></li><li><p>The broader public is already aware that training leading AI models is actually <a href="https://www.interconnects.ai/p/deepseek-v3-and-the-actual-cost-of">very low cost</a> once the technical expertise is built up (recall the DeepSeek V3 $5M training cost number), i.e. the final training run is cheap, so there <em>should</em> be a smaller reaction to similarly cheap training cost numbers in the Kimi K2 report coming soon.</p></li></ol><p>Still, as more noise is created around the K2 release (Moonshot is releasing a technical report soon), this could evolve very rapidly. We've already seen quick experiments spin up that <a href="https://x.com/jeremyphoward/status/1944326308210921652">slot it into the Claude Code application</a> (because Kimi's API is Claude-compatible), and K2 is topping many nice "<a href="https://x.com/tri_dao/status/1943745133603610864?s=46">vibe</a> <a href="https://x.com/kalomaze/status/1943711672285139043">tests</a>" and <a href="https://x.com/sam_paech/status/1944276326598553853">creativity benchmarks</a>. There are also tons of fun technical details that I don't have time to go into, from using a relatively unproven optimizer, Muon<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>, to scaling up the self-rewarding LLM-as-a-judge pipeline in post-training. 
A fun tidbit to show how much this matters relative to the noisy Grok 4 release last week is that <a href="https://x.com/OpenRouterAI/status/1944466834167919043">Kimi K2 has already surpassed Grok 4</a> in API usage on the popular OpenRouter platform.</p><p>Later in the day on the 11th, following the K2 release, OpenAI CEO Sam Altman shared the following <a href="https://x.com/sama/status/1943837550369812814">message</a> regarding OpenAI's forthcoming open model (which I previously shared more optimistic thoughts on <a href="https://natolambert.substack.com/p/some-thoughts-on-openai-returning">here</a>):</p><blockquote><p>we planned to launch our open-weight model next week.</p><p>we are delaying it; we need time to run additional safety tests and review high-risk areas. we are not yet sure how long it will take us.</p><p>while we trust the community will build great things with this model, once weights are out, they can’t be pulled back. this is new for us and we want to get it right.</p><p>sorry to be the bearer of bad news; we are working super hard!</p></blockquote><p>Many read this as a reactive move by OpenAI to get out from under the shadow of Kimi K2's wonderful release and another DeepSeek media cycle.</p><p>Even though someone at OpenAI told me that the rumor that Kimi caused the delay of their open model is very likely untrue, this is what being on the back foot looks like. When you're on the back foot, narratives like this are impossible to control.</p><p>We need leaders at the closed AI laboratories in the U.S. to rethink some of the long-term dynamics they're battling with R&amp;D adoption. We need to mobilize funding for <a href="https://www.interconnects.ai/p/the-american-deepseek-project">great, open science projects</a> in the U.S. and Europe. Until then, this is what losing looks like if you want the West to be the long-term foundation of AI research and development. 
Kimi K2 has shown us that one "DeepSeek Moment" wasn't enough for us to make the changes we need, and hopefully we don't need a third.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>The modified MIT <a href="https://huggingface.co/moonshotai/Kimi-K2-Instruct/blob/main/LICENSE">license</a> is somewhat annoying, but technically easy to comply with. Added marketing terms like these put it in conflict with "true open-source principles".</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Very similar to the <a href="https://x.com/rasbt/status/1944056316424577525">DeepSeek architecture</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Beautiful learning curve.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3UEt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e86366-eb95-456a-914e-6bca469902ef_1664x1038.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3UEt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e86366-eb95-456a-914e-6bca469902ef_1664x1038.png 424w, 
https://substackcdn.com/image/fetch/$s_!3UEt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e86366-eb95-456a-914e-6bca469902ef_1664x1038.png 848w, https://substackcdn.com/image/fetch/$s_!3UEt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e86366-eb95-456a-914e-6bca469902ef_1664x1038.png 1272w, https://substackcdn.com/image/fetch/$s_!3UEt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e86366-eb95-456a-914e-6bca469902ef_1664x1038.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3UEt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e86366-eb95-456a-914e-6bca469902ef_1664x1038.png" width="1456" height="908" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e4e86366-eb95-456a-914e-6bca469902ef_1664x1038.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:908,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:287226,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/168259687?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e86366-eb95-456a-914e-6bca469902ef_1664x1038.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!3UEt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e86366-eb95-456a-914e-6bca469902ef_1664x1038.png 424w, https://substackcdn.com/image/fetch/$s_!3UEt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e86366-eb95-456a-914e-6bca469902ef_1664x1038.png 848w, https://substackcdn.com/image/fetch/$s_!3UEt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e86366-eb95-456a-914e-6bca469902ef_1664x1038.png 1272w, https://substackcdn.com/image/fetch/$s_!3UEt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e86366-eb95-456a-914e-6bca469902ef_1664x1038.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p></div></div>
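<p>As a rough check on the post's claim that Kimi K2 and DeepSeek V3/R1 sit in a similar training-compute range, here is a back-of-the-envelope sketch using only the parameter and token counts quoted above. The "6 × active parameters × tokens" rule is a common approximation for dense-equivalent training FLOPs and is an assumption added here, not a figure from either lab; it ignores attention cost, MoE routing overhead, and hardware utilization.</p>

```python
# Back-of-the-envelope training compute comparison.
# ASSUMPTION: training FLOPs ~= 6 * active_params * tokens. This is a common
# rule of thumb, not a number reported by Moonshot AI or DeepSeek.

def approx_training_flops(active_params: float, tokens: float) -> float:
    """Estimate training FLOPs from active parameter count and token count."""
    return 6.0 * active_params * tokens

# Figures quoted in the post.
kimi_k2_flops = approx_training_flops(active_params=32e9, tokens=15.5e12)
deepseek_v3_flops = approx_training_flops(active_params=37e9, tokens=14.8e12)

compute_ratio = kimi_k2_flops / deepseek_v3_flops  # Kimi K2 relative to DeepSeek V3
total_param_ratio = 1e12 / 671e9                   # total parameters, K2 vs V3

print(f"Kimi K2 estimate:     {kimi_k2_flops:.2e} FLOPs")
print(f"DeepSeek V3 estimate: {deepseek_v3_flops:.2e} FLOPs")
print(f"Compute ratio (K2 / V3):         {compute_ratio:.2f}")
print(f"Total parameter ratio (K2 / V3): {total_param_ratio:.2f}")
```

<p>Under this approximation the two runs land within roughly 10% of each other in compute, even though K2 has about 1.5x the total parameters, which is consistent with the "better models without substantial increases in compute" point.</p>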