My 2.5 year old laptop can write Space Invaders in JavaScript now, using GLM-4.5 Air and MLX

<p>I wrote about the new <a href="https://simonwillison.net/2025/Jul/28/glm-45/">GLM-4.5</a> model family yesterday - new open weight (MIT licensed) models from <a href="https://z.ai/">Z.ai</a> in China which, according to their own benchmarks, score highly on coding even against models such as Claude Sonnet 4.</p>

<p>The models are pretty big - the smaller GLM-4.5 Air model is still 106 billion total parameters, which <a href="https://huggingface.co/zai-org/GLM-4.5-Air">is 205.78GB</a> on Hugging Face.</p>

<p>Ivan Fioravanti <a href="https://x.com/ivanfioravanti/status/1949911755028910557">built</a> this <a href="https://huggingface.co/mlx-community/GLM-4.5-Air-3bit">44GB 3bit quantized version for MLX</a>, specifically sized so people with 64GB machines could have a chance of running it. I tried it out... and it works <em>extremely well</em>.</p>
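<p>As a rough sanity check on those sizes, the sketch below works out the average number of bits stored per parameter for each download. It uses only the figures quoted above and assumes "GB" means 10<sup>9</sup> bytes, which may not match how Hugging Face reports sizes.</p>

```python
# Figures quoted in the post. Assumption: "GB" means 10**9 bytes.
TOTAL_PARAMS = 106e9      # GLM-4.5 Air total parameters
FULL_SIZE_GB = 205.78     # original weights on Hugging Face
QUANT_SIZE_GB = 44.0      # 3bit MLX quantization


def bits_per_weight(size_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Average storage cost per parameter, in bits."""
    return size_gb * 1e9 * 8 / params


full_bits = bits_per_weight(FULL_SIZE_GB)
quant_bits = bits_per_weight(QUANT_SIZE_GB)

print(f"original weights: {full_bits:.2f} bits per weight")
print(f"3bit MLX version: {quant_bits:.2f} bits per weight")
print(f"size reduction:   {FULL_SIZE_GB / QUANT_SIZE_GB:.1f}x")
```

<p>The quantized figure comes out a little above 3 bits per weight. That would be consistent with the file storing some extra data alongside the 3-bit values, but that is a guess rather than something checked against this particular conversion.</p>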

<p>I fed it the following prompt:</p>

<blockquote><p><code>Write an HTML and JavaScript page implementing space invaders</code></p></blockquote>

<p>And it churned away for a while and produced <a href="https://tools.simonwillison.net/space-invaders-GLM-4.5-Air-3bit">the following</a>:</p>



<div style="max-width: 100%; margin-bottom: 0.4em">

    <video controls="controls" preload="none" aria-label="Space Invaders" poster="https://static.simonwillison.net/static/2025/space-invaders.jpg" loop="loop" style="width: 100%; height: auto;" muted="muted">

        <source src="https://static.simonwillison.net/static/2025/space-invaders.mp4" type="video/mp4" />

    </video>

</div>



<p>Clearly this isn't a particularly novel example, but I still think it's noteworthy that a model running on my 2.5 year old laptop (a 64GB MacBook Pro M2) is able to produce code like this - especially code that worked first time with no further edits needed.</p>



<h4 id="how-i-ran-the-model">How I ran the model</h4>



<p>I had to run it using the current <code>main</code> branch of the <a href="https://github.com/ml-explore/mlx-lm">mlx-lm</a> library (to ensure I had <a href="https://github.com/ml-explore/mlx-lm/commit/489e63376b963ac02b3b7223f778dbecc164716b">this commit</a> adding <code>glm4_moe</code> support). I ran that using <a href="https://github.com/astral-sh/uv">uv</a> like this:</p>

<div class="highlight highlight-source-shell"><pre>uv run \
  --with <span class="pl-s"><span class="pl-pds">'</span>https://github.com/ml-explore/mlx-lm/archive/489e63376b963ac02b3b7223f778dbecc164716b.zip<span class="pl-pds">'</span></span> \
  python</pre></div>

<p>Then in that Python interpreter I used the standard recipe for running MLX models:</p>

<pre><span class="pl-k">from</span> <span class="pl-s1">mlx_lm</span> <span class="pl-k">import</span> <span class="pl-s1">load</span>, <span class="pl-s1">generate</span>
<span class="pl-s1">model</span>, <span class="pl-s1">tokenizer</span> <span class="pl-c1">=</span> <span class="pl-en">load</span>(<span class="pl-s">"mlx-community/GLM-4.5-Air-3bit"</span>)</pre>

<p>That downloaded 44GB of model weights to my <code>~/.cache/huggingface/hub/models--mlx-community--GLM-4.5-Air-3bit</code> folder.</p>

<p>Then:</p>

<pre><span class="pl-s1">prompt</span> <span class="pl-c1">=</span> <span class="pl-s">"Write an HTML and JavaScript page implementing space invaders"</span>
<span class="pl-s1">messages</span> <span class="pl-c1">=</span> [{<span class="pl-s">"role"</span>: <span class="pl-s">"user"</span>, <span class="pl-s">"content"</span>: <span class="pl-s1">prompt</span>}]
<span class="pl-s1">prompt</span> <span class="pl-c1">=</span> <span class="pl-s1">tokenizer</span>.<span class="pl-c1">apply_chat_template</span>(
    <span class="pl-s1">messages</span>,
    <span class="pl-s1">add_generation_prompt</span><span class="pl-c1">=</span><span class="pl-c1">True</span>
)
<span class="pl-s1">response</span> <span class="pl-c1">=</span> <span class="pl-en">generate</span>(
    <span class="pl-s1">model</span>, <span class="pl-s1">tokenizer</span>,
    <span class="pl-s1">prompt</span><span class="pl-c1">=</span><span class="pl-s1">prompt</span>,
    <span class="pl-s1">verbose</span><span class="pl-c1">=</span><span class="pl-c1">True</span>,
    <span class="pl-s1">max_tokens</span><span class="pl-c1">=</span><span class="pl-c1">8192</span>
)</pre>

<p>The response started like this:</p>

<blockquote>

<p><code>&lt;think&gt;</code>The user wants me to create a Space Invaders game using HTML, CSS, and JavaScript. I need to create a complete, functional game with the following features:</p>

<ol>

<li>Player spaceship that can move left/right and shoot</li>

<li>Enemy invaders that move in formation and shoot back</li>

<li>Score tracking</li>

<li>Lives/health system</li>

<li>Game over conditions [...]</li>

</ol>

</blockquote>

<p>Followed by the HTML and this debugging output:</p>

<pre><code>Prompt: 14 tokens, 14.095 tokens-per-sec
Generation: 4193 tokens, 25.564 tokens-per-sec
Peak memory: 47.687 GB
</code></pre>
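<p>To put those numbers in wall-clock terms, here is a small sketch that converts the token counts and rates into seconds, and works out how much of the 64GB is left at peak. The values are copied from the debugging output; the assumption is that "GB" means the same thing in the memory report as in the machine's advertised RAM.</p>

```python
# Values copied from the mlx-lm debugging output.
PROMPT_TOKENS, PROMPT_TPS = 14, 14.095
GEN_TOKENS, GEN_TPS = 4193, 25.564
PEAK_MEMORY_GB = 47.687
MACHINE_RAM_GB = 64  # assumption: same unit as the peak memory figure


def elapsed_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Time taken to process `tokens` at the reported rate."""
    return tokens / tokens_per_sec


prompt_seconds = elapsed_seconds(PROMPT_TOKENS, PROMPT_TPS)
gen_seconds = elapsed_seconds(GEN_TOKENS, GEN_TPS)
headroom_gb = MACHINE_RAM_GB - PEAK_MEMORY_GB

print(f"prompt processing: {prompt_seconds:.1f}s")
print(f"generation:        {gen_seconds:.0f}s ({gen_seconds / 60:.1f} minutes)")
print(f"RAM left at peak:  {headroom_gb:.1f} GB")
```

<p>That puts the whole generation at a little under three minutes.</p>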

<p>You can see <a href="https://gist.github.com/simonw/9f515c8e32fb791549aeb88304550893#file-space_invaders-txt-L61">the full transcript here</a>, or view <a href="https://github.com/simonw/tools/blob/9e04fd9895fae1aa9ac78b8e62d2833831fe0544/space-invaders-GLM-4.5-Air-3bit.html">the source on GitHub</a>, or <a href="https://tools.simonwillison.net/space-invaders-GLM-4.5-Air-3bit">try it out in your browser</a>.</p>
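<p>Turning the <code>response</code> string into a page you can open takes one more step. This is a minimal sketch, not necessarily how the linked file was produced. It assumes the reasoning is wrapped in <code>&lt;think&gt;</code> tags and that the page arrives in a Markdown fenced block labelled <code>html</code>. <code>extract_html</code> is a hypothetical helper, not part of mlx-lm.</p>

```python
import re

FENCE = "`" * 3  # Markdown code fence marker, built up to avoid a literal fence here


def extract_html(response: str) -> str:
    """Hypothetical helper: pull the HTML page out of a model response."""
    # Drop the reasoning section, if one is present and properly closed.
    visible = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)
    # Prefer the contents of a fenced block labelled html.
    fenced = re.search(FENCE + r"html\s*\n(.*?)" + FENCE, visible, flags=re.DOTALL)
    if fenced:
        return fenced.group(1).strip()
    # Otherwise fall back to everything from the doctype or <html> tag onwards.
    start = re.search(r"<!DOCTYPE html|<html", visible, flags=re.IGNORECASE)
    return visible[start.start():].strip() if start else visible.strip()


# A tiny stand-in for the real response, shaped according to the assumptions above.
sample = (
    "<think>Plan the game loop first.</think>\n"
    + FENCE + "html\n"
    + "<!DOCTYPE html>\n<html><body>Space Invaders</body></html>\n"
    + FENCE
)
page = extract_html(sample)
print(page)
```

<p>With the real output, passing <code>response</code> to the helper and writing the result to a <code>.html</code> file should give something a browser can open directly.</p>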



<h4 id="pelican">A pelican for good measure</h4>



<p>I ran <a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/">my pelican benchmark</a> against the full sized models <a href="https://simonwillison.net/2025/Jul/28/glm-45/">yesterday</a>, but I couldn't resist trying it against this smaller 3bit model. Here's what I got for <code>"Generate an SVG of a pelican riding a bicycle"</code>:</p>



<p><img src="https://static.simonwillison.net/static/2025/glm-4.5-air-3b-pelican.png" alt="Blue background, pelican looks like a cloud with an orange bike, bicycle is recognizable as a bicycle if not quite the right geometry." /></p>



<p>Here's the <a href="https://gist.github.com/simonw/fe428f7cead72ad754f965a81117f5df">transcript for that</a>.</p>



<p>In both cases the model used around 48GB of RAM at peak, leaving me with just 16GB for everything else. I had to quit quite a few apps to get the model to run, but the speed was pretty good once it got going.</p>



<h4 id="local-coding-models">Local coding models are really good now</h4>



<p>It's interesting how almost every model released in 2025 has specifically targeted coding. That focus has clearly been paying off: these coding models are getting <em>really good</em> now.</p>



<p>Two years ago when I <a href="https://simonwillison.net/2023/Mar/11/llama/">first tried LLaMA</a> I never <em>dreamed</em> that the same laptop I was using then would one day be able to run models with capabilities as strong as what I'm seeing from GLM-4.5 Air - and Mistral Small 3.2, and Gemma 3, and Qwen 3, and a host of other high quality models that have emerged over the past six months.</p>