Simon Willison · Tech & AI
TIER 4 2025-12-11
<p>OpenAI reportedly <a href="https://www.wsj.com/tech/ai/openais-altman-declares-code-red-to-improve-chatgpt-as-google-threatens-ai-lead-7faf5ea6">declared a "code red"</a> on the 1st of December in response to increasingly credible competition from the likes of Google's Gemini 3. It's less than two weeks later and they just <a href="https://openai.com/index/introducing-gpt-5-2/">announced GPT-5.2</a>, calling it "the most capable model series yet for professional knowledge work".</p> <h4 id="key-characteristics-of-gpt-5-2">Key characteristics of GPT-5.2</h4> <p>The new model comes in two variants: GPT-5.2 and GPT-5.2 Pro. There's no Mini variant yet.</p> <p>GPT-5.2 is available via their UI in both "instant" and "thinking" modes, presumably still corresponding to the API concept of different reasoning effort levels.</p> <p>The knowledge cut-off date for both variants is now <strong>August 31st 2025</strong>. This is significant - GPT-5.1 and GPT-5 were both Sep 30, 2024 and GPT-5 mini was May 31, 2024.</p> <p>Both of the 5.2 models have a 400,000 token context window and 128,000 max output tokens - no different from GPT-5.1 or GPT-5.</p> <p>Pricing-wise, GPT-5.2 is a rare <em>increase</em> - it's 1.4x the cost of GPT-5.1, at $1.75/million input and $14/million output. GPT-5.2 Pro is $21.00/million input and a hefty $168.00/million output, putting it <a href="https://www.llm-prices.com/#sel=gpt-4.5%2Co1-pro%2Cgpt-5.2-pro">up there</a> with their previous most expensive models o1 Pro and GPT-4.5.</p> <p>So far the main benchmark results we have are self-reported by OpenAI. 
The most interesting ones are a 70.9% score on their GDPval "Knowledge work tasks" benchmark (GPT-5 got 38.8%) and a 52.9% score on ARC-AGI-2 (up from 17.6% for GPT-5.1 Thinking).</p> <p>The ARC Prize Twitter account provided <a href="https://x.com/arcprize/status/1999182732845547795">this interesting note</a> on the efficiency gains for GPT-5.2 Pro:</p> <blockquote> <p>A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task</p> <p>Today, we’ve verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task</p> <p>This represents a ~390X efficiency improvement in one year</p> </blockquote> <p>GPT-5.2 can be accessed in OpenAI's Codex CLI tool like this:</p> <pre><code>codex -m gpt-5.2 </code></pre> <p>There are three new API models:</p> <ul> <li><a href="https://platform.openai.com/docs/models/gpt-5.2">gpt-5.2</a> - I think this is what you get if you select "GPT-5.2 Thinking" in ChatGPT but <a href="https://twitter.com/simonw/status/1999603339382976785">I'm a little confused</a>.</li> <li> <a href="https://platform.openai.com/docs/models/gpt-5.2-chat-latest">gpt-5.2-chat-latest</a> - the model used by ChatGPT for "GPT-5.2 Instant" mode. It's priced the same as GPT-5.2 but has a reduced 128,000 context window with 16,384 max output tokens.</li> <li><a href="https://platform.openai.com/docs/models/gpt-5.2-pro">gpt-5.2-pro</a></li> </ul> <p>OpenAI have published a new <a href="https://cookbook.openai.com/examples/gpt-5/gpt-5-2_prompting_guide">GPT-5.2 Prompting Guide</a>. An interesting note from that document is that compaction can now be run with <a href="https://platform.openai.com/docs/api-reference/responses/compact">a new dedicated server-side API</a>:</p> <blockquote> <p>For long-running, tool-heavy workflows that exceed the standard context window, GPT-5.2 with Reasoning supports response compaction via the <code>/responses/compact</code> endpoint. 
Compaction performs a loss-aware compression pass over prior conversation state, returning encrypted, opaque items that preserve task-relevant information while dramatically reducing token footprint. This allows the model to continue reasoning across extended workflows without hitting context limits.</p> </blockquote> <h4 id="it-s-better-at-vision">It's better at vision</h4> <p>One note from the announcement that caught my eye:</p> <blockquote> <p>GPT‑5.2 Thinking is our strongest vision model yet, cutting error rates roughly in half on chart reasoning and software interface understanding.</p> </blockquote> <p>I had <a href="https://simonwillison.net/2025/Aug/29/the-perils-of-vibe-coding/">disappointing results from GPT-5</a> on an OCR task a while ago. I tried it against GPT-5.2 and it did <em>much</em> better:</p> <div class="highlight highlight-source-shell"><pre>llm -m gpt-5.2 ocr -a https://static.simonwillison.net/static/2025/ft.jpeg</pre></div> <p>Here's <a href="https://gist.github.com/simonw/b4a13f1e424e58b8b0aca72ae2c3cb00">the result</a> from that, which cost 1,520 input tokens and 1,022 output tokens for a total of <a href="https://www.llm-prices.com/#it=1520&ot=1022&sel=gpt-5.2">1.6968 cents</a>.</p> <h4 id="rendering-some-pelicans">Rendering some pelicans</h4> <p>For my classic "Generate an SVG of a pelican riding a bicycle" test:</p> <div class="highlight highlight-source-shell"><pre>llm -m gpt-5.2 <span class="pl-s"><span class="pl-pds">"</span>Generate an SVG of a pelican riding a bicycle<span class="pl-pds">"</span></span></pre></div> <p><img src="https://static.simonwillison.net/static/2025/gpt-2.5-pelican.png" alt="Described by GPT-5.2: Cartoon-style illustration: A white, duck-like bird with a small black eye, oversized orange beak (with a pale blue highlight along the lower edge), and a pink neckerchief rides a blue-framed bicycle in side view; the bike has two large black wheels with gray spokes, a blue front fork, visible black crank/pedal area, and thin black 
handlebar lines, with gray motion streaks and a soft gray shadow under the bike on a light-gray road; background is a pale blue sky with a simple yellow sun at upper left and two rounded white clouds (one near upper center-left and one near upper right)." style="max-width: 100%;" /></p> <p>And for the more advanced alternative test, which tests instruction following in a little more depth:</p> <div class="highlight highlight-source-shell"><pre>llm -m gpt-5.2 <span class="pl-s"><span class="pl-pds">"</span>Generate an SVG of a California brown pelican riding a bicycle. The bicycle</span> <span class="pl-s">must have spokes and a correctly shaped bicycle frame. The pelican must have its</span> <span class="pl-s">characteristic large pouch, and there should be a clear indication of feathers.</span> <span class="pl-s">The pelican must be clearly pedaling the bicycle. The image should show the full</span> <span class="pl-s">breeding plumage of the California brown pelican.<span class="pl-pds">"</span></span></pre></div> <p><img src="https://static.simonwillison.net/static/2025/gpt-5.2-p2.png" alt="Digital illustration on a light gray/white background with a thin horizontal baseline: a stylized California brown pelican in breeding plumage is drawn side-on, leaning forward and pedaling a bicycle; the pelican has a dark brown body with layered wing lines, a pale cream head with a darker brown cap and neck shading, a small black eye, and an oversized long golden-yellow bill extending far past the front wheel; one brown leg reaches down to a pedal while the other is tucked back; the bike is shown in profile with two large spoked wheels (black tires, white rims), a dark frame, crank and chainring near the rear wheel, a black saddle above the rear, and the front fork aligned under the pelican’s head; text at the top reads "California brown pelican (breeding plumage) pedaling a bicycle"." 
style="max-width: 100%;" /></p> <p><strong>Update 14th December 2025</strong>: I used GPT-5.2 running in Codex CLI to <a href="https://simonwillison.net/2025/Dec/15/porting-justhtml/">port a complex Python library to JavaScript</a>. It ran without interference for nearly four hours and completed a complex task exactly to my specification.</p>
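<p>The pricing arithmetic quoted in this post is easy to sanity-check. Here is a minimal Python sketch, assuming only the per-million-token prices and token counts listed above. The helper names <code>cost_in_cents</code> and <code>efficiency_gain</code> are made up for illustration and are not part of any OpenAI or <code>llm</code> API.</p>

```python
# Prices in US dollars per million tokens, as quoted in this post.
# The dictionary layout and function names are illustrative assumptions.
PRICES = {
    "gpt-5.2": {"input": 1.75, "output": 14.00},
    "gpt-5.2-pro": {"input": 21.00, "output": 168.00},
}


def cost_in_cents(model, input_tokens, output_tokens):
    """Return the cost of a single request in US cents."""
    price = PRICES[model]
    dollars = (
        input_tokens * price["input"] + output_tokens * price["output"]
    ) / 1_000_000
    return dollars * 100


def efficiency_gain(old_cost_per_task, new_cost_per_task):
    """Return how many times cheaper the new per-task cost is."""
    return old_cost_per_task / new_cost_per_task


# The OCR example: 1,520 input tokens and 1,022 output tokens on gpt-5.2.
print(f"OCR example: {cost_in_cents('gpt-5.2', 1520, 1022):.4f} cents")

# ARC Prize figures: an estimated $4.5k/task for o3 (High) a year ago,
# compared with $11.64/task for GPT-5.2 Pro (X-High).
print(f"ARC-AGI-1 cost ratio: {efficiency_gain(4500, 11.64):.0f}x")
```

<p>This prints 1.6968 cents for the OCR run, matching the figure above, and a cost ratio of about 387x, which is consistent with the "~390X" in the ARC Prize note given that the $4.5k figure is itself an estimate.</p>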