
Simon Willison · Tech & AI

ChatGPT agent's user-agent

TIER 4   2025-08-04

<p>I was exploring how ChatGPT agent works today. I learned some interesting things about how it exposes its identity through HTTP headers, then made a huge blunder in thinking it was leaking its URLs to Bingbot and Yandex... but it turned out <a href="https://simonwillison.net/2025/Aug/4/chatgpt-agents-agent/#cloudflare-crawler-hints">that was a Cloudflare feature</a> that had nothing to do with ChatGPT.</p>



<p>ChatGPT agent is the <a href="https://openai.com/index/introducing-chatgpt-agent/">recently released</a> (and confusingly named) ChatGPT feature that combines browser automation with terminal access - replacing their previous <a href="https://help.openai.com/en/articles/10421097-operator">Operator research preview</a>, which is scheduled for deprecation on August 31st.</p>



<h4 id="investigating-chatgpt-agent-s-user-agent">Investigating ChatGPT agent's user-agent</h4>

<p>I decided to dig into how it works by creating a logged web URL endpoint using <a href="https://simonwillison.net/2024/Aug/8/django-http-debug/">django-http-debug</a>. Then I told ChatGPT agent mode to explore that new page:</p>

<p><img src="https://static.simonwillison.net/static/2025/chatgpt-agent-url.jpg" alt="ChatGPT screenshot. My prompt was &quot;Visit https://simonwillison.net/test-url-context and tell me what you see there&quot; - it said &quot;Worked for 15 seconds&quot; with an arrow, then a screenshot of the webpage content showing &quot;simonwillison.net&quot; with a favicon, heading &quot;This is a heading&quot;, text &quot;Text and text and more text.&quot; and &quot;this came from javascript&quot;. The bot then responds with: The webpage displays a simple layout with a large heading at the top that reads “This is a heading.” Below it, there's a short paragraph that says “Text and text and more text.” A final line appears underneath saying “this came from javascript,” indicating that this last line was inserted via a script. The page contains no interactive elements or instructions—just these lines of plain text displayed on a white background." style="max-width: 100%;" /></p>

<p>My logging captured these request headers:</p>

<pre><code>Via: 1.1 heroku-router

Host: simonwillison.net

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7

Cf-Ray: 96a0f289adcb8e8e-SEA

Cookie: cf_clearance=zzV8W...

Server: Heroku

Cdn-Loop: cloudflare; loops=1

Priority: u=0, i

Sec-Ch-Ua: "Not)A;Brand";v="8", "Chromium";v="138"

Signature: sig1=:1AxfqHocTf693inKKMQ7NRoHoWAZ9d/vY4D/FO0+MqdFBy0HEH3ZIRv1c3hyiTrzCvquqDC8eYl1ojcPYOSpCQ==:

Cf-Visitor: {"scheme":"https"}

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36

Cf-Ipcountry: US

X-Request-Id: 45ef5be4-ead3-99d5-f018-13c4a55864d3

Sec-Fetch-Dest: document

Sec-Fetch-Mode: navigate

Sec-Fetch-Site: none

Sec-Fetch-User: ?1

Accept-Encoding: gzip, br

Accept-Language: en-US,en;q=0.9

Signature-Agent: "https://chatgpt.com"

Signature-Input: sig1=("@authority" "@method" "@path" "signature-agent");created=1754340838;keyid="otMqcjr17mGyruktGvJU8oojQTSMHlVm7uO-lrcqbdg";expires=1754344438;nonce="_8jbGwfLcgt_vUeiZQdWvfyIeh9FmlthEXElL-O2Rq5zydBYWivw4R3sV9PV-zGwZ2OEGr3T2Pmeo2NzmboMeQ";tag="web-bot-auth";alg="ed25519"

X-Forwarded-For: 2a09:bac5:665f:1541::21e:154, 172.71.147.183

X-Request-Start: 1754340840059

Cf-Connecting-Ip: 2a09:bac5:665f:1541::21e:154

Sec-Ch-Ua-Mobile: ?0

X-Forwarded-Port: 80

X-Forwarded-Proto: http

Sec-Ch-Ua-Platform: "Linux"

Upgrade-Insecure-Requests: 1

</code></pre>

<p>That <strong>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36</strong> user-agent header is the one used by the most recent Chrome on macOS - which is a little odd here as the <strong>Sec-Ch-Ua-Platform: "Linux"</strong> header indicates that the agent browser runs on Linux.</p>

<p>At first glance it looks like ChatGPT is being dishonest here by not including its bot identity in the user-agent header. I thought for a moment it might be reflecting my own user-agent, but I'm using Firefox on macOS and it identified itself as Chrome.</p>
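<p>That kind of inconsistency is easy to flag on the server. Here's a minimal Python sketch: the function names and the substring rules for guessing a platform from the user-agent are illustrative assumptions, not part of any standard or of ChatGPT agent itself.</p>

```python
from typing import Mapping, Optional


def platform_from_user_agent(user_agent: str) -> Optional[str]:
    """Guess the platform a User-Agent string claims (rough substring heuristic)."""
    # Android is checked before Linux because Android user-agents also contain "Linux".
    rules = [
        ("Android", "Android"),
        ("Macintosh", "macOS"),
        ("Windows", "Windows"),
        ("Linux", "Linux"),
    ]
    for needle, platform in rules:
        if needle in user_agent:
            return platform
    return None


def platform_mismatch(headers: Mapping[str, str]) -> bool:
    """True when User-Agent and Sec-Ch-Ua-Platform name different platforms."""
    claimed = platform_from_user_agent(headers.get("User-Agent", ""))
    # The client hint is a quoted string, e.g. "Linux" including the quotes.
    hinted = headers.get("Sec-Ch-Ua-Platform", "").strip().strip('"')
    return bool(claimed and hinted and claimed != hinted)


# Header values copied from the logged request above.
logged_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36",
    "Sec-Ch-Ua-Platform": '"Linux"',
}
print(platform_mismatch(logged_headers))  # True
```

<p>A mismatch on its own proves nothing about who sent the request. It's just a hint that the user-agent string may not be telling the whole story.</p>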

<p>Then I spotted this header:</p>

<pre><code>Signature-Agent: "https://chatgpt.com"

</code></pre>

<p>Which is accompanied by a much more complex header called <strong>Signature-Input</strong>:</p>

<pre><code>Signature-Input: sig1=("@authority" "@method" "@path" "signature-agent");created=1754340838;keyid="otMqcjr17mGyruktGvJU8oojQTSMHlVm7uO-lrcqbdg";expires=1754344438;nonce="_8jbGwfLcgt_vUeiZQdWvfyIeh9FmlthEXElL-O2Rq5zydBYWivw4R3sV9PV-zGwZ2OEGr3T2Pmeo2NzmboMeQ";tag="web-bot-auth";alg="ed25519"

</code></pre>

<p>And a <code>Signature</code> header too.</p>
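<p>To make that <code>Signature-Input</code> value easier to read, here's a small Python sketch that splits it into its label, the list of covered components and the parameters. It's a deliberately simplified parser that assumes a single signature and no semicolons or escaped quotes inside values - it is not a full structured-fields parser, and <code>parse_signature_input</code> is a name I made up for illustration.</p>

```python
import re
from typing import Dict, List, Tuple, Union

ParamValue = Union[str, int, bool]


def parse_signature_input(value: str) -> Tuple[str, List[str], Dict[str, ParamValue]]:
    """Split a single-signature Signature-Input value into label, components, params."""
    label, _, rest = value.strip().partition("=")
    match = re.match(r"\(([^)]*)\)(.*)$", rest)
    if match is None:
        raise ValueError("expected a parenthesised list of covered components")
    components = re.findall(r'"([^"]*)"', match.group(1))
    params: Dict[str, ParamValue] = {}
    for part in match.group(2).split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, raw = part.partition("=")
        if raw == "":
            params[key] = True  # bare parameter with no value
        elif raw.startswith('"'):
            params[key] = raw.strip('"')  # quoted string
        else:
            # Integers such as created/expires; anything else is kept as text.
            params[key] = int(raw) if raw.lstrip("-").isdigit() else raw
    return label, components, params


# Value copied from the logged Signature-Input header above.
SIGNATURE_INPUT = (
    'sig1=("@authority" "@method" "@path" "signature-agent")'
    ";created=1754340838"
    ';keyid="otMqcjr17mGyruktGvJU8oojQTSMHlVm7uO-lrcqbdg"'
    ";expires=1754344438"
    ';nonce="_8jbGwfLcgt_vUeiZQdWvfyIeh9FmlthEXElL-O2Rq5zydBYWivw4R3sV9PV-zGwZ2OEGr3T2Pmeo2NzmboMeQ"'
    ';tag="web-bot-auth";alg="ed25519"'
)

label, components, params = parse_signature_input(SIGNATURE_INPUT)
print(label, components)
print(params["expires"] - params["created"])  # 3600
```

<p>So this particular signature covers the authority, method, path and the <code>Signature-Agent</code> header, and it expires 3600 seconds (one hour) after it was created.</p>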

<p>These turn out to come from a relatively new web standard: <a href="https://www.rfc-editor.org/rfc/rfc9421.html">RFC 9421 HTTP Message Signatures</a>, published February 2024.</p>

<p>The purpose of HTTP Message Signatures is to allow clients to include signed data about their request in a way that cannot be tampered with by intermediaries. The signature can be verified using a public key that's published at the following well-known endpoint:</p>

<pre><code>https://chatgpt.com/.well-known/http-message-signatures-directory

</code></pre>

<p>Add it all together and we now have a rock-solid way to identify traffic from ChatGPT agent: look for the <code>Signature-Agent: "https://chatgpt.com"</code> header and confirm its value by checking the signature in the <code>Signature-Input</code> and <code>Signature</code> headers.</p>
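<p>Checking the signature means rebuilding the exact string that was signed - RFC 9421 calls it the "signature base" - from the request. Here's a rough Python sketch of that step. It assumes a single signature, only the three derived components seen in this request, a <code>GET</code> method (my logs above don't show the method) and the <code>/test-url-context</code> path from my prompt. The <code>signature_base</code> function name is mine.</p>

```python
import re
from typing import Mapping


def signature_base(method: str, authority: str, path: str,
                   headers: Mapping[str, str], signature_input: str) -> str:
    """Rebuild the RFC 9421 signature base for a single-signature request."""
    _label, _, inner_list = signature_input.strip().partition("=")
    # Covered component names are the quoted strings inside the parentheses.
    covered = re.findall(r'"([^"]*)"', inner_list[: inner_list.index(")")])
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    derived = {
        "@method": method,
        "@authority": authority.lower(),
        "@path": path,
    }
    lines = []
    for name in covered:
        value = derived[name] if name.startswith("@") else lowered[name]
        lines.append(f'"{name}": {value}')
    # The final line repeats the component list and its parameters exactly as
    # they appear after "sig1=" in Signature-Input. There is no trailing newline.
    lines.append(f'"@signature-params": {inner_list}')
    return "\n".join(lines)


# Values copied from the logged request; method and path are assumptions.
SIGNATURE_INPUT = (
    'sig1=("@authority" "@method" "@path" "signature-agent")'
    ";created=1754340838"
    ';keyid="otMqcjr17mGyruktGvJU8oojQTSMHlVm7uO-lrcqbdg"'
    ";expires=1754344438"
    ';nonce="_8jbGwfLcgt_vUeiZQdWvfyIeh9FmlthEXElL-O2Rq5zydBYWivw4R3sV9PV-zGwZ2OEGr3T2Pmeo2NzmboMeQ"'
    ';tag="web-bot-auth";alg="ed25519"'
)
request_headers = {"Signature-Agent": '"https://chatgpt.com"'}

print(signature_base("GET", "simonwillison.net", "/test-url-context",
                     request_headers, SIGNATURE_INPUT))
```

<p>The bytes of that string, the base64-decoded value of the <code>Signature</code> header and the Ed25519 public key matching <code>keyid</code> then go to an Ed25519 verification routine, for example the one in the Python <code>cryptography</code> package. A real check should also reject requests where the current time falls outside the <code>created</code> to <code>expires</code> window.</p>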

<h4 id="and-then-came-the-crawlers">And then came Bingbot and Yandex</h4>

<p>Just over a minute after it captured that request, my logging endpoint got another request:</p>

<pre><code>Via: 1.1 heroku-router

From: bingbot(at)microsoft.com

Host: simonwillison.net

Accept: */*

Cf-Ray: 96a0f4671d1fc3c6-SEA

Server: Heroku

Cdn-Loop: cloudflare; loops=1

Cf-Visitor: {"scheme":"https"}

User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36

Cf-Ipcountry: US

X-Request-Id: 6214f5dc-a4ea-5390-1beb-f2d26eac5d01

Accept-Encoding: gzip, br

X-Forwarded-For: 207.46.13.9, 172.71.150.252

X-Request-Start: 1754340916429

Cf-Connecting-Ip: 207.46.13.9

X-Forwarded-Port: 80

X-Forwarded-Proto: http

</code></pre>

<p>I pasted <code>207.46.13.9</code> into Microsoft's <a href="https://www.bing.com/toolbox/verify-bingbot-verdict">Verify Bingbot</a> tool (after solving a particularly taxing CAPTCHA) and it confirmed that this was indeed a request from Bingbot.</p>

<p>I set up a second URL to confirm... and this time got a visit from Yandex!</p>

<pre><code>Via: 1.1 heroku-router

From: support@search.yandex.ru

Host: simonwillison.net

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Cf-Ray: 96a16390d8f6f3a7-DME

Server: Heroku

Cdn-Loop: cloudflare; loops=1

Cf-Visitor: {"scheme":"https"}

User-Agent: Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

Cf-Ipcountry: RU

X-Request-Id: 3cdcbdba-f629-0d29-b453-61644da43c6c

Accept-Encoding: gzip, br

X-Forwarded-For: 213.180.203.138, 172.71.184.65

X-Request-Start: 1754345469921

Cf-Connecting-Ip: 213.180.203.138

X-Forwarded-Port: 80

X-Forwarded-Proto: http

</code></pre>

<p>Yandex <a href="https://yandex.com/support/webmaster/en/robot-workings/check-yandex-robots.html?lang=en">suggest a reverse DNS lookup</a> to verify, so I ran this command:</p>

<pre><code>dig -x 213.180.203.138 +short

</code></pre>

<p>And got back:</p>

<pre><code>213-180-203-138.spider.yandex.com.

</code></pre>

<p>Which confirms that this is indeed a Yandex crawler.</p>
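<p>The same check can be automated. Here's a Python sketch of a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, check the hostname's suffix, then resolve the hostname back and make sure it returns the original IP. The list of suffixes is my reading of the Yandex documentation linked above, and <code>is_verified_crawler</code> is a hypothetical helper name. The resolver functions are passed in as parameters so the logic can be exercised without touching the network.</p>

```python
import socket
from typing import Callable, Tuple

# Assumption: hostname suffixes taken from the Yandex verification docs.
YANDEX_SUFFIXES: Tuple[str, ...] = (".yandex.ru", ".yandex.net", ".yandex.com")


def is_verified_crawler(
    ip: str,
    suffixes: Tuple[str, ...] = YANDEX_SUFFIXES,
    reverse: Callable = socket.gethostbyaddr,
    forward: Callable = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS check for a crawler IP address."""
    try:
        # Reverse lookup: IP -> hostname (equivalent to `dig -x`).
        hostname = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        return False
    if not hostname.endswith(suffixes):
        return False
    try:
        # Forward lookup: hostname -> IPv4 addresses. This guards against
        # someone pointing their own reverse DNS at a crawler-looking name.
        addresses = forward(hostname)[2]
    except OSError:
        return False
    return ip in addresses


if __name__ == "__main__":
    # Performs real DNS lookups; the result depends on current DNS records.
    print(is_verified_crawler("213.180.203.138"))
```

<p>Note that <code>socket.gethostbyname_ex</code> only handles IPv4, which is enough for the address in this request. Supporting IPv6 crawlers would need <code>socket.getaddrinfo</code> instead.</p>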



<p>I tried a third experiment to be sure... and got hits from both Bingbot and YandexBot.</p>



<h4 id="cloudflare-crawler-hints">It was Cloudflare Crawler Hints, not ChatGPT</h4>



<p>So I wrote up and posted about my discovery... and <a href="https://x.com/jatan_loya/status/1952506398270767499">Jatan Loya asked:</a></p>



<blockquote><p>do you have crawler hints enabled in cf?</p></blockquote>



<p>And yeah, it turned out I did. I spotted this in my caching configuration page (and it looks like I must have turned it on myself at some point in the past):</p>



<p><img src="https://static.simonwillison.net/static/2025/cloudflare-crawler-hints.jpg" alt="Screenshot of Cloudflare settings panel showing &quot;Crawler Hints Beta&quot; with description text explaining that Crawler Hints provide high quality data to search engines and other crawlers when sites using Cloudflare change their content. This allows crawlers to precisely time crawling, avoid wasteful crawls, and generally reduce resource consumption on origins and other Internet infrastructure. Below states &quot;By enabling this service, you agree to share website information required for feature functionality and agree to the Supplemental Terms for Crawler Hints.&quot; There is a toggle switch in the on position on the right side and a &quot;Help&quot; link in the bottom right corner." style="max-width: 100%" /></p>



<p>Here's <a href="https://developers.cloudflare.com/cache/advanced-configuration/crawler-hints/">the Cloudflare documentation for that feature</a>.</p>



<p>I deleted my posts on Twitter and Bluesky (since you can't edit those and I didn't want the misinformation to continue to spread) and edited <a href="https://fedi.simonwillison.net/@simon/114972968822349077">my post on Mastodon</a>, then updated this entry with the real reason this had happened.</p>



<p>I also changed the URL of this entry as it turned out Twitter and Bluesky were caching my social media preview for the previous one, which included the incorrect information in the title.</p>



<details><summary>Original "So what's going on here?" section from my post</summary>



<p><em>Here's a section of my original post with my theories about what was going on before learning about Cloudflare Crawler Hints.</em></p>



<h4 id="so-what-s-going-on-here-">So what's going on here?</h4>

<p>There are quite a few different moving parts here.</p>

<ol>

<li>I'm using Firefox on macOS with the 1Password and Readwise Highlighter extensions installed and active. Since I didn't visit the debug pages at all with my own browser I don't think any of these are relevant to these results.</li>

<li>ChatGPT agent makes just a single request to my debug URL ...</li>

<li>... which is proxied through both Cloudflare and Heroku.</li>

<li>Within about a minute, I get hits from one or both of Bingbot and Yandex.</li>

</ol>

<p>Presumably ChatGPT agent itself is running behind at least one proxy - I would expect OpenAI to keep a close eye on that traffic to ensure it doesn't get abused.</p>

<p>I'm guessing that infrastructure is hosted by Microsoft Azure, though the <a href="https://openai.com/policies/sub-processor-list/">OpenAI Sub-processor List</a> lists Microsoft Corporation, CoreWeave Inc, Oracle Cloud Platform and Google Cloud Platform under the "Cloud infrastructure" section, so it could be any of those.</p>

<p>Since the page is served over HTTPS my guess is that any intermediary proxies should be unable to see the path component of the URL, making the mystery of how Bingbot and Yandex saw the URL even more intriguing.</p>

</details>