Grasping Reality · Economics & Policy
TIER 4 Mon, 11 May 2026 13:01:10 +0000
Turn it up for what?! Prettying the plumage on your stochastic parrot via the "secret" work of the LLM "creativity" dial. Behind the paywall because I am not at all sure that this is right. But I...
This is Brad DeLong's Grasping Reality--my attempt to make myself, and all of you out there in SubStackLand, smarter by writing where I have Value Above Replacement and shutting up where I do not…
Bitter experience has shown us that a healthy public sphere can only be built on something other than click-baiting and eyeball-gluing advertisements. SubStack is now in there pitching to make things different. I won't command you to become a paying subscriber of this 'Stack: I will command you to become a paying subscriber of some 'Stack(s):
* * *
# Model "Temperature", Stochastic Parrots, & the Pantomime Performance of the Appearance of Intelligence: Monday MAMLMs
### Turn it up for what?! Prettying the plumage on your stochastic parrot via the "secret" work of the LLM "creativity" dial. Behind the paywall because I am not at all sure that this is right. But I...
Brad DeLong, May 11
###### Turn it up for what?! Prettying the plumage on your stochastic parrot via the "secret" work of the LLM "creativity" dial. Behind the paywall because I am not at all sure that this is right. But I do find it a very interesting rabbit hole to try to go down and then think my way through…
I wonder if there is something quietly revealing about the way commercial LLM vendors talk about temperature.
The parameter is treated as a UI affordance, a mere "creativity dial." The documentation says, in effect: low temperature for reliable answers, higher temperature for brainstorming and poetry.
Then, in production products, they almost never run at 0.
That is, I think, a tell.
If what you have built were genuinely an intelligent conversational partner--call it a mind, or at least a competent reasoner--you would not need to inject random noise into its utterances to keep users engaged, to avoid collapse into repetition, or to "escape local optima."
You would ask it what it thinks, and you would get an answer.
But that is not what sells--not what gets and keeps human engagement with the MAMLM GPT LLM.
Back up: What is this "temperature"?
> **Martynas Šubonis**: Zero Temperature Randomness in LLMs <https://martynassubonis.substack.com/p/zero-temperature-randomness-in-llms>: 'LLMs generate text by predicting the next token _t_ …. To do this, they produce scores… logits… how likely each possible next token is…. Temperature (_T_) adjusts these logits…. [With] **low temperature** … increasing the probability of the most likely token, thus producing less random output. [With] **High temperature** … logits become more similar, spreading probabilities more evenly, leading to more diverse and random outputs…
* T close to 0 -> always pick the highest‑probability token ("greedy decoding").
* T ≈ 1 -> follow the model's learned distribution.
* T > 1 -> more exploratory, "creative", flattening out to raise the chance of a tail-probability choice.
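A minimal sketch of the arithmetic behind those three regimes, assuming the usual softmax-with-temperature formulation, in which each logit is divided by T before exponentiating and normalizing. The four logits and the function name are made up for illustration; they do not come from any particular model or vendor API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing each by the temperature."""
    if temperature <= 0:
        # T = 0 is a limit (pure argmax), not something this formula can evaluate.
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical logits for four candidate next tokens (illustrative only).
logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

With these toy numbers, T = 0.1 puts nearly all of the probability on the top token, T = 1 leaves the distribution as the logits imply, and T = 2 roughly triples the share going to the least likely token.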
Thus there is a sense in which temperature ≠ 0 is not just a "creative" knob. It is a mask over architectural limitations. It is a marketing patch over the fact that, left to their own devices and run greedily, these models behave like exactly what they are: high‑dimensional next-token probability machines, not thinking beings.
But why don't next-token probability machines, simulating as closely as they can the function {text conversations} -> {next-token continuations}, gain and hold our attention at temperature = 0?
My guess from the human side:
In real conversation, we do not assign high value to someone who always says the most probable thing.
The colleague who only ever repeats the consensus, the student who regurgitates the textbook, the pundit whose every sentence is a cliche--they are informationally dead. In Claude Shannon's terms, such an interlocutor carries very low surprise and thus provides very low information content. We perk up when someone says something that is both:
1. Unexpected relative to our prior; and
2. Well‑grounded in some model of the world.

We recognize that as _thinking,_ rather than as noisy stochastic parrotage.
An unexpected but well‑argued take on the Marshall Plan, or a novel but coherent reading of the gold standard, signals that there is another mind across the table. If a model always picks the locally most probable token, it converges on strings that have much lower entropy than the human training distribution--and, within a paragraph, the reader concludes: this isn't telling me anything.
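To put rough numbers on the Shannon point: the surprisal of an outcome with probability p is -log2(p) bits, and the entropy of a distribution is its expected surprisal. Here is a small sketch, with a made-up four-token distribution (an illustrative assumption, not a measurement from any model), comparing sampling from that distribution with always taking its single most probable token.

```python
import math

def surprisal_bits(p):
    """Shannon surprisal of an outcome that has probability p, in bits."""
    return -math.log2(p)

def entropy_bits(probs):
    """Expected surprisal of a distribution; zero-probability outcomes contribute nothing."""
    return sum(p * surprisal_bits(p) for p in probs if p > 0)

# Hypothetical next-token distribution (illustrative numbers only).
learned = [0.5, 0.25, 0.125, 0.125]
# Always emitting the top token is the same as putting all the mass on it.
greedy = [1.0, 0.0, 0.0, 0.0]

print(entropy_bits(learned))  # 1.75 bits per token
print(entropy_bits(greedy))   # zero bits: nothing the listener could not predict
```

In this toy case the always-most-probable speaker carries no surprise at all, which is the "informationally dead" interlocutor in miniature.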
Thus we have:
> **Kelsey Wang**: A Comprehensive Guide to LLM Temperature <https://medium.com/@kelseyywang/a-comprehensive-guide-to-llm-temperature-%EF%B8%8F-363a40bbc91f>: 'More deterministic outputs can create the illusion of expertise…. Low temperatures can reduce decision fatigue…. High temperatures can encourage user engagement… keep users curious…. Find your Goldilocks zone…
On the other hand, an unexpected but incoherent non-sequitur--the term "fire in the lake" randomly thrown into a discussion of, say, long‑term interest rates--does not, in itself, convince us that the speaker is wise. It works only to the extent that it _can_ be a useful prompt for _our_ minds. When the yarrow stalks of the _I Ching_ give you that gnomic phrase:
* **Hexagram 49: 革 (Ge):**
* Below: ☲ (Li) - fire, brightness, clarity
* Above: ☱ (Dui) - lake, marsh, joy, openness
* Fire normally dries things out; a lake normally extinguishes fire. Put them together and you get a situation that obviously cannot last.
It can shake loose ideas. But the magic is not in the oracle. It is in your own associative machinery, trying desperately to make sense of it. You do the creative work. The yarrow stalks are a random seed.
Thus there are three things going on:
1. Stochastic parrotage with the next token carrying the least possible information.
2. An unexpected word giving us information, and also confirming our belief that there is another mind on the other side of the conversation that is trying to give us information.
3. A new random seed to shift the state of the conversation, and so begin gradient-descent from someplace else.
(1) is not useful. (2) is what we want. (3) is a thing that some people do--people do consult the _I Ching_ and throw the yarrow stalks--but those who find it useful do so purely as a kludge to access information that is already inside the house.
And it is an empirical fact that if you let temperature = 0 and thus expose the model's nature as an argmax machine that was trained on a high‑entropy human corpus but is now being forced to emit low‑entropy results, humans judge it as dreck. But why not just live with the dreck in the name of accuracy? It really cuts down on hallucinations, after all. Because _temperature is a stage effect_. It keeps the squawking stochastic parrot from saying exactly the same thing every time, so that we can continue the polite fiction that there is someone home.
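A toy sketch of the "exactly the same thing every time" point, under stated assumptions: the four tokens, their probabilities, and the helper `pick_next_token` are all hypothetical, and real decoders work on logits over a full vocabulary. Raising each probability to the power 1/T and renormalizing is equivalent to a softmax over logits divided by T.

```python
import random

def pick_next_token(tokens, probs, temperature, rng):
    """Greedy argmax at T = 0; otherwise sample from the temperature-rescaled distribution."""
    if temperature == 0:
        return tokens[probs.index(max(probs))]
    # p ** (1 / T) is proportional to softmax(log(p) / T); choices() normalizes the weights.
    weights = [p ** (1.0 / temperature) for p in probs]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical candidates and probabilities (illustrative only).
tokens = ["rates", "yields", "inflation", "fire"]
probs = [0.6, 0.25, 0.1, 0.05]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
greedy_run = {pick_next_token(tokens, probs, 0, rng) for _ in range(20)}
sampled_run = {pick_next_token(tokens, probs, 1.0, rng) for _ in range(200)}

print("T=0 ever says:", greedy_run)
print("T=1 ever says:", sampled_run)
```

At T = 0 the set of things the parrot ever says has one member; at T = 1 it has several, and that variety is all the dial is adding.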
Vendors tell us this story over and over again: a "recommended" default temperature in the 0.7-1.3 range, with marketing copy that equates higher T with "creativity" rather than "adding noise to a conditional next‑token distribution". There is now a mini‑literature--both popular and academic--on "temperature as the creativity parameter". Actually, it isn't, at least not to first order: <https://arxiv.org/abs/2405.00492>. Temperature controls _stochasticity_, not creativity. It does _not_ endow the system with a model of the world, a set of goals, or an ability to reason about counterfactual states.
And yet: from the user's point of view, a higher‑T model definitely _feels_ more creative, for exactly the same reason that the yarrow stalks feel oracular. When the system jumps to a somewhat‑but‑not‑completely unexpected branch of the conversation, you attempt to decode the particular thought your interlocutor is having that pushes them there. But with an LLM, _there is no interlocutor, there is no thought, there is only the chance clatter of the yarrow stalks._ But as you try to decode that nonexistent piece of Shannon-information, that random signal pushes _you_ into a new region of your own conceptual space. That is helpful. But the cognition is happening in your prefrontal cortex, not in the transformer weights.
This is what makes temperature ≠ 0 such a powerful tell. In human conversation, the moments of high Claude‑Shannon information--when someone says something genuinely surprising and explanatory--are the output of an internal model. They are not the result of yarrow-stalk clatter.
The "stochastic parrot" metaphor, as Margaret Mitchell and Emily Bender have been patiently explaining for years, was always meant specifically for LLMs, not "AI" in general. The fact that vendors' models _must_ squawk diversely to maintain the _illusion_ of intelligence is, I think, a thing we should pay much closer attention to.
In practice, when you and I use these systems at non‑zero temperature, what happens is a kind of cybernetic call‑and‑response. We throw a prompt. The system samples from a learned distribution. Sometimes it lands on something we have already thought of. Sometimes it lands on a near neighbor of something we have not yet articulated. One time in twenty, it throws up "fire in the lake."
We then do the work. We pick out the continuation that fits with our mental model, discard the rest as noise, and then feel as if the machine has been creative on our behalf. But the selection, the evaluation, the incorporation into our own web of concepts--that is us. Shannon would say: the channel has supplied a stream of bits; the intelligence is in the decoder.
Good lecturers do this: they sprinkle in unexpected jokes and contrarian asides to keep an audience awake. But even when we do decide to make what we say more unexpected, we do not do it by scaling our own synaptic activations by a global scalar and sampling more widely from our resulting confusion.
The temperature dial is not a knob on a mind. It is a knob on a sampler.
I see that. And now I cannot un‑see it.
##### _**If reading this gets you Value Above Replacement, then become a free subscriber to this newsletter. And forward it! And if your VAR from this newsletter is in the three digits or more each year, please become a paid subscriber! I am trying to make you readers --and myself--smarter. Please tell me if I succeed, or how I fail…**_
* * *
###### ##model-temperature-stochastic-parrots-the-pantomime-performance-of-the-appearance-of-intelligence-monday-mamlms
##subturingbradbot
#model-temperature
#stochastic-parrots
#the-pantomime-performance-of-the-appearance-of-intelligence
##monday-mamlms
#pseudo-intelligence
#ai-pantomime
#model-behavior
#temperature
#yarrow-stalks
_Please forward the email & otherwise share it to everyone you think would appreciate it…_
(C) 2026 J. Bradford DeLong
Holgate House, P.O. Box #5488, Berkeley, CA 904705