Interconnects · Tech & AI
TIER 4 2025-11-16
<p>First, on the topic of writing, the polished, and more importantly <em>printed</em>, version of my <a href="https://rlhfbook.com/">RLHF Book</a> is available for pre-order. It’s 50% off for a limited time, and you can pre-order it <a href="https://hubs.la/Q03TsMHv0">here</a>!</p><p>As with a lot of my writing, I’ve been sitting on this piece for many months thinking it’s not contributing enough, but the topic keeps coming up (most recently via <span class="mention-wrap" data-component-name="MentionToDOM">Jasmine Sun</span>) and people seem to like it, so I hope you do too!</p><p>It’s no longer a new experience to be struck by just how bad AI models are at writing good prose. They can pull out a great sentence every now and then, particularly models like GPT-5 Pro and other large models, but it’s always a quick remark and never a sustained run of good sentences. More importantly, good AI writing feels like a lucky find rather than the result of the right incantation. 
After spending a long time <em>training</em> these models, I’m fairly convinced that this writing inhibition is a structural limitation of how we train these models today and of the markets they’re designed to serve.</p><p>If we're making AIs that are soon to be superhuman at most knowledge work, and that are trained primarily to predict text tokens, why is their ability to create high-quality text tokens still so low? Why can’t we make the general ChatGPT experience far more refined and useful for writers while we’re unlocking entirely new ways of working with these models every few months, most recently CLI agents like Claude Code? This gap is one of my favorite discussions in AI because it’s really about what the definition of good writing is in itself.</p><p>Where AI models can generate beautiful images from random noise, they can't reliably generate a few good sentences from a couple of bullet points of information. What is different about the art form of writing from what AI can already capture?</p><p>I'm coming to believe that we <em>could</em> train a language model to be a great writer, but it goes against so many of the existing training processes. To list a few problems at different stages of the stack, of varying severity in how much they handicap writing:</p><ol><li><p><strong>Style isn’t a leading training objective. </strong>Language models all go through preference training where many aspects, such as helpfulness, clarity, and honesty, are balanced against each other. 
With many rewards in play, any single one, such as style, has a harder time standing out. Style and writing quality are also far harder to measure, so they are less likely to be optimized than other signals (such as <a href="https://www.interconnects.ai/p/sycophancy-and-the-art-of-the-model">sycophancy</a>, which was easier to capture).</p></li><li><p><strong>Aggregate preferences suppress quirks. </strong>Language model providers design models with a few intended personalities, largely due to the benefits of predictability. These providers are optimizing many metrics for "the average user." Users disagree on what “good writing” is, so averaging over them tends to wash out distinctive style.</p></li><li><p><strong>Good writing’s inherent friction. </strong>Good writing often takes much longer to process, even when you’re interested in it. Most users of ChatGPT just want to parse the information quickly. On top of that, the people <a href="https://rlhfbook.com/c/06-preference-data#sourcing-and-contracts">creating the training data</a> for these models are often paid <em>per instance</em>, so an answer with more complexity and richness would often be suppressed by the subtle financial pressure to move on.</p></li><li><p><strong>Writing well is orthogonal to training biases. </strong>Throughout many stages of the post-training process, modern RLHF training exploits subtle signals for sycophancy and <a href="https://arxiv.org/abs/2310.03716">length-bias</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> that aren't its underlying goals. These implicit biases go against the gradient for better writing. Good writing is pretty much never verbose.</p></li><li><p><strong>Forced neutrality of a language model.</strong> Language models are trained to be neutral on a variety of sensitive topics and to not express strong opinions in general. The best writing unabashedly shares a clear opinion. 
Yes, I’d expect wackier models like Grok to potentially produce better writing, even if I don’t agree with it. This neutrality conflicts directly with something I value in writing: voice.</p></li></ol><p>All of these create models that are appealing to broad audiences. What we need in order to create a language model that can write wonderfully is to give it a strong personality, and potentially a strong "sense of self," if that actually impacts a language model's thinking.</p><p>The cultivation of voice is one of my biggest recommendations to people trying to get better at writing, second only to telling them to find something they want to learn about. Voice is core to how I describe my writing process.</p><div class="embedded-post-wrap" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://www.interconnects.ai/p/how-i-write?utm_source=substack&utm_campaign=post_embed&utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!djof!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc52e8097-8f3d-4f7e-808b-2f4ad37f3b52_720x720.png" loading="lazy"><span class="embedded-post-publication-name">Interconnects</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">How I Write</div></div><div class="embedded-post-body">My experience with my recent years of writing is quite confusing — almost even dissociative. I've never felt like I was a good writer and no one really told me I was until some random point in time a year or two ago. In that time span, I didn't really change my motivation nor methods, but I reaped the simple rewards of practice. I'm still wired to be ve…</div><div class="embedded-post-meta">a year ago · 52 likes · 4 comments · Nathan Lambert</div></a></div><p>When I think about how I write, the best writing relies on voice. Voice is where you process information into a unique representation, and this is often what makes information compelling.</p><p>Many people have posited that base models make great writers, as came up when I discussed poetry with <a href="https://www.interconnects.ai/p/interviewing-andrew-carr">Andrew Carr on his Interconnects appearance</a>, but this is because base models haven’t been squashed into the narrower style of post-trained responses.</p><p>I’ve personally been thinking about this sort of style induced by post-training recently as we prepare for our next Olmo release, and many of us think the models with lower evaluation scores on the likes of AlpacaEval or LMArena actually fit our needs better. The accepted style of chatty models today, whether it’s GPT-5, DeepSeek R1, or a large Qwen model, is a bit cringe for my liking. This style is almost entirely applied during post-training.</p><p>Taking a step back, this means base models show us that there <em>can</em> be great writing out of the models, but it’s still far from reliable. 
Base models aren't robust enough to variations in their input to make great writers, so we need some form of the constraints applied in post-training to make models follow Q&A. The next step would be solving the problem that models aren’t trained with a narrow enough experience. Specific points of view nurture voice. The target should be a model that can output tokens that are clear, compelling, and entertaining in any area and for any request.</p><p>We need to shape these base models with post-training designed for writing, just as the best writers bend facts to create narrative.</p><p>Some model makers care <em>a bit</em> about this. When a new model drops and people rave about its creative writing ability, such as Moonshot AI’s <a href="https://www.interconnects.ai/p/kimi-k2-and-when-deepseek-moments">Kimi K2</a> line of models, I do think the team put careful work into the data or training pipelines. The problem is that no model provider is remotely ready to sacrifice core abilities of the model, such as math and coding, in pursuit of meaningfully better writing.</p><p>There are no market incentives to create this model. All the money in AI is elsewhere, and writing isn’t a particularly lucrative market to disrupt. 
An example is <a href="https://www.interconnects.ai/p/gpt-45-not-a-frontier-model">GPT-4.5</a>, which was by all reports a rather light fine-tune, but one that produced slightly better prose. It was shut down almost immediately after its launch because it was too slow and economically unviable at its large size.</p><p>If we follow the voice direction, the model that was likely the best writer relative to its overall intelligence is the original revamped Bing (aka Sydney) model that <a href="https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html">went crazy in front of many users</a> and was rapidly shut down. That model had <strong>THOUGHTS</strong> it wanted to share. That’s a starting point, but a scary one to tap into again. This sort of training goes far beyond a system prompt or a light finetune, and it will need to be a new post-training process from start to end (more than just a light brush of <a href="https://www.interconnects.ai/p/opening-the-black-box-of-character">character training</a>).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TeE-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde8c9880-e748-4e4d-944e-0e244ba4cf99_1120x693.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TeE-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde8c9880-e748-4e4d-944e-0e244ba4cf99_1120x693.png 424w, https://substackcdn.com/image/fetch/$s_!TeE-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde8c9880-e748-4e4d-944e-0e244ba4cf99_1120x693.png 848w, https://substackcdn.com/image/fetch/$s_!TeE-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde8c9880-e748-4e4d-944e-0e244ba4cf99_1120x693.png 1272w, https://substackcdn.com/image/fetch/$s_!TeE-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde8c9880-e748-4e4d-944e-0e244ba4cf99_1120x693.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TeE-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde8c9880-e748-4e4d-944e-0e244ba4cf99_1120x693.png" width="1120" height="693" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TeE-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde8c9880-e748-4e4d-944e-0e244ba4cf99_1120x693.png 424w, https://substackcdn.com/image/fetch/$s_!TeE-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde8c9880-e748-4e4d-944e-0e244ba4cf99_1120x693.png 848w, https://substackcdn.com/image/fetch/$s_!TeE-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde8c9880-e748-4e4d-944e-0e244ba4cf99_1120x693.png 1272w, https://substackcdn.com/image/fetch/$s_!TeE-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde8c9880-e748-4e4d-944e-0e244ba4cf99_1120x693.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We need to be bold enough to create models with personality if we want good writing to fall out. 
We need models that speak their views loudly and confidently. These will also make more interesting intellectual companions, a niche that Claude fills for some people, though I struggle with Claude plenty of times due to its hesitance, hedging, or preferred answer format.</p><p>For the near future, the writing handicap of large language models is here to stay. Good writing is something you have to sit with to appreciate, and ChatGPT and the leading AI products are not optimized for this whatsoever. Especially with agentic applications being the next frontier, most of the text written by the models will never even be read by a human. Good writing is legitimately worse for most of the use cases I have for AI. I don’t like the current style per se, but having it jump to being a literary masterpiece would actually be worse.</p><p>I don’t really have a solution to AI’s writing problem, only expensive experiments people can try. At some point I expect someone to commission a project to push this to its limits, building a model just for writing. This’ll take some time, but it is neither untenable nor unfathomably expensive; it’ll just be a complete refresh of a modern post-training stack.</p><p>Even if this project were funded, I don’t expect the models to be close to the best humans at elegant writing within a few years. Our current batch of models is too far from the goal as a starting point. On longer timelines, it doesn’t feel like writing is a fundamental problem that can’t be solved.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Or via inference-time scaling leaking into every domain.</p></div></div>