Scott's Mixtape · Economics & Policy
TIER 4 Mon, 13 Apr 2026 09:59:17 +0000
I spent the weekend in New York for NABE and saw my first Broadway show, Buena Vista Social Club. It was extraordinary. I can't remember a time I've ever seen actors and musicians like that, and how the audience was drawn into the performance. I was stunned. The Havana social clubs must have been extraordinary. But today is about Claude Code.
# Claude Code 41: Updating my workflow and skills
scott cunningham, Apr 13
In the last post about Claude Code, I started walking us through the decomposition of the TWFE weights in continuous diff-in-diff. To do that, I had Claude Code make a "beautiful deck" solely about those weights. But that deck, and a few other decks since then, prompted me to rework that skill, and that's what today is about -- updating my `/beautiful_deck` skill, as well as a few others. These are the skills I now use pretty regularly, so I wanted to share what I changed, and why.
This is the first time I've really tried to _improve_ skills rather than just create them once or just use them. Up to now, I'd been letting Claude Code manufacture the skills entirely based on vibed descriptions and what I was going after. I'd describe what I wanted, Claude would write the instructions, and I'd invoke them. But I'd noticed that one of them really wasn't working right, and the process of figuring out why taught me something about what these skills actually are and how they fail.
* * *
## **Beautiful Deck Was Broken and I Didn't Know Why**
My `/beautiful_deck` skill was my attempt to automate the language of calling up a new presentation. Rather than always saying "make a beautiful deck, read the Rhetoric of Decks essays, one idea per slide, assertion titles, Gov 2001 palette, compile to zero warnings" -- I tried to capture all of that in one invocable skill. One command, and the first pass of a deck happens automatically. Then I move into a refining stage of iteration.
It wasn't that I was trying to automate the deck creation. Rather, I was trying to get down a first draft so that I could move into the stage I prefer, which is to feel out the talk, get a sense of the direction it would take, work backwards from certain topics or spots, and massage out problematic parts of the lecture. I was increasingly letting Claude piece together a lecture from a variety of directions and materials I would give it, including my own writings and scribbles. And since my preference is now for all my talks to lean heavily on data quantification and graphic-based narrative, I also tended to request graphics from TikZ and .png files produced by R and Python.
And it mostly worked. It was a good starting point, and I found it perfect for getting the refinement stage going. The execution from my outlines was solid, the slides were beautiful, and the balancing of ideas across slides to minimize cognitive density was working.
But the TikZ execution had a fairly high error rate. I was still not getting the clean diagrams I wanted. Labels would sit on top of arrows, text would overflow boxes, and the compile loop would spin trying to fix things that were generated wrong in the first place.
The last part was also new. I had been trying to instill more discipline in the TikZ graphs by having Claude fix them through a series of checks, thinking that the arrows sitting on top of objects, and similar problems, could be addressed on the back end by having Claude systematically audit the graphs.
But this, as it turned out, was a mistake. What I found was that the skill had inadvertently told Claude _what to audit after generation_ but never told it _how to generate TikZ safely in the first place_. The downstream repair tool -- my `/tikz` audit skill -- was being asked to fix problems that were baked in from the start: autosized nodes that made arrow endpoints unpredictable, labels without directional keywords landing on arrows, `scale` factors that shrank coordinates but not text, and parameterized style definitions (`#1`) inside Beamer frames where the `#` character gets consumed by Beamer's argument parser before TikZ ever sees it.
So Claude suggested a fix: a new section in the skill (Step 4.4) with six generation rules. Explicit node dimensions on every node. Directional keywords on every edge label. A coordinate-map comment block before every diagram. Canonical templates for common diagram types. Never use `scale` on complex figures. And crucially: never define parameterized styles inside a Beamer frame -- define them all in the preamble with `\tikzset{}`.
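To make two of those rules concrete, here is a minimal sketch of a pre-compile check. It is not what the skill actually does (the skill is written instructions, not code); `lint_tikz` and both sample strings are hypothetical, and the check covers only the `scale` rule and the parameterized-style rule. It is also stricter than the rule as stated, because it flags `scale` on any `tikzpicture`, not just complex ones.

```python
import re

# Hypothetical lint for two of the six rules; not part of the actual skill.
FRAME_RE = re.compile(r"\\begin\{frame\}.*?\\end\{frame\}", re.DOTALL)
PICTURE_OPTS_RE = re.compile(r"\\begin\{tikzpicture\}[ \t]*\[([^\]]*)\]")


def lint_tikz(source: str) -> list[str]:
    """Return a list of rule violations found in Beamer/TikZ source."""
    problems = []
    for frame in FRAME_RE.findall(source):
        # Parameterized styles (#1) belong in the preamble, not in a frame.
        if "/.style" in frame and "#1" in frame:
            problems.append("parameterized style defined inside a frame")
    for opts in PICTURE_OPTS_RE.findall(source):
        # Flag any scale= option on a tikzpicture (stricter than the rule).
        if re.search(r"\bscale\s*=", opts):
            problems.append("scale used on a tikzpicture")
    return problems


# Assumed example: style defined in the frame, plus a scale factor.
bad = r"""
\begin{frame}
\begin{tikzpicture}[scale=0.7, box/.style={draw, minimum width=#1}]
\node[box=2cm] (a) at (0,0) {A};
\end{tikzpicture}
\end{frame}
"""

# Assumed example: style moved to the preamble with \tikzset, no scale.
good = r"""
\tikzset{box/.style={draw, minimum width=#1}}
\begin{frame}
\begin{tikzpicture}
\node[box=2cm] (a) at (0,0) {A};
\end{tikzpicture}
\end{frame}
"""

print(lint_tikz(bad))
print(lint_tikz(good))
```

The point of a check like this is the same as the point of the rules: catch the pattern before the compile loop ever sees it, rather than repairing it afterwards.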
I also added what I'm calling a **circuit breaker**. The old skill said "recompile until clean," which Claude interpreted as "keep trying forever." When a compile error resisted three different fix attempts, the agent would spiral -- each fix introducing new problems that obscured the original error. I watched one session burn an hour doing this. The circuit breaker says: after three failed approaches to the same error, stop editing, tell me exactly what's happening, and ask how to proceed. The cost of stopping is two minutes. The cost of spiraling is an hour and a file that's worse than when you started.
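The circuit breaker logic can be sketched in a few lines. This is an illustration under my own assumptions, not the skill's text: `fix_with_circuit_breaker` and the four fix functions are hypothetical stand-ins, and each "fix" simply returns `None` if the error is gone or the remaining error if it is not.

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # from the post: three failed approaches, then stop


def fix_with_circuit_breaker(
    error: str,
    fixes: list[Callable[[str], Optional[str]]],
    max_attempts: int = MAX_ATTEMPTS,
) -> dict:
    """Try fixes in order; stop and report after max_attempts failures."""
    history = []
    for attempt, fix in enumerate(fixes[:max_attempts], start=1):
        remaining = fix(error)  # None means the error is resolved
        history.append((attempt, fix.__name__, remaining))
        if remaining is None:
            return {"status": "fixed", "attempts": attempt, "history": history}
    # Breaker trips: no more edits, hand the history back to the human.
    return {"status": "stopped", "attempts": len(history), "history": history}


# Hypothetical fixes: the first three fail, the fourth would succeed.
def resize_node(err): return err
def move_label(err): return err
def drop_scale(err): return err
def rewrite_figure(err): return None


tripped = fix_with_circuit_breaker(
    "Overfull hbox", [resize_node, move_label, drop_scale, rewrite_figure]
)
solved = fix_with_circuit_breaker("Overfull hbox", [resize_node, rewrite_figure])
print(tripped["status"], tripped["attempts"])
print(solved["status"], solved["attempts"])
```

Note that the fourth fix is never tried in the first call, even though it would have worked. That is the trade the breaker makes on purpose: it gives up a possible fix in exchange for a bounded cost and a readable history of what was attempted.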
I don't know yet whether these changes have actually improved the skill. Last night I watched it generate a 42-slide deck that was genuinely gorgeous in conception -- the rhetoric, the structure, the visual design were all exactly what I wanted. But it got stuck in a problem-solving loop for an hour on TikZ compile errors. So the circuit breaker needs tightening, and there's probably a Rule 7 about not generating 35 tikzpictures in a single Beamer document. I'm learning. These are my first real attempts at improving skills rather than just using them.
If you want to try `/beautiful_deck` and give me feedback, please do. It's possible that I just can't automate the "beautiful pictures," and that maybe the optimal approach was what I was originally doing, which was to iterate a lot until the figures are perfect rather than make it more automatic up front. I do like the invoking of my rhetoric of decks essay, but I guess I keep hoping I can find a way to help Claude recognize these errors in the TikZ graphics, despite his inability to reason spatially.
* * *
## **Split-PDF Got Smarter Thanks to a Reader**
My `/split-pdf` skill is the one I use most. It takes an academic paper -- a PDF file or a search query -- and splits it into four-page chunks, reads them in small batches, and writes structured notes. The reason it exists is simple: historically, for me, Claude would crash or hallucinate on long PDFs. Splitting forces careful reading and externalizes comprehension into markdown notes.
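The chunking arithmetic is simple enough to show directly. This is a sketch of the page ranges only, assuming 1-indexed inclusive pages; `chunk_ranges` is a hypothetical helper, and the actual PDF splitting in the skill is not shown here.

```python
def chunk_ranges(n_pages: int, chunk_size: int = 4) -> list[tuple[int, int]]:
    """Return 1-indexed, inclusive (first, last) page ranges for each chunk."""
    return [
        (start, min(start + chunk_size - 1, n_pages))
        for start in range(1, n_pages + 1, chunk_size)
    ]


ranges = chunk_ranges(35)  # a 35-page paper in four-page chunks
print(len(ranges), ranges[0], ranges[-1])
```

A 35-page paper comes out as nine chunks, the last one covering only pages 33 to 35.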
A few days ago, Ben Bentzin -- an associate professor of instruction at the McCombs School of Business at UT Austin -- wrote to me. He'd adapted the skill for his own workflows and made several improvements that were better than what I had. The core was the same, but he'd identified problems I hadn't noticed.
His biggest contribution was **agent isolation**. When another skill calls `/split-pdf` -- say, `/beautiful_deck` reading a paper before generating slides -- each PDF page renders as image data in the conversation context. A 35-page paper can add 10-20MB. After reading two or three large PDFs on top of prior work, the conversation hits the API request size limit and becomes unrecoverable. Ben's fix: run the PDF reading inside a subagent. The subagent reads the pages, writes plain-text output, and the parent skill only reads the text. The image data stays contained.
He also added **persistent extraction**. After all batches are read, the skill saves a structured `_text.md` file alongside the source PDF. On future invocations, it checks for this file first and offers to reuse it -- skipping re-reading entirely. The first deep read might cost four rounds of PDF rendering. The second costs one markdown file read. He added split reuse too -- if splits already exist from a previous run, offer to reuse them rather than re-splitting. And he switched to in-place PDF handling, so the skill works wherever your file already lives rather than copying everything into a centralized `articles/` folder.
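The check-before-reading step might look something like the sketch below. The exact file naming is my assumption: I take "a `_text.md` file alongside the source PDF" to mean `<stem>_text.md` in the same folder. `cached_text_path` and `plan_read` are hypothetical names, and the demo runs in a temporary directory.

```python
import tempfile
from pathlib import Path


def cached_text_path(pdf_path: Path) -> Path:
    """Assumed naming: paper.pdf -> paper_text.md in the same folder."""
    return pdf_path.with_name(pdf_path.stem + "_text.md")


def plan_read(pdf_path: Path) -> str:
    """Offer reuse if a prior extraction exists, else read the PDF."""
    return "offer_reuse" if cached_text_path(pdf_path).exists() else "read_pdf"


with tempfile.TemporaryDirectory() as tmp:
    pdf = Path(tmp) / "paper.pdf"
    pdf.write_bytes(b"%PDF-1.4 placeholder")  # stand-in, not a real paper
    first = plan_read(pdf)                    # nothing cached yet
    cached_text_path(pdf).write_text("# Notes\n")
    second = plan_read(pdf)                   # extraction now sits beside the PDF

print(first, second)
```

Because the cache lives next to the PDF rather than in a central folder, it also fits the in-place handling described above.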
I wrote the implementation independently -- the code in my repo is mine -- but the ideas are his, and I credited him by name in the skill's documentation. If you've been using `/split-pdf`, the new version is noticeably faster and more reliable on multi-paper sessions. Thanks Ben -- I'm grateful you found a way to make significant improvements on this practical skill.
* * *
## **Blindspot: Making the Stone Stony Again**
This one is new. It used to be called `/fletcher`, after Jason Fletcher at Wisconsin, who was the one who wondered about rounding in my post about p-hacking. I had interpreted heaps of t-statistics around the 1.96 critical value as evidence of p-hacking in the APE project (AI-generated papers), but Jason had noticed similar heaps at 1 and 3, which would've made heaps at non-random intervals (1, 2 and 3). As it turned out, the heaps were generated by using imprecise coefficients and standard errors, extracted from the papers themselves and not from the raw data and actual code (which I didn't have). The more imprecise the coefficients and standard errors are, the more you end up with rounded t-stats that heap at non-random intervals -- a pretty interesting mathematical phenomenon, to be honest, and maybe one of the more impressive things to come out of that exercise. I didn't see it, though, because I simply couldn't see the things "off camera." I was too focused on what I was focused on: the heaping at 1.96.
So I developed `/fletcher` because I wanted to instill a discipline for catching errors earlier. Not so much coding errors as the types of errors I am prone to when I can't see the forest for the trees. Was there a way to get an impartial spectator to come into the project early and often to simply look _near_ the project's focus, but not _directly at_ the project's focus? Sometimes if you can look away from something, you can see it better, and so that was the purpose of that skill.
I decided to rename it `/blindspot` because that's what it actually does, and a descriptive name communicates the concept to someone who hasn't read the origin story.
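A toy version of the heaping is easy to reproduce. This is not the APE data or my original analysis; it is a deliberately extreme assumption in which every coefficient and every standard error is reported to a single significant digit at the same scale, so each t-stat is a ratio of two digits from 1 to 9. `t_stat_counts` is a hypothetical helper.

```python
from collections import Counter
from fractions import Fraction


def t_stat_counts(max_digit: int = 9) -> Counter:
    """Count t = coef / se when both are reported to one significant digit."""
    counts = Counter()
    for k in range(1, max_digit + 1):      # reported coefficient digit
        for m in range(1, max_digit + 1):  # reported standard error digit
            counts[Fraction(k, m)] += 1    # exact ratio, no float error
    return counts


counts = t_stat_counts()
heaps = {t: counts[Fraction(t)] for t in (1, 2, 3)}
total = sum(counts.values())
print(heaps, total)
```

Of the 81 possible pairs, 16 land exactly on 1, 2, or 3. With unrounded estimates the chance of hitting any exact integer is essentially zero, so the heaps here come entirely from the rounding and not from anyone's behavior.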
The theoretical frame comes from Viktor Shklovsky, the Soviet literary theorist, who argued that art exists to restore perception. His metaphor: a man who walks barefoot up a mountain eventually cannot feel his feet. Everything becomes habitual, automatic, unconscious. Art exists to make the stone stony again -- to force you to feel what you have stopped noticing.
For me, research regularly has the same problem. By the time I have spent months on a paper, I can't feel the stones under my feet. The main finding has collapsed my attention. Everything else in the output -- the coefficient that flips sign in one spec, the sample size that drops between columns, the heterogeneity richer than the average effect -- has become invisible or simply interpretable in a kind of mindless, defensive way.
Blindspot is organized around a 2x2 grid of **vices** (problems hiding in plain sight) and **virtues** (opportunities being overlooked). Vice 1 is the Unexplained Feature -- something in the output that doesn't fit the story but nobody asked about it. Vice 2 is the Convenient Absence -- the robustness check never run, the subgroup never examined, the dog that didn't bark. Virtue 1 is the Unasked Question -- heterogeneity that's more interesting than the average, a mechanism visible in the data but absent from the hypothesis. Virtue 2 is the Unexploited Strength -- an identification argument stronger than the paper claims, a falsification test that would crush the main objection but was never run.
I run `/blindspot` _before_ I run `/referee2`, and the distinction matters. Referee 2 is a health inspector. It checks whether your code is correct, whether the pipeline replicates across languages, whether the identification strategy is sound. It runs in a fresh session with a Claude instance that has never seen the project, because the Claude that built the code cannot objectively audit it. Referee 2 asks: _is this implemented correctly?_
Blindspot asks a different question: _can you see what's in front of you?_ It runs in the same session, at the moment output first appears, before you've started writing. It doesn't need separation from the working session because it's not auditing implementation -- it's auditing perception. You are the right person to do that, with a structured forcing function to look past what you expect to see. I need something that can pull back and not get so into the weeds that it misses the obvious.
The workflow is: produce output, run `/blindspot`, interpret and write, complete the project, then open a fresh terminal and run `/referee2`. Between the two of them, they cover what I think of as the two failure modes: not seeing what's there, and not catching what's wrong.
I'm a beginner when it comes to making skills. These are mine. They're available at github.com/scunning1975/mixtapetools, and I'd welcome anyone who wants to adapt them, improve them, or tell me what I'm missing. That's how the split-pdf improvements happened, and I suspect it's how the next ones will too.
(C) 2026 scott cunningham