# Decomposing the TWFE regression coefficient with continuous treatment dosage using FWL

### Part 1!

Scott Cunningham · Apr 15

Technically, today's post has nothing to do with Claude Code. It's purely algebraic Frisch-Waugh-Lovell (FWL). But because it's about continuous-treatment diff-in-diff, it fits under the diff-in-diff banner, and is therefore subject to my randomized paywall. So I flipped a coin three times, it came up heads twice, and therefore it's paywalled. And so paywalled it shall be. But first, let me tell you what you're going to be missing if you are not a paying subscriber.

I'm going to walk us through the FWL decomposition of a TWFE regression coefficient. That coefficient comes from a regression of some outcome onto unit and time fixed effects, for two periods, and a continuous dosage variable. Think of the dose as the minimum wage. We are not, in other words, just thinking of whether a municipality raises the minimum wage, which would be a binary treatment. We are thinking about _how much_, which is a continuous measure of treatment. So when I say "dosage", I mean "a particular value of some treatment". This is the decomposition in Table 1 of Callaway, Goodman-Bacon and Sant'Anna (CBS).

Thanks again for all your support. Today is the day that you may want to become a subscriber because today is the day that we try to figure out what's under the hood for TWFE with continuous dose. 

Scott's Mixtape Substack is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.


* * *

### TWFE Regression and FWL

In this section I'm going from a regression formula, which you can think of as the population regression whose two-way fixed effects (TWFE) coefficient is a best linear predictor (BLP) coefficient, to one of the four decompositions in Table 1 of CBS. This part is slow because I need to master this for my own sake, and I need the steps spelled out for me, so I'm using the substack to basically go slow.

So let's start with the regression itself. 
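Written out as a sketch in the notation defined right after it (the symbols $\theta_t$, $\eta_i$, $\beta^{twfe}$ and $v_{it}$ for the time effect, unit effect, coefficient and error are labeling assumptions chosen to match the verbal description):

$$
Y_{it} = \theta_t + \eta_i + \beta^{twfe}\,(D_i \times Post_t) + v_{it}
$$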

where _i_ indexes units, _t_ indexes pre and post, _D_ is the continuous time-invariant dose, and _Post_ is a dummy that turns on in period 2. The "time-invariant" part is operationalizing a two-period diff-in-diff: at baseline, _Post=0_, the interaction cancels out entirely, and it cancels out in both periods for the comparison units too, _D=0_. But for treated units in the post period, the dose "turns on". They have more general extensions, but we start with this dosage group, _D_ times _Post_, as that's the equivalent of the _2x2_ for those who know the modern diff-in-diff literature.

We start by using Frisch-Waugh-Lovell to residualize the regression and isolate the beta coefficient (technically, once calculated, this becomes the BLP). You can see my lectures on FWL from earlier this week in my Gov 2001 class at Harvard on probability and statistics if you want to see more about it, but in short FWL partials out covariates and turns a multivariate regression slope into a univariate one. In our case the covariates are the time and unit fixed effects. So with some algebra expressing various demeaning, that regression coefficient is:
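In symbols, a sketch consistent with the description that follows, where $\Delta Y_i = Y_{i2} - Y_{i1}$ is the unit-level first difference (the label $\beta^{twfe}$ is an assumed name for the coefficient):

$$
\beta^{twfe} = \frac{Cov(\Delta Y_i, D_i)}{Var(D_i)} = \frac{E\big[(D - E[D])\,\Delta Y\big]}{Var(D)}
$$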

That's the BLP regression coefficient on the continuous _D x Post_ interaction, residualized by FWL into a univariate slope like I said, and it is mechanically nothing more than the OLS slope of the unit-level first difference on the dose. I don't have a visual of this itself, but I do have a visualization from my Gov 2001 lecture slides this week of a regression with two covariates (creating a BLP that is a plane) that through FWL becomes a univariate slope, just so you can see. By analogy, the left picture here would be the multivariate regression coefficient from the first equation (note that the slope of the plane is the same for all covariate values, hence "holding constant") and the picture on the right is the univariate slope itself. All that FWL does is rip out the slope and recast it, but in our case it will also lead us to the decompositions we care about.
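To make the "nothing more than the OLS slope of the first difference on the dose" claim concrete, here is a minimal Python sketch. The data-generating process, sample size and variable names are all made up for illustration; the only thing it is meant to show is that a brute-force TWFE coefficient and the first-difference slope coincide in a balanced two-period panel.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data, for illustration only: a time-invariant dose with a
# point mass at zero and a continuous positive part.
dose = np.where(rng.random(n) < 0.3, 0.0, rng.uniform(0.5, 3.0, n))
unit_effect = rng.normal(size=n)
y_pre = unit_effect + rng.normal(size=n)
y_post = unit_effect + 1.0 + 0.5 * dose**2 + rng.normal(size=n)

# Stack into a long two-period panel: first n rows are pre, last n are post.
y = np.concatenate([y_pre, y_post])
post = np.concatenate([np.zeros(n), np.ones(n)])
unit = np.tile(np.arange(n), 2)
dose_long = np.tile(dose, 2)

# TWFE by brute force: unit dummies, a post dummy, and the D x Post term.
unit_dummies = (unit[:, None] == np.arange(n)[None, :]).astype(float)
X = np.column_stack([unit_dummies, post, dose_long * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_twfe = coef[-1]

# FWL shortcut: slope of the unit-level first difference on the dose.
delta_y = y_post - y_pre
beta_fd = np.cov(delta_y, dose, ddof=0)[0, 1] / np.var(dose)

print(beta_twfe, beta_fd)
```

The two printed numbers should agree up to floating-point error. The dummy-variable approach is only practical for small `n`; it is used here because it makes the fixed effects explicit.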

[Figure: left, the BLP plane from a two-covariate regression; right, the univariate FWL slope.]

Here is the decomposition I'm focused on from Table 1 of CBS. For today, I will _only_ be targeting the "Levels" row though. That's row 2 for the positive dose weights (column 1) and the zero dose weights (column 2).

[Image: Table 1 of CBS.]

So, picking back up where I left off, to get to our levels decomposition, I start by conditioning on _D_ and applying iterated expectations. This causes the dose distribution to split into its point mass at zero, with weight _P(D=0)_, and its continuous part, described by the density of _D_ on the positive support (note: the dose cannot be negative; it is either 0 or >0).

Second, I multiply both sides of the above equation by _Var(D)_. This causes the denominator on the right-hand-side to drop out (as Var/Var=1) and we get this:
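In symbols, with $\beta^{twfe}$ as an assumed label for the TWFE coefficient and $\Delta Y$ the pre-to-post change in the outcome:

$$
\beta^{twfe}\,Var(D) = E\big[(D - E[D])\,\Delta Y\big]
$$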

I use _m(l)_ notation to mean _E[ΔY | D = l]_ and rewrite the right-hand-side of this equation as:
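A sketch of that rewrite, which is just the law of iterated expectations applied to the numerator $E[(D - E[D])\,\Delta Y]$:

$$
E\big[(D - E[D])\,\Delta Y\big] = E\big[(D - E[D])\,E[\Delta Y \mid D]\big] = E\big[(D - E[D])\,m(D)\big]
$$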

Third, I split the right-hand-side expectation into two parts, which I'll call the atom and the continuous part. Note that the atom part involves _m(0)_, the expected change in the outcome from pre to post when _D=0_ (i.e., dose is zero), and the continuous part is a weighted integral of _m(l)_ across the positive support of the dose.
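Written out as a sketch, with $f_D(l)$ denoting the density of the dose on $l > 0$ (this symbol is an assumption about notation; it integrates to $P(D>0)$ rather than to one):

$$
E\big[(D - E[D])\,m(D)\big] = \underbrace{(0 - E[D])\,m(0)\,P(D=0)}_{\text{atom}} + \underbrace{\int_{l>0} (l - E[D])\,m(l)\,f_D(l)\,dl}_{\text{continuous}}
$$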

Going slow, I'm going to break open the _m(l)_ term inside the continuous piece by adding a zero. That zero will be _m(0) - m(0)_. This gives us, inside the integral, _m(l)_ equaling _m(0) + [m(l) - m(0)]_. This step splits the integral into two more pieces. I'll rewrite it fully here so we can all see it:
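As a sketch, again writing $f_D(l)$ for the density of the dose on $l > 0$ (an assumed symbol):

$$
\int_{l>0} (l - E[D])\,m(l)\,f_D(l)\,dl = m(0)\int_{l>0} (l - E[D])\,f_D(l)\,dl + \int_{l>0} (l - E[D])\,\big[m(l) - m(0)\big]\,f_D(l)\,dl
$$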

Now once we replace the continuous piece with this longer two-part piece, the entire expression has three pieces: the atom piece, the piece with the _m(0)_ we pulled out of the integral, and whatever is left over. And if we look at the first two of those pieces, we will see that they both have _m(0)_ as a factor, which lets us do this:
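Collected into one line, as a sketch (with $\beta^{twfe}$ and $f_D(l)$ as assumed labels for the TWFE coefficient and the density of the dose on $l > 0$):

$$
\beta^{twfe}\,Var(D) = m(0)\Big[(0 - E[D])\,P(D=0) + \int_{l>0} (l - E[D])\,f_D(l)\,dl\Big] + \int_{l>0} (l - E[D])\,\big[m(l) - m(0)\big]\,f_D(l)\,dl
$$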

Keep in mind that _m(D)_ is the conditional mean _E[ΔY | D]_. That bracket is _E[D - E[D]]_ computed by splitting it into its atom part plus its continuous part. And _E[D - E[D]] = 0_ by definition because the average of deviations from the mean is exactly zero. The bracket is zero. The atom piece and the "_m(0)_ pulled out" piece exactly cancel each other. The _m(0)_ drops out and what survives is the third piece.
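The cancellation is easy to check numerically. Here is a small Python sketch using sample analogues; the mixed dose distribution (a 40% mass at zero plus a gamma-distributed positive part) is a made-up example, not anything from CBS.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical mixed dose: point mass at zero plus a continuous positive part.
dose = np.where(rng.random(n) < 0.4, 0.0, rng.gamma(2.0, 1.0, n))
mean_d = dose.mean()

# Atom part: (0 - E[D]) * P(D = 0), using sample analogues.
atom = (0.0 - mean_d) * np.mean(dose == 0)

# Continuous part: sample analogue of the integral of (l - E[D]) f_D(l) over l > 0.
continuous = np.mean((dose - mean_d) * (dose > 0))

bracket = atom + continuous
print(atom, continuous, bracket)
```

The atom part is negative, the continuous part is positive, and they offset each other so that `bracket` is zero up to floating-point error.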

Now I go back to that original expression and divide both sides by _Var(D)_. Notice that the covariance term in the numerator, through the expectation operator, has become that "third piece" I just mentioned: a weight _times_ a diff-in-diff term. Do you see it? Because _m(l) = E[ΔY | D = l]_, we are getting the mean outcome change for the treated group at dose _l_, _m(l)_, minus the mean outcome change for the "never treated" comparison group, _m(0)_, and because the dose in this representation is continuous, we are integrating over doses, hence the _dl_ integration term.
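As a sketch, with the two brackets written out explicitly ($\beta^{twfe}$ and $f_D(l)$ are assumed labels for the TWFE coefficient and the density of the dose on $l > 0$):

$$
\beta^{twfe} = \int_{l>0} \left[\frac{(l - E[D])\,f_D(l)}{Var(D)}\right]\big[m(l) - m(0)\big]\,dl
$$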

And that first bracket is the level weight. The second bracket is the "level effect" for a given dosage _l_. Like I said, this is the dosage equivalent of the 2x2. It is the difference between the expected outcome change at that dose and the expected outcome change at the zero dose. And therefore the population regression coefficient from TWFE is a weighted average, via integration, of level effects where the weight function is:
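In symbols, as a sketch (the name $w^{lev}(l)$ and the density symbol $f_D(l)$ for the dose on $l > 0$ are assumed notation):

$$
w^{lev}(l) = \frac{(l - E[D])\,f_D(l)}{Var(D)}
$$

Note that the sign of the weight depends on whether the dose $l$ sits above or below the mean dose $E[D]$.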

Basically, though, we can imagine the TWFE population regression coefficient, via this careful FWL partialling out, to be a **weighted average of dose-level calculations**. I don't want to say "effects" yet, because that invokes potential outcomes and we aren't there yet. We are still at a purely algebraic stage, without causality. 
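Here is a minimal Python sketch of that weighted-average reading, under simplifying assumptions that are mine and not CBS's: the dose takes a handful of discrete values so that sums replace the integral, and the simulated first differences are invented for illustration. It compares the FWL slope with the weighted sum of dose-level contrasts.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical discrete dose grid (0 is the untreated atom).
dose = rng.choice([0.0, 1.0, 2.0, 3.0, 4.0], size=n, p=[0.3, 0.25, 0.2, 0.15, 0.1])
# Hypothetical first differences with a nonlinear dose response.
delta_y = 1.0 + np.sqrt(dose) + rng.normal(size=n)

mean_d = dose.mean()
var_d = dose.var()  # ddof=0, the sample analogue of Var(D)

# Left-hand side: FWL slope of the first difference on the dose.
beta = np.mean((dose - mean_d) * delta_y) / var_d

# Right-hand side: weighted sum of m(l) - m(0) over the positive doses.
m0 = delta_y[dose == 0].mean()
levels = np.unique(dose[dose > 0])
weights = np.array([(l - mean_d) * np.mean(dose == l) / var_d for l in levels])
contrasts = np.array([delta_y[dose == l].mean() - m0 for l in levels])
beta_decomp = np.sum(weights * contrasts)

print(beta, beta_decomp)
print(dict(zip(levels, weights)))
```

The two numbers on the first line should match up to floating-point error, because the identity is purely algebraic and holds in the sample as well as in the population. The second line shows the weights by dose; in this made-up example the mean dose is about 1.5, so the weight on dose 1 is negative.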

* * *

### Conclusion

I think that's actually a lot for one day. Let's all use pencil and paper and work through these steps to understand row 2 of Table 1. I remain convinced that if we can understand the bias of TWFE with continuous dose, then we will be more open to learning this new diff-in-diff estimator, despite probably being a bit fatigued by all the diff-in-diff in the world. Of all the designs in the diff-in-diff family, the diff-in-diff with continuous dose treatments (e.g., minimum wages) is actually a pretty common form, and this literature is not nearly as overwhelming to read as the binary treatment diff-in-diff literature, which can sometimes feel like a blizzard of estimators to learn.

I think in the next substack, I'll work through code in R, Stata and Python that will exactly produce the weights, with numerical examples, some graphics and a simulation, and really try hard to nail down the interpretation. Only once I feel like I can actually communicate this clearly will I move on. But for now, that's where we are.

(C) 2026 scott cunningham  