
Scott's Mixtape · Economics & Policy


Thu, 23 Apr 2026 10:26:50 +0000

  
# TWFE Continuous Decompositions: The regression never changes. The question does.

scott cunningham · Apr 23
   
  
The series of posts I've been writing on the continuous-treatment DiD paper has focused primarily not on the causal parameters and not on the estimators, but on two-way fixed effects (TWFE) and Frisch-Waugh-Lovell (FWL). I have been narrowly interested in the decomposition weights of TWFE that Callaway, Goodman-Bacon and Sant'Anna (CBS) present in Table 1 of the paper. I have also been presenting the "Levels" weights in an R Shiny app hosted on my website. Today I want to give an overview of what exactly is in this table, and to answer a question: why are there four different rows in the "Decomposition" column if there is only one TWFE regression to decompose using FWL?


* * *

## **Table 1 is not presenting four decompositions because the regression never changes in Table 1**

The central fact, which I want readers to hold fixed before we go anywhere else: Table 1 is presenting decomposition weights from only one regression. Here's what I mean. Let's say that you regress your outcome on the continuous treatment dose, in a panel of two periods, with industry and time fixed effects. What happens? 

You get a single OLS estimate. 

There will only be one coefficient. We conventionally call it β-hat. And using FWL, we can decompose it algebraically into a set of weights built from a density function, $f_D(l)$, deviations from the mean, $l - E[D]$, and a variance, $\mathrm{Var}(D)$. Those three things are the building blocks of the "Levels" row. Once you have those three, along with the average change in the outcome at each dose, you can always calculate the β-hat from TWFE algebraically, because FWL gives you the regression coefficient without your ever technically running the regression.
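Here is a minimal sketch of that FWL logic on simulated data. Everything in it is made up for illustration (the doses, the sample size, the outcome equation, and the variable names are assumptions, not anything taken from CBS or from Lu and Yu). It computes the same coefficient three ways: brute-force TWFE with dummies, FWL residual-on-residual, and the two-period shortcut of the covariance between the outcome change and the dose divided by the variance of the dose.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                                  # hypothetical number of industries
dose = rng.choice([0.0, 5.0, 10.0, 15.0, 20.0], size=n)  # made-up doses
y0 = rng.normal(size=n)                                  # period-0 outcomes
y1 = y0 + 0.5 + 0.03 * dose**1.5 + rng.normal(scale=0.1, size=n)  # period-1 outcomes

# Stack into a long two-period panel: treatment is zero before, the dose after.
unit = np.tile(np.arange(n), 2)
post = np.repeat([0.0, 1.0], n)
y = np.concatenate([y0, y1])
d = np.concatenate([np.zeros(n), dose])
unit_fe = (unit[:, None] == np.arange(n)).astype(float)  # industry dummies

# (1) Brute-force TWFE: OLS on the treatment, a post dummy, and industry dummies.
X = np.column_stack([d, post, unit_fe])
beta_twfe = np.linalg.lstsq(X, y, rcond=None)[0][0]

# (2) FWL: residualize outcome and treatment on the fixed effects,
#     then regress residual on residual.
Z = np.column_stack([post, unit_fe])

def residualize(v):
    return v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]

d_res, y_res = residualize(d), residualize(y)
beta_fwl = (d_res @ y_res) / (d_res @ d_res)

# (3) Two-period shortcut: Cov(change in Y, D) / Var(D).
dy = y1 - y0
beta_cov = np.cov(dy, dose, ddof=0)[0, 1] / np.var(dose)

print(beta_twfe, beta_fwl, beta_cov)  # three routes, one number
```

The third route is the one the decomposition works from: it only needs moments of the dose and the outcome change, with no fixed-effects regression in sight.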

But there are three other rows. So what is going on here? How can there be four rows of weights if all of them come from FWL? If there is only one regression, do those weighted averages all give the same number? What are the different rows doing?

Again, let me be clear and say what they are _not_ doing. The four rows of Table 1 do **not** correspond to four different regressions. They're not even four different summary statistics you compute on the side. Rather, they are four algebraic rewrites of the same β-hat, where each rewrite expresses that single number as a weighted average of some underlying causal parameter, which I have been avoiding for the moment. OLS equals every one of the decompositions. Another way to say that is that the decomposition is not unique.

Every row of Table 1 lands on the same value. What differs between rows is what that value is a weighted average _of_.
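To make that concrete, here is a small numerical sketch on a discrete dose grid with an untreated group at zero. The grid, the shares, and the outcome changes are invented numbers, and the four weight formulas are discrete-dose counterparts written out for illustration, not transcribed from the paper. The point is only that four different weighted averages, over four different kinds of comparisons, reproduce the same coefficient.

```python
import numpy as np

# Hypothetical dose grid: share of industries at each dose and the mean
# outcome change at each dose (all numbers made up).
l = np.array([0.0, 5.0, 10.0, 15.0, 20.0])
p = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
m = np.array([0.10, 0.40, 0.90, 1.10, 1.20])   # mean change in Y at each dose

mu = p @ l                                     # mean dose
var = p @ (l - mu) ** 2                        # variance of the dose
beta = (p * (l - mu)) @ m / var                # Cov(dY, D) / Var(D)

t = slice(1, None)                             # treated doses (l > 0)

# Levels: weighted average of contrasts with the untreated, m(l) - m(0).
w_lev = p[t] * (l[t] - mu) / var
lev = w_lev @ (m[t] - m[0])

# Scaled levels: the same contrasts per unit of dose, (m(l) - m(0)) / l.
w_slev = l[t] * w_lev
slev = w_slev @ ((m[t] - m[0]) / l[t])

# Causal response: slopes between adjacent doses.
gaps = np.diff(l)
tail = np.cumsum((p * (l - mu))[::-1])[::-1][1:]   # sum over doses at or above each step
w_cr = gaps * tail / var
cr = w_cr @ (np.diff(m) / gaps)

# Scaled 2x2: slopes between every pair of doses with l[i] < l[j].
i, j = np.triu_indices(l.size, k=1)
w_22 = (l[j] - l[i]) ** 2 * p[i] * p[j] / var
s22 = w_22 @ ((m[j] - m[i]) / (l[j] - l[i]))

print(beta, lev, slev, cr, s22)                # five expressions, one number
print(w_lev)                                   # the below-mean dose gets a negative weight
```

In this toy grid the levels weight on the lowest treated dose is negative because that dose sits below the mean, while the causal response and scaled 2×2 weights are non-negative and sum to one.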


* * *

## **Four questions an applied researcher might actually be asking**

Let's be concrete and talk again about the Lu and Yu (2015) AEJ:Applied article. Recall that China joined the WTO in 2001. This forced any industry with a pre-WTO tariff above roughly 10% to cut down to a WTO-compliant ceiling. Industries already at or below the ceiling didn't cut at all. The _size_ of each industry's mandated cut was mechanically tied to how high its pre-WTO tariff had been. And this is the regression we have been using.

When a practitioner regresses industry outcomes on the 2001 Chinese tariffs described above and gets a number (β-hat) from that OLS, they will usually describe it in one of a few ways. Let me give an example of each.

  1. "This is the effect of the program."

  2. "This is the effect of a marginal one point change in the dose."

  3. "This is the average effect of one unit of the dose."




One regression, one β-hat, but _three distinct questions_. They are not different shades of one another. They are three distinct _questions_, and a fourth follows them below. So let me elaborate on each of them.

* * *

### Question 1: The Effect of the Program

For an industry that faced a 15 percentage point tariff in 2001, what's the effect of that tariff on its outcome, compared to an industry that faced no tariff at all? This is a level effect. This is the one I have been working with in the last few substacks, and the one you can find under the first tab of my Shiny app. It is a contrast between treated at dose `l` and untreated. Table 1's second row, labeled the "levels decomposition," rewrites β-hat as a weighted average over level effects of this kind.

* * *

### Question 2: The Effect of a Marginal One-Point Change in the Dose

But now let's change the question away from "the effect". Let's look at a particular situation and ask a different question.

For an industry already sitting at a 15-point tariff, what would happen if you nudged their tariff up by a tiny little sliver? What happens? 

That's a derivative. It is the local slope of the response curve at dose `l`. Once we get into the causal parameters, we will be able to map that question to a particular causal response, which CBS call the ACRT. That is the first row of Table 1, the causal response decomposition. It is β-hat rewritten as a weighted average of local slopes.

* * *

### Question 3: The Average Effect of One Unit of the Dose

If industries with higher tariffs experienced bigger outcome changes, you can divide each industry's level effect by its dose to get an effect per percentage point, then average. That's the scaled level effect, and it belongs to the third row of Table 1, labeled "Scaled Levels".

* * *

### Question 4: How do Two Treated Industries Compare to Each Other?

The fourth question is a little different. What if we were to pick an industry with a lower tariff `l` and an industry with a higher tariff `h`, and then ask about the slope of outcomes between them per unit of tariff gap? What is that question exactly?

Well, in that scenario, you **never** go through the untreated at all. Do you see that? In that situation, you are comparing the high and low dose industries to one another, not to some counterfactual where neither is treated. You're comparing **dose to dose**. The scaled 2×2 decomposition, row four, rewrites β-hat as a weighted average of these pairwise slopes.

* * *

## **The real point of the paper**

So there are four questions in Table 1, but there is only one regression in Table 1.

Once you hold that β-hat fixed, you can see the four rows in Table 1 as **four questions**. And when you do, then you can track the rhetoric of their paper as it moves from merely FWL's single interpretation to being something more nuanced and a bit sharper. And here it is:

_Does TWFE have a good interpretation in any of them?_

See, this is one of the cores of the paper. Can I take any of those four questions, all very good questions, and ask whether my TWFE coefficient gives a satisfying answer to it? The weighted averages are TWFE's _answers_ to the four different questions. So are they satisfying?

No. Most of the time, no. 

Most of the time, β-hat from OLS is expressible as a weighted average under all four framings. And that is purely mechanical. Purely algebraic. Pops out of FWL. 

But as we'll see once we move through Table 1, which is _only_ about TWFE's estimate, none of the four cases contains a β-hat expression that cleanly equals the parameter that you would have written down at the start of your research project.

We start with the estimand. The estimand is a population parameter, usually a causal one, and we try to estimate it with OLS. But while the best linear predictor (BLP) is a population summary of the conditional expectation function (CEF), and OLS is its implementation in the sample, not all population estimands are the same, and so not all of them map onto an OLS-based answer. The population estimand is nothing more than an explicitly stated question that can be interpreted as a causal response, and none of the rows in Table 1 corresponds _cleanly_ to one. So what are they?

Well, in the level decomposition, the weights have to sum to zero. They **must**. Why? Because look at the first term: we **recentered** each unit's tariff relative to the mean. And if you recenter, that necessarily means some of the level effects will get negative weights. Which ones? The below-average industries, because a dose below the mean has a negative deviation from the mean.
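The sum-to-zero property is a one-line calculation. As a sketch, assume the level weight at dose $l$ is the product of the three building blocks named earlier, $w^{lev}(l) = (l - E[D])\,f_D(l)/\mathrm{Var}(D)$, where the symbol $w^{lev}$ is shorthand introduced here. Integrating over every dose in the support, and counting any untreated mass at zero, gives

$$
\int w^{lev}(l)\,dl \;=\; \frac{1}{\mathrm{Var}(D)}\int \big(l - E[D]\big)\,f_D(l)\,dl \;=\; \frac{E[D] - E[D]}{\mathrm{Var}(D)} \;=\; 0 .
$$

Since $f_D(l)$ and $\mathrm{Var}(D)$ are positive, the sign of each weight is the sign of $l - E[D]$: negative below the mean dose, positive above it.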

In the accompanying figure, look at where the red line in the left graph sits relative to the dashed line, which marks the mean dose in our data. Then find that dose on the right graph, where it lands in the pink-ish zone. Then go left to the y-axis: negative weight.
  
That is the reasoning you get from recentering the dose around its mean. An industry with a below-average tariff will pull β-hat **down** when its level effect ought to be pulling β-hat up. 

I have not yet gotten into the other three, as I thought it was more important we really nail the levels as we have to start somewhere, and I found that one to be the easiest to start with. 

But, here let me give a quick heads-up to one of the others. The decomposition of β-hat rewritten as the causal response decomposition has all non-negative weights which is nicer, but weirdly they do not correspond to the density of tariffs in your sample. With that one, the industries with moderate tariffs will get weighted more heavily than industries at the extremes for reasons that have nothing to do with your research design. 
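A quick way to see that mismatch is to make the dose distribution perfectly flat and look at what the weights do anyway. The sketch below uses a hypothetical grid of equally likely doses and a discrete-dose version of the causal response weights (a form assumed here for illustration, not copied from the paper), where each weight attaches to the slope over one interval between adjacent doses.

```python
import numpy as np

l = np.arange(0.0, 21.0, 2.0)          # hypothetical doses 0, 2, ..., 20
p = np.full(l.size, 1 / l.size)        # flat density: every dose equally common

mu = p @ l
var = p @ (l - mu) ** 2

# Weight on the slope over each interval between adjacent doses:
# (gap width) * sum over doses at or above the interval's upper end of p * (dose - mean) / Var(D).
tail = np.cumsum((p * (l - mu))[::-1])[::-1][1:]
w_cr = np.diff(l) * tail / var

for lo, hi, w in zip(l[:-1], l[1:], w_cr):
    print(f"slope over ({lo:4.0f}, {hi:4.0f}]: weight {w:.3f}")
```

Every dose is equally common here, yet the weights are hump-shaped: the intervals around the mean dose get three times the weight of the outermost intervals. Nothing about the research design asked for that; it comes from the mechanics of OLS.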

And then the other two rows are different versions of the same story. Each rewrite changes the shape of the weights, and each introduces a mismatch between β-hat and any population estimand of interest.

And remember: we start with estimands. We don't start with OLS. A regression in the sample is one draw from the sampling distribution of OLS, which in large samples converges, by the law of large numbers, to the best linear predictor (BLP) that you could compute in the population. But that does not mean that the BLP itself is the goal. The goal is always to answer a question where the answer corresponds to a target parameter, usually a causal parameter.

One of the paper's real contributions is showing that the question you want answered lives upstream from the regression. OLS cannot itself infer which question you asked, and cannot itself give a satisfying answer either unless you put restrictions on it. So Table 1 is not a menu of decompositions. It's more like a diagnosis, because it shows, through the window of FWL, how the regression coefficient from OLS can be interpreted.

* * *

## **The negative weights are not a bug**

If you would like to write a regression coefficient as a weighted average of good comparisons, then you will get negative weights. And if you find that dissatisfying and instead write the regression coefficient in terms of forbidden comparisons, you will get rid of the negative weights but then have to deal with the fact that you are now comparing two treated groups to one another. 

So let's ask the obvious. What exactly does "good" mean in our case?

"Good" means comparisons to the untreated. I think that is one of the core lessons a lot of us learned from the earlier diff-in-diff literature, a lot of which we owe to the Bacon decomposition from Andrew's 2021 _Journal of Econometrics_ paper.

A good comparison is about comparing an industry that faced a tariff versus an industry that faced none. It corresponds to a particular causal parameter I have not yet worked out, but for now just note that such a comparison is sometimes called clean, and it is probably intuitive in many contexts. It is the Y(1)-Y(0) unit level treatment effect summarized and averaged. It's the core of the potential outcomes framework which expresses treatment effects as comparisons to an untreated, Y(0), treatment state. That's what the level decomposition gives you. 

But the weights in that decomposition are _forced_ to sum to zero through the mechanics of OLS, which we can only see if we crack it open using FWL like an atom smasher. And if the weights sum to zero, then they _have to_ be negative somewhere and positive somewhere else. And FWL shows that those negative weights are on the industries with below average tariffs which you can see by moving the slider around in my shiny app. 

But the scaled 2×2 decomposition manages to escape the negative weights. Every weight in the scaled 2×2 row is positive, and the weights sum to one. This is sometimes what people mean when they say that averaging is "well behaved". They mean that the weights sum to one and none are negative.
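Why the scaled 2×2 weights behave this way can be seen in a short derivation. As a sketch, assume the weight on a pair of doses $l < h$ takes the form $w^{2\times 2}(l,h) = (h-l)^2\, f_D(l)\, f_D(h)/\mathrm{Var}(D)$; this form is an assumption made here for illustration and should be checked against Table 1. Each factor is non-negative, so no weight can be negative. And for two independent draws $D$ and $D'$ from the dose distribution,

$$
\mathrm{Var}(D) \;=\; \tfrac{1}{2}\,E\big[(D - D')^2\big] \;=\; \iint_{h>l} (h-l)^2\, f_D(l)\, f_D(h)\,dh\,dl ,
$$

so that $\iint_{h>l} w^{2\times 2}(l,h)\,dh\,dl = 1$. The variance of the dose is itself an average of squared pairwise gaps, which is why dividing by it makes the pairwise weights sum to one.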

Think about synthetic control -- the weights are non-negative and sum to one, which forces the synthetic control to live without extrapolation, but at the same time, can force the synthetic control to completely miss too. 

Well you can never measure the treatment effects in real life because you are always missing one of the necessary potential outcomes per unit. So all you can ever do is make comparisons, and hopefully principled ones that correspond to an interpretable population estimand, or "target parameter". And in the scaled 2x2 from OLS, the comparisons you're now averaging over are pairs of _treated_ industries, with one of the pairs drawn from a lower tariff and the other drawn from a higher tariff. 

Bacon's 2021 _Journal of Econometrics_ paper famously flagged comparisons like these, and we now refer to them as the "late versus early" comparison in the staggered binary case. They are the "forbidden comparison" in the language of other writers, Borusyak, Jaravel and Spiess I think, as well as de Chaisemartin and D'Haultfœuille. Such comparisons introduce contamination (see Sun and Abraham 2021 too) because one treated unit's effect can contaminate the comparison, depending on how heterogeneous those treatment effects are.

Well, in the continuous world, the same concern comes back: you're contrasting two units that are both in the treatment, just in different amounts. And as I said, that introduces its own issues. 

And thus we see the bind that our OLS model above puts us in. As the economist likes to say, there is no free lunch. You're damned if you do; damned if you don't. Yet paradoxically, the bind does not run in the intuitive direction: with continuous treatment, good comparisons give us negative weights through the recentering that OLS does, while forbidden comparisons give us positive weights. And both of these are exact rewrites of the same number.

I don't want to put words into the authors' mouth, but it is almost like Table 1 is showing us that _all_ of these are forbidden comparisons except for comparisons to untreated units. Negative weights aren't a flaw in OLS. Rather, they are the price of asking a clean question. If you want every comparison in your weighted average to be an untreated-versus-treated contrast, then your weights will have signs you don't like. And if you don't like those signs, you have to give up the cleanness. And the rows in Table 1 are just about picking your poison.


* * *

## **What this means for the tariff regression**

So when I run OLS of industry outcomes on 2001 tariff, and I get a coefficient, what question am I answering?

If I was asking "what's the effect of the tariff?" in the level sense, the target is the average difference between an industry taxed at `l` and an industry taxed at zero. That is the question I've been focused on answering for the last few substacks, and it is the one with negative weights. In our simulation, the below-average-tariff industries are pulling the coefficient in the wrong direction.

Well what if I asked the marginal-effect question? If I ask the marginal-effect question, then I'm also getting an answer, and the weights look nicer since they won't be negative as I'll work through in a later post, but you will get a weighting scheme that doesn't match the distribution of tariffs I actually have.

And if I was asking the per-unit question, or the pairwise-slope question, it is the same story. The answer is a single number with weights that are determined by the variance of the dose, not by anything about my research question.

If you listen close, you can actually hear echoes of the Bacon decomposition of OLS with panel unit and time fixed effects from the binary case with staggered diff-in-diff. 

Part of the point of their paper is to say that none of the four rewrites are the clean thing we are usually after. And nothing in the regression output tells me which of the four I was trying to ask.

* * *

## **The move the paper makes next**

The constructive half of the CBS paper, in section 4, is about what to do after we have all collectively internalized all that. We engage in "population first" style reasoning, not "regression first" style reasoning. That is, we decide, up front, which parameter we actually care about. And once we are clear about that, only then do we build an estimator targeted at _that_ estimand, which nowadays we often call the target parameter. And that's where we make assumptions: not calculations, but _assumptions_. That's where we separate "identification" from estimation.

We target a clearly defined parameter under clearly expressible assumptions, and the result is what Pedro is fond of calling the "forward-engineered" estimator.

And it's not just Pedro; you can see it in Mogstad and Torgovitsky's chapter for the handbook of labor economics (there is an NBER working paper version), which reviews instrumental variables with "unbounded heterogeneous treatment effects". That has been a recurring theme in the new diff-in-diff literature, and not surprisingly it carries over to this paper as well.

If you want the average level effect among the treated, we will see as I progress that there is a binarized DiD that delivers exactly that to us. And if you want instead the local slope at each dose, then CBS give us what are called "sieve" estimators that hit that directly.

Clearly defined, population-level estimands, with identification assumptions supporting estimators designed to deliver them.

I'll work through those in a future post. The reframe I wanted to make now was simply to note that the four rows of Table 1 are not four answers. Rather, they are four attempts to _interpret_ the single number OLS gave us, one per candidate question you might have been asking. The decomposition isn't the estimator. It's more like the diagnosis that helps us better understand what OLS gave us.
