# Identification in Synthetic Control

### The Role of the Factor Model

scott cunningham | Dec 12

The identifying assumption in synthetic control presented in Abadie, Diamond and Hainmueller (2010) was a "factor model". I don't know the history of factor models, so I won't attempt to recount it, but they look like this:

$$Y_{it}(0) = \alpha + \beta_t X_i + \omega_t \mu_i + \varepsilon_{it}$$

where $Y_{it}(0)$ is the untreated potential outcome for unit $i$ at time $t$ (e.g., earnings without some intervention), $\alpha$ is a constant, $\beta_t$ is a time-varying macro-level shock that causes the "returns" to $X_i$, some unit-specific observable variable (e.g., firm size), to change over time, $\mu_i$ is some unit-specific **unobservable** variable (e.g., CEO ability), $\omega_t$ is another time-varying macro-level shock that causes the impact of that ability on earnings to vary, and $\varepsilon_{it}$ is a white-noise error term that is zero on average.
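To make the two time-varying shocks concrete, here is a small sketch that simulates $Y_{it}(0) = \alpha + \beta_t X_i + \omega_t \mu_i + \varepsilon_{it}$, writing $\omega_t$ for the time-varying coefficient on $\mu$. Every number in it (sample size, the paths for $\beta_t$ and $\omega_t$, the Gaussian draws) is an assumption for illustration, not something from ADH. Because $\mu$ is drawn independently of $X$ here, a cross-sectional regression of $Y(0)$ on $X$ in each period recovers that period's $\beta_t$, the shifting "returns" to $X$.

```python
import numpy as np

# Sketch of the factor model Y(0)_it = alpha + beta_t * X_i + omega_t * mu_i + eps_it.
# All numbers below are illustrative assumptions, not values from ADH (2010).
rng = np.random.default_rng(0)
n_units, T = 5000, 10
alpha = 10.0
beta_t = 0.2 + 0.1 * np.arange(T)            # returns to X drift upward over time
omega_t = 0.4 + 0.2 * np.sin(np.arange(T))   # time-varying impact of the unobservable

X = rng.normal(0, 1, n_units)    # observed, e.g. firm size
mu = rng.normal(0, 1, n_units)   # unobserved, e.g. CEO ability (independent of X here)
eps = rng.normal(0, 1, (n_units, T))

# Broadcast over units (rows) and periods (columns)
Y0 = alpha + np.outer(X, beta_t) + np.outer(mu, omega_t) + eps

# Period-by-period OLS slope of Y(0) on X. Because mu is independent of X in this
# simulation, omega_t * mu acts like extra noise and the slope estimates beta_t.
slopes = np.array([np.polyfit(X, Y0[:, t], 1)[0] for t in range(T)])
print(np.round(slopes, 2))
print(np.round(beta_t, 2))
```

If $\mu$ were correlated with $X$ (say, bigger firms hire more able CEOs), those slopes would also soak up part of $\omega_t$, which is one way to see why the unobservable matters.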

Let me comment on a few things. First, the synthetic control model, originally anyway, was rooted more in the "outcome model" tradition of identification: ADH said, roughly, that if you assumed all of the untreated potential outcomes followed a factor model, then they could have a conversation with you about the bias of synthetic control. Put aside for now any later work on this, as Imbens and others have examined the performance of synthetic control in other settings. For now, just note it.

I call it the outcome model tradition because that is how my brain sorts the different ways of approaching identification in the contemporary causal inference paradigm. The approach I usually hear contrasted with it is the randomized treatment assignment, or design-based, approach, where one does not specify that the untreated potential outcome follows any particular process, but rather focuses on how units get assigned to treatment or not (i.e., are they randomized?). Synthetic control fits more with difference-in-differences in that an explicit claim is made, or at least entertained, that the missing potential outcome follows some stated process, which ADH then use to characterize the bias.

I hesitate to write this post for fear of being wrong, so buyer beware, but here is how I understand it. There are characterizations of potential outcomes and there are characterizations of treatment assignment processes, and synthetic control originally was more in line with the outcome model approach to identification. So is diff-in-diff.

**Design-based approaches to identification**

But that is where they differ. The diff-in-diff identification assumption (the crucial one, anyway) does not need randomization. Sure, randomization of the treatment will give you parallel trends. But it will also give you equal mean potential outcomes across groups. Let's say that the treatment, $D$, was independent of $Y(0)$. Then you can write this down:

$$E[Y(0) \mid D=1] = E[Y(0) \mid D=0] = E[Y(0)]$$

I used to find it hard to read this left to right for some reason. I'm not sure what tripped me up, and I still don't know. Over time I found that the only thing that has really helped me make progress in causal inference is pencil and paper: writing stuff out, making the deductions myself, and working through proofs by hand. That is what helps me for small things and for larger things. An econometrician would probably say "well, of course," and even a line like that should be straightforward, but on the off chance it is not, let me explain.

Independence implies that the treatment is assigned to units "for reasons" that have absolutely nothing to do with that one particular variable called "the untreated potential outcome, $Y(0)$". As such, the mean of that variable is the same for both groups, treatment and control. The first term, the mean of the potential outcome for the treated, equals the second, the mean of the potential outcome for the controls, which equals the third, the mean for the entire sample. Here, let me show you using Python.
    
    
    import numpy as np
    import matplotlib.pyplot as plt

    # Set seed for reproducibility
    np.random.seed(42)

    # Number of observations
    n = 5000

    # Treatment assignment (independent of y0)
    d = np.random.binomial(1, 0.5, size=n)

    # Funky y0 process (mean ≈ 15, sd ≈ 0.83)
    y0 = (
        15
        + np.random.normal(0, 0.8, size=n)
        + 0.3 * np.sin(np.random.uniform(0, 2 * np.pi, size=n))
    )

    # Compute means
    mean_y0_treated = y0[d == 1].mean()
    mean_y0_control = y0[d == 0].mean()
    mean_y0_overall = y0.mean()

    # Plot
    labels = ["E[y0 | d=1]", "E[y0 | d=0]", "E[y0]"]
    values = [mean_y0_treated, mean_y0_control, mean_y0_overall]

    plt.figure()
    plt.bar(labels, values)
    plt.axhline(mean_y0_overall, linestyle="--")
    plt.ylabel("Mean of y0")
    plt.title("y0 is independent of treatment")
    plt.show()

See how the means don't vary? How the mean in the treatment group and the mean in the control group are the same as the mean in the whole sample? That's the sample analog of the population statement $Y(0) \perp D$.

The advantage of the design approach to causal inference is that those deductions are allowed, even when one of the quantities _does not exist_ in the data. Can you see which of these can be calculated and which cannot? The middle one, E[y0|d=0], is simply E[y|d=0] once we use the switching equation, where the switching equation is:

$$y = d\,y(1) + (1-d)\,y(0)$$

And that's why you can calculate the middle one. You can calculate the middle one because for the control group, y(0)=y. So it is in the data proper, and therefore if you ever need it, you got it. But the first quantity -- E[y(0)|d=1] -- does not exist because when a unit is treated, we only observe y(1) by the switching equation. Which means we kill off y(0) and thus it does not exist in the data and will never exist. It is lost to the ether. It is a ghost gone to live with its ancestors in that place where potential outcomes go when they die before having a chance to live a life.
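A minimal sketch of the switching equation, using hypothetical potential outcomes (the constant effect of 2 and the normal draws are assumptions for illustration). It shows that for controls the observed $y$ literally is $y(0)$, while for treated units $y(0)$ never makes it into the data.

```python
import numpy as np

# Hypothetical potential outcomes; the constant effect of 2 is an assumption.
rng = np.random.default_rng(1)
n = 10_000
d = rng.binomial(1, 0.5, size=n)
y0 = rng.normal(15, 1, size=n)
y1 = y0 + 2.0

# Switching equation: each unit reveals exactly one of its potential outcomes
y = d * y1 + (1 - d) * y0

# The y0 a researcher actually sees: treated units' y0 is gone (NaN)
y0_observed = np.where(d == 0, y0, np.nan)

# For controls, y is y0, so E[y | d=0] is E[y0 | d=0]
print(y[d == 0].mean(), y0[d == 0].mean())
# For the treated, y0 never shows up in the data
print(np.isnan(y0_observed[d == 1]).all())
```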

But here's the magic. If the treatment is independent of y(0), then even though E[y(0)|d=1] is gone forever, you still have it. You have it because it's equal to E[y(0)|d=0], so anytime along this long and winding road you happen to _need_ E[y(0)|d=1], you have it. It's just your handy E[y(0)|d=0]. As Glinda tells Dorothy in _The Wizard of Oz_: you were always home.
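Here is that substitution at work in a small simulation, assuming randomized treatment and made-up, heterogeneous unit-level effects. Inside a simulation we can peek at the ghost $E[y(0) \mid d=1]$ to compute the true ATT; the estimator never touches it and plugs in $E[y \mid d=0]$ instead.

```python
import numpy as np

# Randomized d and hypothetical heterogeneous effects tau (both are assumptions)
rng = np.random.default_rng(2)
n = 100_000
d = rng.binomial(1, 0.5, size=n)
y0 = rng.normal(15, 1, size=n)
tau = rng.normal(2, 0.5, size=n)      # unit-level treatment effects (made up)
y = np.where(d == 1, y0 + tau, y0)    # switching equation

# True ATT uses the ghost E[y0 | d=1], knowable only inside a simulation
att_true = (y[d == 1] - y0[d == 1]).mean()

# Under independence, swap in the observable stand-in E[y | d=0] = E[y0 | d=0]
att_hat = y[d == 1].mean() - y[d == 0].mean()
print(round(att_true, 3), round(att_hat, 3))
```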

**Synth Identification Under Random Assignment and a Factor Model**

So what the ADH factor model is doing inside synthetic control is _not_ traditionally a statement about _treatment assignment_. I mean, it is and it isn't. If the treatment is truly random, then the factor model would be the same for both groups in expectation: at the population level, not only would y(0) be the same for the treatment and control groups, but _so would the underlying structure_ on the right-hand side. I made another Python script with random treatment assignment so you can see it, and I "jiggled" the lines so they'd sit slightly apart on the graph. Then I ran a Monte Carlo simulation in which I generated the data from the factor model 1,000 times, plotted the mean line, and plotted a band of ±1.96 standard deviations across the simulations so you can see where 95% of the simulated averages fall. I also killed off y(0) post-treatment for the treatment group but not for the control group. Note that I only jiggled the lines so you'd see both on the same graph; in reality, the two populations would be the same on average.

And here is the Python code again:

    import numpy as np
    import matplotlib.pyplot as plt

    np.random.seed(123)

    # Parameters
    n_units = 500
    T_pre = 40
    T_post = 10
    T = T_pre + T_post
    years = np.arange(-T_pre, T_post)
    n_mc = 1000

    alpha = 10
    sigma_eps = 2

    # Time-varying coefficients on X and mu (identical across replications)
    beta_t = 0.2 + 0.005 * years
    omega_t = 0.4 + 0.02 * np.cos(years / 6)
    post_idx = years >= 0

    treated_paths = np.zeros((n_mc, T))
    control_paths = np.zeros((n_mc, T))

    for m in range(n_mc):
        X = np.random.normal(0, 1, n_units)
        mu = np.random.normal(0, 1, n_units)
        D = np.random.binomial(1, 0.5, n_units)

        # Factor model for the untreated potential outcome
        Y0 = np.zeros((n_units, T))
        for t in range(T):
            eps = np.random.normal(0, sigma_eps, n_units)
            Y0[:, t] = alpha + beta_t[t] * X + omega_t[t] * mu + eps

        # What real data would contain: treated units' post-period Y(0) is killed off.
        # np.ix_ makes the assignment write into Y0_obs; chained indexing
        # (Y0_obs[D == 1, :][:, post_idx] = np.nan) only modifies a temporary copy.
        Y0_obs = Y0.copy()
        Y0_obs[np.ix_(D == 1, post_idx)] = np.nan

        # The simulation still knows the treated units' latent Y(0); its average is
        # what the dashed counterfactual line plots. Controls are fully observed.
        treated_paths[m, :] = Y0[D == 1].mean(axis=0)
        control_paths[m, :] = Y0_obs[D == 0].mean(axis=0)

    treated_mean = treated_paths.mean(axis=0)
    control_mean = control_paths.mean(axis=0)

    treated_sd = treated_paths.std(axis=0)
    control_sd = control_paths.std(axis=0)

    # small visual offset
    treated_mean = treated_mean + 0.03
    control_mean = control_mean - 0.03

    plt.figure(figsize=(10, 5))

    # Pre-treatment solid, post-treatment dashed (same color) for treated Y(0)
    pre = years < 0
    post = years >= 0

    (treated_line,) = plt.plot(
        years[pre], treated_mean[pre], label="Treated: E[Y(0)|D=1]", linewidth=2
    )
    plt.plot(
        years[post], treated_mean[post], linestyle="--", linewidth=2,
        color=treated_line.get_color(),
    )
    plt.plot(years, control_mean, label="Control: E[Y(0)|D=0]", linewidth=2)

    plt.fill_between(
        years,
        treated_mean - 1.96 * treated_sd,
        treated_mean + 1.96 * treated_sd,
        alpha=0.2,
    )
    plt.fill_between(
        years,
        control_mean - 1.96 * control_sd,
        control_mean + 1.96 * control_sd,
        alpha=0.2,
    )

    plt.axvline(0, linestyle="--")
    plt.ylim(9, 11)
    plt.xlabel("Time")
    plt.ylabel("Average untreated outcome Y(0)")
    plt.title(
        "Y(0) Under Exogenous Treatment\nSolid = Observed, Dashed = Counterfactual (Unobserved)"
    )
    plt.legend()
    plt.show()

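Skipping over _how_ synthetic control does its calculations, note that what it becomes is this:

$$E\big[Y_{1t}(0)\big] = E\Big[\sum_{j=2}^{J+1} w_j Y_{jt}(0)\Big]$$

where the first term is the y(0) for the single treated unit, and the second is the synthetic control, with weights $w_j$ on the control group units. Under randomization, those two quantities are the same in expectation. But if they're the same, then the right-hand sides of the factor model must also be the same:

$$E\big[\alpha + \beta_t X_1 + \omega_t \mu_1 + \varepsilon_{1t}\big] = E\Big[\sum_{j=2}^{J+1} w_j\big(\alpha + \beta_t X_j + \omega_t \mu_j + \varepsilon_{jt}\big)\Big]$$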

And so a few things happen. First, the constant, $\alpha$, cancels, because the weights sum to one and so $\sum_j w_j \alpha = \alpha$. Second, the observables are the same on average, so the $\beta_t X$ term cancels _regardless of_ $\beta_t$, and likewise on down the line for the $\omega_t \mu$ and $\varepsilon$ terms. In this case you could use _any_ set of weights that sum to one, because any weighted average of control units will, on average, equal the treated unit's own Y(0) time path. At least in the frequentist sense, where the units are random draws from some superpopulation.
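A quick Monte Carlo sketch of the "any weights" claim, assuming the factor model $Y_{it}(0) = \alpha + \beta_t X_i + \omega_t \mu_i + \varepsilon_{it}$ with invented parameter paths and a treated unit drawn from the same superpopulation as the donors. The weights are random Dirichlet draws (non-negative, summing to one) fit to nothing at all, and the average gap between the treated unit's $Y(0)$ and the weighted donors still comes out close to zero in every period.

```python
import numpy as np

# All parameters are illustrative assumptions. Unit 0 is "treated", units 1..J are
# donors, and every unit is drawn the same way (that is what randomization buys).
rng = np.random.default_rng(3)
J, T, n_mc = 20, 5, 20_000
alpha = 10.0
beta_t = np.linspace(0.2, 0.6, T)
omega_t = np.linspace(0.4, 1.0, T)

gaps = np.empty((n_mc, T))
for m in range(n_mc):
    X = rng.normal(0, 1, J + 1)
    mu = rng.normal(0, 1, J + 1)
    eps = rng.normal(0, 1, (J + 1, T))
    Y0 = alpha + np.outer(X, beta_t) + np.outer(mu, omega_t) + eps
    # Arbitrary weights: non-negative, sum to one, not fit to the data
    w = rng.dirichlet(np.ones(J))
    gaps[m] = Y0[0] - w @ Y0[1:]

# Average gap per period is near zero: any weighting works in expectation
print(np.round(gaps.mean(axis=0), 3))
```

Any single draw can still miss by a lot; the claim is only about the average over draws.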

But here's the thing. Under random assignment, we don't _need_ to care about the factor model itself. We don't even need to make the assumption that it _did_ follow that process. And we don't need synthetic control because under randomized treatment assignment, we just always can deduce that the 

**Synth Identification Under Non-Randomized Treatment Assignment**

So that's roughly the design approach to identification. Everything automatically cancels out in expectation, under whatever large-sample theorem we invoke. But what if we don't have random assignment? Well, then we cannot just assume that the potential outcomes are independent of treatment assignment, which means those weights will matter a great deal. First, let me write down the factor model again:

$$Y_{it}(0) = \alpha + \beta_t X_i + \omega_t \mu_i + \varepsilon_{it}$$

And now let me write down the bias of synthetic control (again, skipping where it came from) as the difference between the treated unit (unit 1) and the weighted average of the $J$ control group units:

$$Y_{1t}(0) - \sum_{j=2}^{J+1} w_j Y_{jt}(0)$$

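When I substitute the factor model into that difference, the right-hand side _also_ gets differenced, and the constant drops out because the weights sum to one:

$$Y_{1t}(0) - \sum_{j=2}^{J+1} w_j Y_{jt}(0) = \beta_t\Big(X_1 - \sum_{j=2}^{J+1} w_j X_j\Big) + \omega_t\Big(\mu_1 - \sum_{j=2}^{J+1} w_j \mu_j\Big) + \Big(\varepsilon_{1t} - \sum_{j=2}^{J+1} w_j \varepsilon_{jt}\Big)$$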

Well, synthetic control is all about "matching well", so imagine it was estimated by matching on the observable $X$. You'll often use lagged outcomes for that, but for now just be general and note that we have some process where $X$ causes $Y(0)$. So let's assume you did "match well" on the observable. That would mean this:

$$X_1 \approx \sum_{j=2}^{J+1} w_j X_j$$

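This causes the first term, which is a residual of sorts, to be approximately zero, leaving:

$$Y_{1t}(0) - \sum_{j=2}^{J+1} w_j Y_{jt}(0) \approx \omega_t\Big(\mu_1 - \sum_{j=2}^{J+1} w_j \mu_j\Big) + \Big(\varepsilon_{1t} - \sum_{j=2}^{J+1} w_j \varepsilon_{jt}\Big)$$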

I deleted that first term because we can actually check it. But that leaves two more terms, neither of which we can check, as $\mu$ is not observable (recall it is the CEO's unobserved ability) and the $\varepsilon$ are those random shocks. So the bias of the ADH synthetic control is the matching discrepancy on $\mu$ plus the gap in what they call "transitory shocks", the residual difference in the $\varepsilon$ terms.
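To see the decomposition numerically, here is a sketch with hypothetical numbers in which the treated unit was selected on a high $\mu$. The gap between the treated unit and its weighted donors splits exactly into a $\beta_t$ term in $X$, an $\omega_t$ term in $\mu$, and a transitory-shock term. The weights here are a crude two-donor interpolation I chose for illustration (not the synth optimizer) that matches $X$ exactly, so the $X$ term is zero up to rounding while the $\mu$ term, which we could never see in real data, stays.

```python
import numpy as np

# Hypothetical setup: unit 0 is treated and was selected on an unusually high mu.
rng = np.random.default_rng(4)
J, T = 30, 8
alpha = 10.0
beta_t = np.linspace(0.2, 0.6, T)
omega_t = np.linspace(0.4, 1.0, T)

X = rng.normal(0, 1, J + 1)
mu = rng.normal(0, 1, J + 1)
X[0], mu[0] = 0.0, 2.0                  # treated: ordinary X, high mu
eps = rng.normal(0, 1, (J + 1, T))
Y0 = alpha + np.outer(X, beta_t) + np.outer(mu, omega_t) + eps

# Illustrative weights: the closest donor below X_1 and the closest above it,
# mixed so that the weighted X equals X_1 exactly (mu is ignored entirely)
lo = np.argmin(np.where(X[1:] < X[0], np.abs(X[1:] - X[0]), np.inf)) + 1
hi = np.argmin(np.where(X[1:] > X[0], np.abs(X[1:] - X[0]), np.inf)) + 1
w = np.zeros(J + 1)
w[hi] = (X[0] - X[lo]) / (X[hi] - X[lo])
w[lo] = 1.0 - w[hi]

gap = Y0[0] - w @ Y0                    # w[0] = 0: treated minus synthetic control
x_term = beta_t * (X[0] - w @ X)        # checkable, ~0 by construction
mu_term = omega_t * (mu[0] - w @ mu)    # unobserved structure, does not vanish
eps_term = eps[0] - w @ eps             # transitory shocks
print(np.round(x_term, 6))
print(np.round(mu_term, 2))
```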

Basically, ADH say that if you could match both the observed covariates, $X$, and the unobserved $\mu$ (what ADH call the "factor loadings", the unit-specific terms multiplied by the time-varying common factors, here $\omega_t$), the only remaining gap would be in those transitory shocks.

This is where you start to realize what synth is doing and how it is not as simple as the plug-and-play you get from independence. You can't assume those terms are equal for just any weighted average. The weights matter, but that's not all: there must exist control units in the data that follow the same process and that, once weighted, actually resemble the treated unit. Synth "finds" them observationally by matching on $X$, preferably on lagged outcomes, since the best predictor of future $Y(0)$ is most likely its past, _even when that future Y(0) is killed off_.

But back to my point. Since $\mu$ isn't observed, this is where the ADH argument comes in: **a long pre-period plus a good pre-treatment fit of the outcomes** makes it hard to get that fit _unless $\mu$ is also approximately matched_. Otherwise you'd be relying on the $\varepsilon$ lining up "just so" to compensate (overfitting). This is where you'll sometimes hear a phrase when people discuss synthetic control: are you matching on the "structure" (meaning the unobserved heterogeneity, $\mu$, and its time-varying coefficients) or are you "matching on noise"?

In a short panel, you'd only be safe if the transitory shocks were close to zero, which is not knowable, and even then only if "good matches" existed in the data (i.e., units for which the $X$s are matched well). You can't be sure, and that's why, in the _Journal of Economic Literature_ primer on synth by Abadie (2021), the practical "bias control" discussion is framed around the scale of the transitory shocks and the length of the pre-period (usually written $T_0$).
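A rough way to see the long-versus-short pre-period point is a simulation. To keep it short, this sketch swaps the full synthetic control optimizer for a cruder rule that puts all the weight on the single donor with the best pre-period fit (lowest RMSE); the model omits $X$ and all parameter values are made up. With $T_0 = 1$ the best-fitting donor is picked largely on noise, and as $T_0$ grows the chosen donor's $\mu$ tends to sit closer to the treated unit's.

```python
import numpy as np

# Simplified sketch: one-hot weights on the best-fitting donor, not real synth weights.
rng = np.random.default_rng(5)
J, n_reps, sigma_eps = 50, 300, 1.0

def mu_gap_from_best_prefit(T0):
    """Return |mu_1 - mu_chosen| when the donor is chosen by pre-period RMSE."""
    omega_t = 1.0 + 0.5 * np.sin(np.arange(T0) / 3)   # time-varying factor (assumed)
    mu = rng.normal(0, 1, J + 1)
    mu[0] = 0.3                                        # treated unit sits inside the donor cloud
    eps = rng.normal(0, sigma_eps, (J + 1, T0))        # transitory shocks
    Y0_pre = 10 + np.outer(mu, omega_t) + eps          # factor model without X, for brevity
    rmse = np.sqrt(((Y0_pre[1:] - Y0_pre[0]) ** 2).mean(axis=1))
    best = np.argmin(rmse) + 1
    return abs(mu[0] - mu[best])

for T0 in (1, 5, 50):
    gaps = [mu_gap_from_best_prefit(T0) for _ in range(n_reps)]
    print(T0, round(float(np.mean(gaps)), 3))
```

This is only illustrative: real synth mixes donors within the convex hull, and the formal ADH bias bound is stated for that setting, not for this one-donor rule.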

And that's it. That's the identification. It's the factor model driving it. 

**Conclusion**

So, how does this compare with diff-in-diff? Interestingly, their identifying assumptions are both similar and different, to a degree. Technically, parallel trends does not "need" a long pre-treatment period. Diff-in-diff _uses_ the pre-treatment coefficients in 2xT event studies to _argue_ that parallel trends in the post-treatment period is a reasonable belief to hold, but you can identify the ATT with a simple 2x2 diff-in-diff (i.e., only one pre-treatment period). You may not find it convincing without seeing the event study, but that's not identification. That's _falsification_.
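A 2x2 sketch of that point, with invented numbers: treatment is not randomized (treated units start from a higher level), but untreated outcomes share a common trend, so one pre-period and one post-period are enough to recover the ATT, while the naive post-period comparison is contaminated by the level gap.

```python
import numpy as np

# Invented numbers: selection on levels, but parallel trends in Y(0)
rng = np.random.default_rng(6)
n = 20_000
d = rng.binomial(1, 0.3, size=n)
unit_level = 5.0 + 3.0 * d + rng.normal(0, 1, n)   # treated start 3 units higher
common_trend = 1.5                                  # same change in Y(0) for everyone
att = 2.0                                           # hypothetical true effect

y_pre = unit_level + rng.normal(0, 1, n)                             # one pre-period
y_post = unit_level + common_trend + att * d + rng.normal(0, 1, n)   # one post-period

# Naive comparison picks up the level gap (about 3) on top of the effect
naive = y_post[d == 1].mean() - y_post[d == 0].mean()
# 2x2 DiD: the level gap differences out, leaving the ATT
did = (y_post[d == 1].mean() - y_pre[d == 1].mean()) - (
    y_post[d == 0].mean() - y_pre[d == 0].mean()
)
print(round(naive, 2), round(did, 2))
```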

So then, does synthetic control "need" a long pre-treatment period? Well, sort of. I mean, you could use synthetic control with a single pre-treatment period. You'd solve that constrained distance minimization problem, get your non-negative weights, and then run them forward to get dynamic estimated treatment effects. So in that sense, no, you don't "need" the long series.
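Here is a sketch of that recipe, assuming the simplest version of the problem: weights on the simplex (non-negative, summing to one) that minimize the squared distance between the treated unit's pre-period outcomes and the weighted donors. It ignores the predictor-importance matrix $V$ that ADH also choose, and the solver (accelerated projected gradient with a sort-based simplex projection) and the helper names `project_to_simplex` and `synth_weights` are my own hypothetical choices, not ADH's implementation or any package's API. The usage at the bottom fits on a single pre-period and runs the weights forward.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = k[u - (css - 1) / k > 0][-1]
    theta = (css[rho - 1] - 1) / rho
    return np.maximum(v - theta, 0.0)

def synth_weights(y1_pre, Y_donors_pre, n_iter=2000):
    """Hypothetical helper: minimize ||y1_pre - Y_donors_pre.T @ w||^2 over the simplex.

    y1_pre: (T0,) treated pre-period outcomes; Y_donors_pre: (J, T0) donor outcomes.
    """
    # Subtracting the per-period donor mean from both sides leaves the objective
    # unchanged on the simplex (weights sum to one) and improves conditioning.
    c = Y_donors_pre.mean(axis=0)
    A = (Y_donors_pre - c).T                      # (T0, J)
    b = y1_pre - c
    step = 1.0 / max(np.linalg.norm(A, 2) ** 2, 1e-12)
    w = z = np.full(A.shape[1], 1.0 / A.shape[1])
    t = 1.0
    for _ in range(n_iter):                       # FISTA: gradient step, project, extrapolate
        w_next = project_to_simplex(z - step * (A.T @ (A @ z - b)))
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = w_next + ((t - 1) / t_next) * (w_next - w)
        w, t = w_next, t_next
    return w

# Usage with made-up data: one pre-period, five post-periods, hypothetical effect of 2
rng = np.random.default_rng(7)
J, T0, T_post = 10, 1, 5
Y_donors = rng.normal(10, 1, (J, T0 + T_post))
y1 = rng.normal(10, 1, T0 + T_post) + np.r_[np.zeros(T0), np.full(T_post, 2.0)]
w = synth_weights(y1[:T0], Y_donors[:, :T0])      # fit on the single pre-period
effects = y1[T0:] - w @ Y_donors[:, T0:]          # run the weights forward
print(np.round(w, 3), np.round(effects, 2))
```

With $T_0 = 1$, many weight vectors typically fit equally well, so which one the solver lands on is essentially arbitrary, which is another way to see how little a single period protects against matching on noise.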

But my reading is different. Under the factor model, I think a long pre-period gives us more confidence that our weighted average keeps those two residual terms (on the heterogeneity and on the transitory shocks) small. So in a sense, I think the long pre-period serves a function similar to the 2xT event study, though I'm sure others may disagree. Frankly, I think I changed my opinion on this in the course of writing this substack, and after I take my shower I may switch back. But my point is that you're assuming a factor model of Y(0), you're _hoping and praying_ that at least a couple of units in the donor pool are similar enough _on that underlying structure_ that synthetic control can find them with that distance minimization, and you're more likely to find them with a long than a short pre-treatment time series, because the idiosyncratic error also contributes to the bias.

**Alberto Abadie, the Basque Country, and a Quick History of Thought Detour**

Anyway, that's my understanding of identification with the synthetic control model. All errors are my own. Thanks again to my favorite econometrician, Alberto Abadie, for his contributions to causal inference going all the way back to being Josh Angrist's student at MIT. He was one of Josh's first students, in Josh's second cohort alongside such luminaries as Sue Dynarski, Jonah Gelbach, Marianne Bitler, and Esther Duflo. I think there is a sixth member of that cohort whose name I'm drawing a blank on; it's buried in my notes somewhere. And I seem to remember that Jeff Kling may have been Josh's first student (a singleton), graduating before that second cohort.

But anyway, my point is that Alberto came out swinging, making original contributions to pretty much whatever he touched. Take instrumental variables: that's where we get kappa weights, which turned out to be influential in judge designs because they let you recover complier characteristics. You now see that done more and more, probably in part thanks to Peter Hull (another of my favorite econometricians, and also a Josh student) pushing for it and promoting that earlier Abadie work. Then Alberto goes on a tear working with Guido Imbens on matching-related topics. Pretty much immediately, Alberto goes to the Basque Country at the invitation of Javier Gardeazabal to do a workshop, where the two of them talked about doing a project together on the effect of the ETA terrorist group on aggregate income. That's where Alberto cooked up synthetic control. In a podcast, when I asked him _why_ he came up with it, the first thing he told me was that he loved his home in the Basque Country. It's a comment I still think about to this day: what a nice thought that love led to one of the most important innovations in causal inference of the last 25 years.

Basque Country -- that romantic, beautiful region in Northern Spain. An autonomous region I have now visited two summers in a row. I went up to a vendor near the beach once and when they asked me why I had come to visit beautiful San Sebastian (as if someone needs a reason to visit San Sebastian), I told them this was where my favorite econometrician was from. And then jokingly I told them "and this was where synthetic control was born". Since I don't speak Spanish, but rather I speak the thick dialect of deep Mississippi, I'm sure that made as much sense to them as anything else I said. 

Anyway, I woke up this morning wanting to write all this down. My semester is winding down. It's been the hardest semester of my entire life. There is in fact no synthetic control I could create for it, even drawing from every previous semester in my 25 years of teaching. I mean, I _could_ do it. You can always find a weighted average of control group units that minimizes some distance function, like the sum of squared matching discrepancies. I would just be the outlier, _way, way_ off the convex hull. But that is for another substack on another day, maybe another year, as an old man.

Happy holidays. And remember, don't forget to put on Sufjan Stevens' Christmas album at least once this year. I actually self-medicate my stress by listening to it, having moved away from hip hop and alternative music, since Cosmos said I needed to do more grounding exercises and literally recommended it to me.

Scott's Mixtape Substack is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

(C) 2025 scott cunningham  