For a project I'm doing, I want to use a synthetic control estimator that is robust to noise and missing data. Recent papers have argued that one way of doing this is via principal component analysis. In fact, they call it principal component regression (PCR), which ostensibly denoises and debiases the pre-intervention outcomes matrix while imputing the counterfactual potential outcome.
Well, I know of no Stata command to do this (and if there is one, I'd almost pay to find out about it). So I looked at papers that did a similar thing, imputing counterfactuals via PCA. I contacted the authors of one, and the lead author kindly sent me their code. I tried it on the smoking dataset for Proposition 99, the canonical SCM dataset installed with the user-written Synth package for Stata. I also tried it on the data from the original SCM paper, the Basque Country and terrorism paper. Below is my code, adapted from Li et al.
Okay, so I must be honest: until quite recently I had never heard of PCR, and I had never used it (or seen it used) in empirical work. I'd never really needed to use PCA either, but I'm familiar with what it does.
My question is essentially for anyone who's more familiar with PCA/PCR than I am: is this pretty much all there is to PCA in this context? I ask because the pre-intervention fit is nowhere near as good as with classic SCM, but the counterfactual predictions are actually about the same. Might there be a better way to estimate causal effects using PCA/PCR in this instance? It approximates the SCM estimator well; I just want to make sure I'm implementing the method correctly, since there's no Stata command for it.
Code:
import delim "https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv", clear

/* Robust SCM/PCR doesn't really need covariates to approximate
   the counterfactual produced by classic SCM. This one only
   includes the outcomes. Here California, which gets id 3 below,
   is the treated unit. Treatment starts in 1989. */

egen id = group(state)             // makes a unique numeric ID
xtset id year, y                   // we now have yearly panel data
drop state treated                 // irrelevant for our purposes
rename packs sale                  // I didn't like the original variable name
reshape wide sale, j(id) i(year)   // I use greshape, but feel free to use this one too

loc stub sale
cls

* Gets principal components; here's the code from Li et al.
pca `stub'1-`stub'3 `stub'4-`stub'39

* Average across states, then shift it by the mean pre-treatment gap
egen `stub'_d = rowmean(`stub'1-`stub'39)
qui gen `stub'_dd = `stub'3 - `stub'_d
qui sum `stub'_dd if year < 1989, mean   // pre-treatment years only
dis "pre-1989 average difference is " r(mean)
qui g `stub'_da = `stub'_d + r(mean)
cls

tw (line `stub'3 year, lcol(black)) (line `stub'_da year, lcol(red)), ///
    xli(1989) ///
    legend(off) ///
    text(100 1980 "Observed California", color(black)) ///
    text(90 1980 "Synthetic California", color(red))
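For comparison, here is a rough sketch of what I understand PCR proper to be: run PCA on the donor states only, keep the first few components, regress California on those component scores over the pre-treatment years, and then predict over all years. This is only my reading of the method, not the authors' implementation. The choice of 3 components, the new variable names (pc1-pc3, sale3_pcr, gap), and the use of Stata's default correlation-based PCA (rather than an SVD of the raw outcomes matrix) are all my own assumptions.

```stata
* Sketch only: my reading of PCR, not the authors' code
import delim "https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv", clear
egen id = group(state)              // California gets id 3
drop state treated
rename packs sale
reshape wide sale, j(id) i(year)

* PCA on the 38 donor states only, fitted on pre-treatment years.
* components(3) is an arbitrary choice on my part.
pca sale1 sale2 sale4-sale39 if year < 1989, components(3)

* Component scores for every year, pre- and post-treatment
predict pc1 pc2 pc3, score

* Regress California on the scores, pre-treatment years only
regress sale3 pc1 pc2 pc3 if year < 1989

* Fitted values before 1989, imputed counterfactual from 1989 on
predict sale3_pcr, xb
gen gap = sale3 - sale3_pcr         // estimated effect in each year

tw (line sale3 year, lcol(black)) (line sale3_pcr year, lcol(red)), ///
    xli(1989) legend(off)
```

Keeping more components should tighten the pre-intervention fit, at the cost of fitting more of the noise, so the number of components is the main thing I would expect to have to tune here.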