
  • #16
    Just took a quick look at Noah's data. I think he has 137 potential donors from Oklahoma alone. I bet that's enough to do whatever he wants to do with two treated units. Frankly, I can't imagine a seminar participant or reviewer saying, "But did you try all of the police departments in Vermont, too?" How can we even know that over this long period across the United States there weren't policies implemented that could affect youth arrests?



    • #17
      Having lots of donors doesn't suddenly make the asymptotics work better -- likely the opposite, as I think you suggested above.
      You're right, that is what I'm saying. I said it privately to Max, but the standard advice for avoiding things like interpolation bias is to trim your donors. There are plenty of ways we could go about this (even ones that don't use formalized machine learning algorithms).

      I prefer the ML approach because (when done correctly, or at least in a justifiable manner) it means we don't have to argue as much about which donors to include or exclude. So, if we have 4000 donors, as we seem to have here, I would agree with you that only some sparse subset of them is comparable. The other reason I prefer the ML approach is that it promotes robustness. For example, I ran the OKC data in the dataset above with all 4000 donors included, and Robust PCA-SYNTH gave me an ATT of around 27%.

      But then I limited the donors to units whose outcome never equals 0 (since a donor sitting at zero would likely violate common support). This returned the following dataframe.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int real float cf int dates float gap byte(ntime rel)
      4025  4161.398 2000 -136.39778  0 -16
      3924  3821.259 2001  102.74133  1 -15
      4019 3940.0496 2002   78.95056  2 -14
      4172  4439.044 2003 -267.04404  3 -13
      5064 4869.5723 2004  194.42796  4 -12
      4539 4674.1763 2005 -135.17636  5 -11
      4477   4251.33 2006  225.66985  6 -10
      3719 4101.5938 2007  -382.5937  7  -9
      4441 4209.5586 2008  231.44165  8  -8
      4500 4482.7314 2009  17.268713  9  -7
      4043 4140.0576 2010  -97.05772 10  -6
      3413 3667.7256 2011  -254.7256 11  -5
      3691 3342.3794 2012   348.6205 12  -4
      3255 3290.8545 2013  -35.85451 13  -3
      3131  3336.088 2014  -205.0882 14  -2
      3612  3510.215 2015  101.78505 15  -1
      3132  3640.425 2016  -508.4253 16   0
      2813 3827.1545 2017 -1014.1546 17   1
      2550  3503.368 2018  -953.3681 18   2
      1756  3243.284 2019  -1487.284 19   3
      end
      This is our result when I limit the donors in what I think is a more reasonable way, since Oklahoma City will obviously never have 0 suspensions in a year. The ATT here is also 27%. To me, this is great: even though we've altered the donor pool from the original one, we still get the same result. The donors that get weight will obviously change, by virtue of us dropping lots of outliers, but the RPCA approach is very robust to noise and outliers because it breaks the observed outcome matrix into a low-rank component and a sparse component. I didn't run vanilla SCM on this, because the quadratic programming would finish around the time I get my PhD, but I'm pretty sure vanilla SCM would not have sparse weights and would likely reach different conclusions if we used all of the donors. So to me, while it's imperfect, having a denoising step that simultaneously selects a donor pool is quite useful in this instance.
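      For concreteness, the trimming step I'm describing (dropping every donor whose outcome ever hits zero) only takes a couple of lines; the variable names below (id, arrests, treated) are placeholders for whatever is actually in the panel, not code from the package.

      Code:
      * Sketch of the donor-trimming step; id, arrests, and treated are placeholder names
      bysort id: egen byte ever_zero = max(arrests == 0)
      drop if ever_zero & !treated    // keep the treated unit(s), drop donors that ever hit 0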

      Additionally, in many real applications it's not possible to get the extra predictor information that we oftentimes use with normal SCM. So, since RPCA-SYNTH gives the exact same effect size independent of the donors we use (the full list versus the trimmed pool, anyway), I think formalized ways to pick a donor pool and then estimate your effects are worthy of further use in policy/metrics applications. I should also note the results are robust to changing the number of singular values used to do the clustering over the fPCA scores: USVT selected 13 singular values, but even when I change this number to 7, we get the same results.
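      To illustrate what I mean by changing the number of singular values, here is a rough Mata sketch of hard-thresholding the singular values of the (units x pre-periods) outcome matrix. This is plain truncated SVD on random placeholder data, not Mani's actual RPCA/USVT routine.

      Code:
      * Rough sketch: keep only the leading k singular values of the outcome matrix.
      * Y is random placeholder data; this stands in for the low-rank step, not USVT itself.
      mata:
          Y = rnormal(100, 16, 0, 1)      // placeholder: 100 units x 16 pre-periods
          U = .
          s = .
          Vt = .
          svd(Y, U, s, Vt)                // Y = U * diag(s) * Vt
          k = 7                           // number of singular values kept (try 7 vs 13)
          for (i = k + 1; i <= length(s); i++) s[i] = 0
          Ylow = U * diag(s) * Vt         // low-rank approximation used downstream
      end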

      So, sort of like I said at first, I think there needs to be a way to get a smaller donor pool. In our opinion (speaking for myself and my coauthors), the way we do this should be as little subject to human biases as possible beyond the highest-level decisions, and ML offers ways to do that. To me, clustering + PCA and even tensor techniques would be a great way to do this in a policy setting, and this is mainly what my work as a PhD student currently focuses on.
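      As a rough illustration of the clustering + PCA idea (again, not the actual RPCA-SYNTH code), one could reshape the pre-period outcomes wide, take a few leading principal component scores, run k-means on them, and keep only the donors that land in the treated unit's cluster. The variable names, the treatment year, the number of components and clusters, and the treated unit's id are all placeholders here.

      Code:
      * Sketch of PCA + k-means donor selection; id/year/arrests, the 2016 cutoff,
      * 3 components, 10 clusters, and id == 1 as the treated unit are placeholders,
      * not the RPCA-SYNTH defaults
      keep if year < 2016                           // pre-treatment window only
      reshape wide arrests, i(id) j(year)           // one row per unit
      pca arrests*                                  // PCA on the pre-period outcomes
      predict pc1 pc2 pc3, score                    // a few leading component scores
      cluster kmeans pc1 pc2 pc3, k(10) name(donorsel) generate(cgrp)
      levelsof cgrp if id == 1, local(trcluster)    // cluster containing the treated unit
      keep if cgrp == `trcluster'                   // candidate donor pool
      keep id
      save selected_donors, replace                 // merge back into the full panel later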



      • #18
        Jared: Is there a theory of inference after using PCA or ML to obtain the donor pool? I'm a bit concerned that this pre-processing might not be reflected in the final estimates of uncertainty. People often match in a preliminary step and then ignore the fact that they've done that when reporting standard errors and confidence intervals.

        BTW, I don't think you want to select your sample based on the outcome, if that's what you mean by using units that never equal zero.



        • #19
          Nope, as far as I know, no formal inferential theory exists yet. The main contribution of RPCA-SYNTH (in my eyes, as well as in the eyes of Mani, the author) is RPCA's robustness to in-space and in-time placebos (as well as, in this case, to different donor pools).
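          For what it's worth, the in-space placebo logic is just to re-run the estimator pretending each donor were the treated unit and compare the resulting effects to the real one. A generic sketch is below; the command name rpcasynth, its options, and e(att) are hypothetical stand-ins for whatever the estimator actually exposes, not the package's real syntax.

          Code:
          * Generic in-space placebo loop; rpcasynth, trunit(), trperiod(), and e(att)
          * are hypothetical stand-ins, not the package's actual syntax
          tempname ph
          postfile `ph' unit att using placebo_atts, replace
          levelsof id if !treated, local(donors)
          foreach u of local donors {
              quietly rpcasynth arrests, trunit(`u') trperiod(2016)
              post `ph' (`u') (e(att))
          }
          postclose `ph'
          use placebo_atts, clear    // compare the real ATT with this placebo distribution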



          Mani developed the method for his dissertation, but he didn't do any asymptotics (partly because, as far as I know, the k-means clustering step, which is what actually selects the donor pool, is hard to study asymptotically). Presumably the folks over at the Journal of Causal Inference (where the paper is currently under review) may want him to demonstrate the asymptotics for publication.

          Beyond that, I'd someday want to formally study its asymptotic properties, since, as I said, no formal inferential theory exists at present. Presumably we could calculate confidence or prediction intervals, since the estimation method we use is just OLS with positivity constraints, but I honestly haven't thought about that much yet.
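          To be concrete about the "OLS with positivity constraints" part: the weights solve a nonnegative least-squares problem, which can be sketched with a simple projected-gradient loop in Mata. The data below are random placeholders, and this is an illustration of the constrained fit, not the package's actual solver.

          Code:
          * Mata sketch of nonnegative least squares for the SC weights via projected
          * gradient descent; Y0 and y1 are random placeholders, not real data, and
          * this is not the RPCA-SYNTH solver
          mata:
              Y0 = rnormal(16, 40, 0, 1)                // donors: 16 pre-periods x 40 units
              y1 = rnormal(16, 1, 0, 1)                 // treated unit's pre-period outcomes
              w  = J(cols(Y0), 1, 1/cols(Y0))           // start from uniform weights
              step = 1 / max(symeigenvalues(Y0' * Y0))  // step size from the Lipschitz constant
              for (it = 1; it <= 5000; it++) {
                  g = Y0' * (Y0*w - y1)                 // gradient of 0.5*||y1 - Y0*w||^2
                  w = w - step * g                      // gradient step
                  w = w :* (w :> 0)                     // project onto w >= 0
              }
          end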

          Actually, inference in SCM is still a quickly growing topic. Scholars like Kathy Li at Texas McCombs, and a few folks over at Columbia and elsewhere, do really good work on inference specifically. As far as I can see, only very recently (the last 5 years, say) have other statisticians seriously delved into SCM's asymptotic properties, including basic things like the behavior of the weights.

          So, I guess what I'm saying is that I share your concerns about uncertainty, and that the method I advocate for isn't "the answer", but hopefully the beginning of one. In other words, I think machine learning methods should be justified by the specific problems they address, not just be fancy for the sake of it. Jeff Wooldridge



          • #20
            I'm confident progress will be made on inference, but sometimes it requires strong assumptions. In the Arkhangelsky et al. paper, they assume normality. I suspect this isn't needed, but one wonders if there's something about the problem that makes application of a typical central limit theorem difficult.
