
  • sdid - Using a proportion outcome variable in synthetic difference-in-differences analysis

    Hi all,

    I am using the sdid package developed by Daniel PV and Damian Clarke to run a synthetic difference-in-differences analysis (as in Arkhangelsky et al., 2021). The tool is fantastic, and I would be very grateful for the community's guidance on a methods question I have.

    For context, my analysis is a balanced panel of subnational regions over roughly two decades. The outcome variable is the annual unemployment rate in each subnational region. A quarter of these subnational regions receive a treatment policy in a staggered fashion.

    However, I am concerned that the unemployment rate might be an unsuitable outcome variable for sdid analysis because it is bounded between 0% and 100%. I believe it may follow a binomial distribution, but I am unsure whether this means it should not be used with sdid.

    Please may I ask for the community's guidance, if you have a few moments to spare, on the following methods questions I am grappling with:

    1. Would the unemployment rate (or the log unemployment rate) be a statistically defensible outcome variable to use in sdid?
    2. My instinct is that it may be better to conduct a binomial/logistic regression instead, but how can I do this in the context of a synthetic difference-in-differences analysis?
    3. Alternatively, would it be better to run sdid with the log count of unemployed people instead, using the log total (economically active) population count as a covariate?

    Thank you for your help, and apologies if my questions are nonsensical; I am still learning my way around 2WFE/DiD/SC models and their assumptions.

    Jason

  • #2
    Probably sensible to use it. DiD doesn't play well with nonlinear models (see Lechner's paper). But unemployment can be small, which is a problem with an LPM (linear probability model).

    #3 may be sensible as an alternative.
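The concern about small rates can be illustrated with a minimal arithmetic sketch, assuming a plain 2x2 DiD and made-up numbers (the helper name is hypothetical, for illustration only). A linear model imputes the treated region's untreated post-period rate additively, so when the baseline is low, the imputed counterfactual can fall below 0%.

```python
def linear_did_counterfactual(treated_pre, control_pre, control_post):
    """Hypothetical helper: impute the treated unit's untreated post-period
    outcome as its own pre-period level plus the control group's change."""
    return treated_pre + (control_post - control_pre)

# Made-up unemployment rates in percent: the treated region starts low (1.5%)
# while the control regions fall from 6% to 4%.
cf = linear_did_counterfactual(treated_pre=1.5, control_pre=6.0, control_post=4.0)
print(cf)  # -0.5, an impossible unemployment rate
```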



    • #3
      Originally posted by George Ford
      Probably sensible to use it. DiD doesn't play well with nonlinear models (see Lechner's paper). But unemployment can be small, which is a problem with an LPM (linear probability model).

      #3 may be sensible as an alternative.
      George Ford, thank you so much for your response. I really appreciate it.

      Please may I check what you are referring to when you say "probably sensible to use it"? Did you mean it is sensible to use the unemployment rate with SDID, or were you referring to binomial / logistic regression?



      • #4
        Yup. Just use the unemployment rate for starters.



        • #5
          Originally posted by George Ford
          Yup. Just use the unemployment rate for starters.
          George Ford, thank you once again for your guidance. I have given this a go, and my SDID results show a significant ATT of my policy treatment on the unemployment rate. I will run a robustness check using the log unemployment count too, and perhaps the employment rate, to see whether the relationship holds in the opposite direction. I will also try to better understand how covariates work within the SDID model and see if I can include some (e.g., on skills, population age structure, GDP per capita). If you have any further suggestions for tests, covariates, or other pitfalls I should be wary of, I would be very grateful to hear them and test them out! Thanks again for your help. This is fun!



          • #6
            You may run into trouble if the u-rate is small.

            I'd also look at the literature. Many papers have been written with the u-rate as the DV, probably including some using SDID-type models.



            • #7
              Originally posted by George Ford
              You may run into trouble if the u-rate is small.

              I'd also look at the literature. Many papers have been written with the u-rate as the DV, probably including some using SDID-type models.
              Thanks George Ford, I'll check the literature out.



              • #8
                Hi sdid community,

                I had a couple more questions - in addition to the 3 questions in my first post above - which I would be very grateful for your feedback on. Please let me know if you have any ideas.

                4. In traditional difference-in-differences, the donor pool is carefully selected so that the control units are similar to the treated units. This intuitively feels like less of a concern in sdid, because the control units are weighted to make their pre-treatment outcome trend as parallel as possible to that of the treated units. I have therefore included all subnational regions that did not receive treatment in the donor pool. Is this the right thing to do, or should I instead carefully select which subnational regions to include in my donor pool based on their similarity to the treated subnational regions?

                5. What are appropriate robustness tests to conduct on my sdid outputs, and how can I conduct them?



                • #9
                  In traditional difference-in-differences, the donor pool is carefully selected such that the control units are similar to treatment units.
                  I think you are referring to matching. In traditional DiD estimated with two-way fixed effects, the variation used is within units over time. Because of the unit fixed effects, treated units are not directly compared to control units.

                  Precisely as you state, sdid uses both unit and period weights. The presence of a constant in the weight-matching step also allows for a systematic level difference between treated and control units.

                  If you were to select (pick and choose) on a non-random basis which units to include in the controls, you've got selection bias straight off the bat.

                  sdid (a user-written command) is a great tool; however, there is one caveat, at least that I know of.

                  sdid matches on pre-treatment outcomes, which can add bias to the estimation. There is a burgeoning literature on this. If you just search for "matching on pre-treatment outcomes bias" you will find quite a few papers.
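The role of the unit and period weights, and why a constant level gap between treated and control units does no harm, can be seen in a small sketch of the SDID point estimate written as a weighted DiD, following the block (single adoption date) form in Arkhangelsky et al. (2021) as we understand it. It takes the weights as given and does not solve the weight optimisation; the function name and toy panel are hypothetical. With staggered adoption, sdid works cohort by cohort, so this would apply within each adoption cohort.

```python
import numpy as np

def sdid_point_estimate(Y, treated, T_pre, omega, lam):
    """Hypothetical helper: weighted-DiD form of the SDID point estimate.

    Y       : (N, T) outcome panel, rows = units, columns = periods
    treated : length-N boolean array, True for treated units
    T_pre   : number of pre-treatment periods (the first T_pre columns)
    omega   : unit weights for the control units, in row order (sum to 1)
    lam     : time weights for the pre-treatment periods (sum to 1)
    """
    Y = np.asarray(Y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    # Per unit: average post-period outcome minus lambda-weighted pre-period
    # outcome. Subtracting each unit's own pre-period level is what lets a
    # constant gap between treated and controls drop out.
    delta = Y[:, T_pre:].mean(axis=1) - Y[:, :T_pre] @ np.asarray(lam, dtype=float)
    delta_treated = delta[treated].mean()  # treated units weighted equally
    delta_synth = np.asarray(omega, dtype=float) @ delta[~treated]
    return delta_treated - delta_synth

# Toy panel: two controls and one treated unit sitting 10 points above the
# controls' average, with an effect of +2 in the two post periods.
Y = [[1, 2, 3, 4],
     [3, 4, 5, 6],
     [12, 13, 16, 17]]
treated = [False, False, True]
tau = sdid_point_estimate(Y, treated, T_pre=2, omega=[0.5, 0.5], lam=[0.5, 0.5])
print(tau)  # 2.0
```

The 10-point level gap never enters the estimate; only the change relative to each unit's own weighted pre-period level does.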



                  • #10
                    You might want to check this out.

                    https://economics.princeton.edu/wp-content/uploads/2021/08/two_way_mundlak-Wooldridge.pdf
                    It can give you basically the same results as sdid, but using standard regression.
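One result in that paper, as we read it, is that in a balanced panel the TWFE coefficient equals the coefficient from a pooled OLS regression that adds the unit and period averages of the regressor (the "two-way Mundlak" regression). The numpy check below illustrates that equivalence on simulated data; it does not reproduce sdid, and the data-generating values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 6, 5
unit = np.repeat(np.arange(N), T)  # unit index of each observation
time = np.tile(np.arange(T), N)    # period index of each observation
x = rng.normal(size=N * T)
y = 1.5 * x + rng.normal(size=N)[unit] + rng.normal(size=T)[time] + rng.normal(size=N * T)

# (1) TWFE: regress y on x plus full unit dummies and period dummies (period 0 dropped).
unit_d = (unit[:, None] == np.arange(N)).astype(float)
time_d = (time[:, None] == np.arange(1, T)).astype(float)
X_twfe = np.column_stack([x, unit_d, time_d])
b_twfe = np.linalg.lstsq(X_twfe, y, rcond=None)[0][0]

# (2) Two-way Mundlak: pooled OLS of y on 1, x, unit mean of x, period mean of x.
x_bar_i = np.array([x[unit == i].mean() for i in range(N)])[unit]
x_bar_t = np.array([x[time == t].mean() for t in range(T)])[time]
X_mund = np.column_stack([np.ones(N * T), x, x_bar_i, x_bar_t])
b_mund = np.linalg.lstsq(X_mund, y, rcond=None)[0][1]

print(b_twfe, b_mund)  # equal up to floating-point error
```

The equality relies on the panel being balanced; with missing unit-period cells the two coefficients generally differ.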



                    • #11

                      Originally posted by Maxence Morlet

                      I think you are referring to matching. In traditional DiD estimated with two-way fixed effects, the variation used is within units over time. Because of the unit fixed effects, treated units are not directly compared to control units.

                      Precisely as you state, sdid uses both unit and period weights. The presence of a constant in the weight-matching step also allows for a systematic level difference between treated and control units.

                      If you were to select (pick and choose) on a non-random basis which units to include in the controls, you've got selection bias straight off the bat.

                      sdid (a user-written command) is a great tool; however, there is one caveat, at least that I know of.

                      sdid matches on pre-treatment outcomes, which can add bias to the estimation. There is a burgeoning literature on this. If you just search for "matching on pre-treatment outcomes bias" you will find quite a few papers.
                      Maxence Morlet, thank you, that makes sense! I will use the full set of non-treated subnational regions in my control/donor pool to avoid selection bias, and let SDID do the work of assigning appropriate weights. Thank you as well for the pointer on "matching on pre-treatment outcomes bias"; this looks very helpful to consider as part of my limitations.



                      • #12
                        Originally posted by George Ford
                        You might want to check this out.

                        https://economics.princeton.edu/wp-content/uploads/2021/08/two_way_mundlak-Wooldridge.pdf
                        It can give you basically the same results as sdid, but using standard regression.
                        Awesome, I had not come across this before. Thank you, George Ford.



                        • #13
                          Does anyone know how to create descriptive statistics for the synthetic control group in sdid after running the analysis? I know that the outcome trends (i.e., the unemployment rate) for the control and treatment groups can be compared using e(series) and e(difference), and that unit weights can be graphed with g1on. But is there a way to describe pre- and post-treatment trends for other variables in my dataset (e.g., GDP, age) in the control group vs. the treatment group? If there is no automatic way to do this, could I construct it myself using the time and unit weights for both groups?



                          • #14
                            Hello Jason, in response to your first post above, here is what I would say on your three questions:
                            1. It is perfectly fine to use a bounded variable such as a rate with SDID. Because SDID seeks to match pre-trends, there should be no problematic extrapolation or other issues in generating the synthetic counterfactual. More generally, work with rates has been conducted successfully in the synthetic control literature (see, for example, the original work on per-capita cigarette consumption following California's Proposition 99, which is also included as an example in the sdid help file), where the variable is bounded below by 0. So you should be good to go on this front, and could cite this work as a precedent if you wish.
                            2. In terms of the variable's distribution, I would say this is not an issue with SDID. Again, for estimation, sdid simply seeks to generate parallel pre-trends and then implements a diff-in-diff-style procedure. Where you might be concerned about distributional assumptions is in inference, but in the case of sdid all of the theory developed is asymptotic, so standard central limit theorems lead me to think that the distribution of your outcome is not an issue here either (though I'd be careful if working with a small number of units).
                            3. I think you are probably fine with your current setting. In general, I'd say the choice of outcome should depend on the specific context of interest, so I am reluctant to say too much without deep knowledge of your setting, but the setting you describe above seems perfectly valid for sdid. One thing I'd note if you do opt for the second route (log counts with a population covariate) is that in sdid, controls are simply concentrated out, and in certain cases this is done using only non-treated units (see section 2.2 here for more information: https://www.damianclarke.net/research/papers/SDID.pdf), so you may want to be careful with controls, especially if populations are changing substantially over time. On balance, I'd say your current set-up is good.
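The idea in point 3, that covariates can be partialled out using only untreated units, can be sketched as follows. This is a simplified illustration under our own assumptions (one covariate, a balanced panel, a TWFE slope estimated on never-treated units only, hypothetical helper names), not the code sdid runs; section 2.2 of the linked paper describes the actual procedures.

```python
import numpy as np

def two_way_demean(A):
    """Subtract row (unit) and column (period) means of a balanced panel."""
    return A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()

def partial_out_covariate(Y, X, never_treated):
    """Hypothetical helper: estimate one covariate slope by TWFE using only
    never-treated units, then subtract beta * X from every unit's outcome.

    Y, X          : (N, T) balanced panels, rows = units, columns = periods
    never_treated : length-N boolean array
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    mask = np.asarray(never_treated, dtype=bool)
    y_dd = two_way_demean(Y[mask])
    x_dd = two_way_demean(X[mask])
    beta = (x_dd * y_dd).sum() / (x_dd * x_dd).sum()  # TWFE slope via demeaning
    return Y - beta * X, beta

# Simulated example: unit and period effects, true slope 2, and an effect of +1
# for the last unit in its last two periods (kept out of the slope estimate).
rng = np.random.default_rng(1)
N, T = 4, 5
X = rng.normal(size=(N, T))
Y = rng.normal(size=(N, 1)) + rng.normal(size=(1, T)) + 2.0 * X
Y[3, 3:] += 1.0
Y_adj, beta = partial_out_covariate(Y, X, [True, True, True, False])
print(round(beta, 6))  # 2.0
```

Restricting the slope estimate to untreated units keeps the treatment effect from leaking into beta; the residualized outcome Y_adj would then be passed to the estimator. If the covariate itself shifts a lot over time (e.g., population), the adjustment moves the outcome considerably, which is the caution raised above.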



                            • #15
                              Originally posted by Jason Acomb
                              Does anyone know how to create descriptive statistics for the synthetic control group in sdid after running the analysis? I know that the outcome trends (i.e., the unemployment rate) for the control and treatment groups can be compared using e(series) and e(difference), and that unit weights can be graphed with g1on. But is there a way to describe pre- and post-treatment trends for other variables in my dataset (e.g., GDP, age) in the control group vs. the treatment group? If there is no automatic way to do this, could I construct it myself using the time and unit weights for both groups?
                              In terms of generating trends for these variables, what I would do is take the omega weights returned in e(omega). To simplify ideas, say that you have 5 control states, and e(omega) suggests that state 1 gets 20%, states 2 and 3 get 0%, state 4 gets 50%, and state 5 gets 30%. Then, for any particular covariate $X_{it}$, where $i$ indexes states and $t$ indexes time, you can generate the synthetic control value in each period as follows:

                              $$X^{\text{sdid}}_{t} = 0.2X_{1t} + 0X_{2t} + 0X_{3t} + 0.5X_{4t} + 0.3X_{5t}$$

                              Here I am calling your SDID aggregate $X^{\text{sdid}}_{t}$, and the values for each control state $X_{1t}$ (state 1), $X_{2t}$ (state 2), and so forth. So basically, all you need to do is grab the state weights that are returned and use them to generate a single aggregate. You can then plot this variable for each time period and compare it to the values of $X$ for the treated unit. Of course, the nature of synthetic difference-in-differences won't guarantee that covariates follow parallel trends, as SDID matches pre-trends on the dependent variable of interest, but if you see covariates moving about a lot over time, this may give you some reason to worry that there are systematic changes not captured by the synthetic control.
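The weighted-average recipe can be sketched in a few lines, assuming you have exported the control units' weights from e(omega) and the covariate values into arrays ordered the same way (the function name and covariate numbers are hypothetical). As we understand it, with staggered adoption sdid estimates weights separately for each adoption cohort, so this would be repeated cohort by cohort.

```python
import numpy as np

def synthetic_series(X_controls, omega):
    """Hypothetical helper: omega-weighted average of the control units'
    covariate values in each period.

    X_controls : (N_controls, T) array, rows ordered like the weights
    omega      : (N_controls,) unit weights, e.g. copied from e(omega)
    """
    return np.asarray(omega, dtype=float) @ np.asarray(X_controls, dtype=float)

omega = [0.2, 0.0, 0.0, 0.5, 0.3]  # the five-state example above
X_controls = [[10, 12],  # state 1 (made-up covariate values, periods 1 and 2)
              [50, 55],  # state 2
              [40, 41],  # state 3
              [20, 22],  # state 4
              [30, 33]]  # state 5
print(synthetic_series(X_controls, omega))  # approximately [21.0, 23.3]
```

The resulting series can then be plotted against the treated unit's covariate values, period by period.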
