  • DiD approach

    I am trying to understand what type of data I have for a DiD analysis. I am evaluating a new policy aimed at decreasing the rate of illness among elderly patients undergoing heart surgery in hospitals. The intervention was deployed in all hospitals in the country's urban regions on 1 March 2018. My data are weekly illness rates for the elderly from March 2016 to March 2019. I also have similar data from the rural region hospitals: again, weekly illness rates for the elderly between March 2016 and March 2019. I want to apply a DiD design to this.

    I am trying to wrap my head around this: does it mean I have cross-sectional data or panel data? I am trying to relate this to the potential outcomes framework.

    One group receives the treatment: the elderly in the urban hospitals. The control group, which is not subject to the intervention, is the elderly in the rural hospitals. So I thought this meant I have repeated cross-section data, as different individuals are observed in each group at each point in time. Am I correct? Also, would this then be the 2x2 DiD model?
    Last edited by Taiba Chau; 20 May 2022, 13:16.

  • #2
    Panel data (in my opinion) means
    1. observing J > 1 units,
    2. observing T > 1 time periods, and
    3. observing the exact same units across the exact same time periods.
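The three criteria above can be checked mechanically. A minimal sketch, assuming the data are reduced to (unit, period) pairs; the function name `is_balanced_panel` and the example labels are hypothetical:

```python
from collections import defaultdict


def is_balanced_panel(observations):
    """Check the three criteria: J > 1 units, T > 1 periods,
    and every unit observed in exactly the same set of periods."""
    periods_by_unit = defaultdict(set)
    for unit, period in observations:
        periods_by_unit[unit].add(period)
    all_periods = set().union(*periods_by_unit.values())
    if len(periods_by_unit) < 2 or len(all_periods) < 2:
        return False  # criterion 1 or 2 fails
    # Criterion 3: each unit appears in every period.
    return all(periods == all_periods for periods in periods_by_unit.values())


# Hospitals observed every week: a panel.
hospital_weeks = [(h, w) for h in ("urban_A", "rural_B") for w in (1, 2, 3)]
# Individual patients, each seen once: repeated cross-sections, not a panel.
patient_visits = [("patient_1", 1), ("patient_2", 2), ("patient_3", 3)]

print(is_balanced_panel(hospital_weeks))  # True
print(is_balanced_panel(patient_visits))  # False
```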
    Quote: "One group that receives treatment which is the elderly in the urban hospitals. Then the control group are not subject to the intervention. This will be the elderly in rural hospitals. So I thought this meant I have repeated cross-section data as different individuals are observed in each group at each point in time."
    Yes, this is what I would call "panel" data. "Repeated cross section" is too much of a mouthful for me, but that's what it is.


    From a potential outcomes standpoint, you have an unbiased causal effect if the average outcomes of the rural hospitals and of the urban hospitals would have moved in parallel (followed the same trend) had the intervention not happened. The missing data problem is the counterfactual: how the outcomes of the treated units would have looked after 1 March 2018 had the treatment not happened.
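In potential-outcomes notation, the 2x2 version of this argument can be written out as below. Here G = 1 marks the urban (treated) hospitals, G = 0 the rural ones, T = 0 the period before 1 March 2018 and T = 1 the period after. Besides parallel trends, the last step also uses no anticipation (pre-period outcomes of the treated group are untreated outcomes); with repeated cross-sections the expectations are group-by-period population means, so the composition of each group must also be stable over time.

```latex
% Parallel trends in the untreated potential outcome Y(0)
\[
\mathbb{E}[Y(0)\mid G=1,T=1]-\mathbb{E}[Y(0)\mid G=1,T=0]
  = \mathbb{E}[Y(0)\mid G=0,T=1]-\mathbb{E}[Y(0)\mid G=0,T=0]
\]
% Target: average effect on the treated in the post period;
% E[Y(0) | G=1, T=1] is the missing counterfactual
\[
\tau_{ATT} = \mathbb{E}[Y(1)\mid G=1,T=1]-\mathbb{E}[Y(0)\mid G=1,T=1]
\]
% Substituting E[Y(0)|G=1,T=1] = E[Y|G=1,T=0] + E[Y|G=0,T=1] - E[Y|G=0,T=0]:
\[
\tau_{ATT} = \big(\mathbb{E}[Y\mid G=1,T=1]-\mathbb{E}[Y\mid G=1,T=0]\big)
           - \big(\mathbb{E}[Y\mid G=0,T=1]-\mathbb{E}[Y\mid G=0,T=0]\big)
\]
```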



    • #3
      There are 1,796 rural community hospitals and 3,343 urban community hospitals in the United States. In my opinion, not all of these will be good comparisons for each other. Would it make sense to compare hospitals in Pukwana, South Dakota to those in LA, NYC, or Detroit? Likely not. So whatever effect size you DO estimate, you'll likely need to break it down by region (e.g., hospitals in the Northeast vs. the Southeast). You could also conduct such an analysis at the state level.
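Breaking the estimate down by region amounts to computing the DiD separately within each region. A minimal sketch of the 2x2 version from cell means, assuming each observation carries hypothetical `region`, `treated`, `post` and `y` fields; the numbers are made up purely for illustration:

```python
from collections import defaultdict
from statistics import mean


def did_2x2(rows):
    """2x2 DiD from cell means:
    (treated post - treated pre) - (control post - control pre)."""
    cells = defaultdict(list)
    for r in rows:
        cells[(r["treated"], r["post"])].append(r["y"])
    m = {cell: mean(ys) for cell, ys in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


def did_by_region(rows):
    """Apply did_2x2 separately within each region."""
    by_region = defaultdict(list)
    for r in rows:
        by_region[r["region"]].append(r)
    return {region: did_2x2(rs) for region, rs in by_region.items()}


# Made-up weekly illness rates; the keys of each inner dict are (treated, post).
toy = {
    "Northeast": {(1, 0): [10, 12], (1, 1): [8, 10], (0, 0): [9, 9], (0, 1): [9, 9]},
    "Southeast": {(1, 0): [12, 12], (1, 1): [10, 10], (0, 0): [11, 11], (0, 1): [10, 10]},
}
rows = [
    {"region": region, "treated": t, "post": p, "y": y}
    for region, cells in toy.items()
    for (t, p), ys in cells.items()
    for y in ys
]
print(did_by_region(rows))  # Northeast: -2, Southeast: -1
```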



      • #4
        Thanks for the thoughts. I thought there was a difference between panel and repeated cross-sectional data: panel data would be the same units observed in each group over time, whilst repeated cross-sections would be different units observed in each group at each point in time.



        • #5
          Taiba: You have repeated cross sections. Or, you can aggregate the data at the hospital level to create a panel. I’d probably try both ways.
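Aggregating to the hospital level turns individual records into a hospital-by-week panel. A minimal sketch, assuming each individual record is a (hospital, week, ill) tuple with `ill` a hypothetical 0/1 indicator; in Stata, `collapse (mean) ill, by(hospital week)` does the same job:

```python
from collections import defaultdict


def hospital_week_rates(records):
    """Collapse individual records (hospital, week, ill) into
    hospital-by-week illness rates (share of patients with ill == 1)."""
    counts = defaultdict(lambda: [0, 0])  # (hospital, week) -> [n_ill, n_patients]
    for hospital, week, ill in records:
        counts[(hospital, week)][0] += ill
        counts[(hospital, week)][1] += 1
    return {key: n_ill / n for key, (n_ill, n) in counts.items()}


records = [
    ("urban_A", 1, 1), ("urban_A", 1, 0), ("urban_A", 2, 0),
    ("rural_B", 1, 0), ("rural_B", 2, 1), ("rural_B", 2, 1),
]
rates = hospital_week_rates(records)
print(rates[("urban_A", 1)], rates[("rural_B", 2)])  # 0.5 1.0
```

Each hospital now appears once per week, so the same hospitals are followed over the same weeks.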



          • #6
            Jeff Wooldridge, I don't understand the difference between a repeated cross section and a panel dataset. Could you please explain?

            I presume it is, as you write in your book, a little like the distinction between a model and an estimator: when I say "OLS model", you and most folks know I in fact mean "a model estimated via OLS", but in technical writing, anyway, the two aren't interchangeable.



            • #7
              The data are at the individual level, and you don't (or only very rarely) observe the same individuals in different time periods. You effectively observe each individual once. That's not panel data; it's pooled, or repeated, cross sections.

              If the unit of observation is the hospital, then it would be panel data.



              • #8
                double post



                • #9
                  Oh I see, that makes perfect sense then



                  • #10
                    Thank you, that makes a lot of sense! I was wondering whether this would then be a 2x2 model, or a model with 2 groups but multiple time periods, given that the data are weekly?
                    Last edited by Taiba Chau; 21 May 2022, 15:50.



                    • #11
                      You should use the multiple time periods. I’ll answer more completely tomorrow.
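Using all the weekly periods with only two groups leads to a two-way fixed effects regression: a group effect, a week effect and a treated-by-post indicator. A minimal NumPy sketch on simulated, noise-free numbers (the function name, variable names and effect sizes are hypothetical). With two groups, a single treatment date and a balanced panel, the coefficient it returns equals the 2x2 DiD computed on the pre- and post-period averages of each group:

```python
import numpy as np


def twfe_did(y, treated, week, post):
    """OLS of y on an intercept, a treated-group dummy, week dummies
    (first week as base) and treated*post; returns the DiD coefficient."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=float)
    post = np.asarray(post, dtype=float)
    week = np.asarray(week)
    levels = np.unique(week)
    week_dummies = (week[:, None] == levels[None, 1:]).astype(float)
    X = np.column_stack([np.ones_like(y), treated, week_dummies, treated * post])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[-1]


# Simulated group-by-week rates: 2 groups x 6 weeks, treatment starts in week 3.
treated = np.repeat([0, 1], 6)  # rural = 0, urban = 1
week = np.tile(np.arange(6), 2)
post = (week >= 3).astype(int)
y = 5.0 + 2.0 * treated + 0.3 * week - 1.5 * treated * post
print(round(twfe_did(y, treated, week, post), 6))  # -1.5
```

An event-study variant would replace `treated * post` with separate treated-by-week indicators, which is what lets you inspect pre-treatment trends week by week.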



                      • #12
                        I was also wondering: in a regression at the firm level, would it be correct, or interesting, to include other fixed effects, such as state or county fixed effects? I have done this in my regression and found some interesting results, but I am trying to understand the intuition behind it.



                        • #13
                          You could include them if you want... but in all honesty, I may have a better idea for you. If this were my problem, I would use either my synthetic control estimator or my colleague's. When used judiciously, synthetic controls solve SO many problems of difference-in-differences designs and even account for unobserved level confounding.



                          I'm not saying DD isn't useful, or even that it isn't an appropriate design in this instance; I'm saying SCM is a more general form of DD that would work much better in this circumstance.
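For readers unfamiliar with the method: the core of a synthetic control is a set of nonnegative donor weights that sum to one and make the weighted average of untreated units track the treated unit before treatment; the post-treatment gap between the two is the effect estimate. The sketch below is a simplified illustration, not either of the estimators mentioned above: it matches pre-treatment outcomes only (no covariates), uses projected gradient descent as the solver, and the data are made up.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    positive = u - (css - 1.0) / k > 0
    rho = k[positive][-1]
    theta = (css[positive][-1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def synth_weights(y_treated_pre, y_donors_pre, n_iter=5000):
    """Donor weights minimising ||y1 - Y0 w||^2 with w >= 0 and sum(w) = 1,
    found by projected gradient descent (pre-treatment outcomes only)."""
    y1 = np.asarray(y_treated_pre, dtype=float)      # shape (T_pre,)
    Y0 = np.asarray(y_donors_pre, dtype=float)       # shape (T_pre, J)
    w = np.full(Y0.shape[1], 1.0 / Y0.shape[1])      # start from equal weights
    step = 1.0 / (2.0 * np.linalg.norm(Y0, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * Y0.T @ (Y0 @ w - y1)
        w = project_to_simplex(w - step * grad)
    return w


# Made-up pre-period rates for three donor (untreated) units, one column each.
donors_pre = np.array([
    [10, 8, 12],
    [11, 8, 10],
    [12, 9, 11],
    [13, 9, 9],
    [14, 10, 10],
    [15, 10, 8],
], dtype=float)
treated_pre = donors_pre @ np.array([0.6, 0.4, 0.0])  # built to be matched exactly
print(np.round(synth_weights(treated_pre, donors_pre), 3))  # approximately [0.6, 0.4, 0.0]
```

The effect estimate would then be the post-period treated series minus the post-period donor matrix times these weights (post-period arrays not shown here).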



                          • #14
                            Thank you for your ideas. I have come across synthetic controls, but I was wondering about the so-called county or state fixed effects: if I were to include them, I would no longer have the firm fixed effects.
                            So the regression would then be
                            [Image: county.png, the regression equation with county fixed effects]

                            So now there is no firm fixed effect; instead there is a county fixed effect, denoted by the subscript c. The outcome is still at the firm level.
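The intuition may be easiest to see as collinearity. If each firm is located in a single county (an assumption; it fails if firms move or span counties), every county dummy is the sum of the dummies of the firms in it, so once firm fixed effects are in the model the county fixed effects are redundant and Stata will omit them. Replacing firm fixed effects with county fixed effects, as in the county-level specification described above, is a coarser control: it absorbs only time-invariant differences between counties, not between firms within a county. A small NumPy sketch with hypothetical firm and county labels:

```python
import numpy as np


def dummy_columns(labels):
    """One 0/1 column per distinct label (no base category dropped)."""
    levels = sorted(set(labels))
    return np.array([[1.0 if x == lev else 0.0 for lev in levels] for x in labels])


# Hypothetical firms observed for 3 periods; each firm sits in one county.
firm = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
county = ["c1"] * 6 + ["c2"] * 6

F = dummy_columns(firm)    # 4 firm dummies
C = dummy_columns(county)  # 2 county dummies
both = np.column_stack([F, C])

print(np.linalg.matrix_rank(F))     # 4
print(np.linalg.matrix_rank(both))  # 4: the county dummies add nothing
print(np.linalg.matrix_rank(C))     # 2
```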



                            • #15
                              I was also wondering how to make a coefplot of my regression results for the treated and the untreated groups. I tried:
                              quietly reg illness b22233.date##treated if inrange(date,td(14nov2020),td(30nov2020)) |(treated==1),robust
                              estimates store Treated

                              quietly reg illness b22233.date##treated if inrange(date,td(14nov2020),td(30nov2020)) |(treated==0),robust
                              estimates store Untreated

                              Then I tried to plot:
                              coefplot Treated Untreated, vertical drop(_cons) xlabel(1 "14 Nov 20" 25 "18 Nov 20" 50 "22 Nov 20" 75 "26 Nov 20" 100 "30 Nov 20", angle(50))

                              and

                              coefplot Treated Untreated, vertical drop(_cons) keep(*.t) xlabel(1 "14 Nov 20" 25 "18 Nov 20" 50 "22 Nov 20" 75 "26 Nov 20" 100 "30 Nov 20", angle(50))

                              Both gave me different results, but neither seems to be exactly right.
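Two points about the Stata lines above may be worth checking. In Stata, `|` is logical OR and `&` is logical AND, so `inrange(date,td(14nov2020),td(30nov2020)) | (treated==1)` keeps every observation inside the window plus every treated observation on any date. If the intent is "inside the window and treated", the operator would be `&`. Also, within a sample that contains only one group, `treated` is constant, so the `date##treated` interaction terms cannot be estimated. What the per-group date coefficients measure is simple: with an intercept and a full set of date dummies (base date omitted), each coefficient equals the group's mean outcome on that date minus its mean on the base date. A minimal sketch with made-up numbers; `date_effects` is a hypothetical helper:

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def date_effects(rows, base_date):
    """Coefficients from regressing y on an intercept and date dummies (base date
    omitted) within one group: mean y on each date minus mean y on the base date."""
    by_date = defaultdict(list)
    for d, y in rows:
        by_date[d].append(y)
    base = mean(by_date[base_date])
    return {d: mean(ys) - base for d, ys in sorted(by_date.items()) if d != base_date}


# Made-up illness rates for one group on three dates.
treated_rows = [
    (date(2020, 11, 14), 10), (date(2020, 11, 14), 12),
    (date(2020, 11, 15), 15),
    (date(2020, 11, 16), 9), (date(2020, 11, 16), 11),
]
print(date_effects(treated_rows, base_date=date(2020, 11, 14)))
# Nov 15: 15 - 11 = 4, Nov 16: 10 - 11 = -1
```

Running the same function on the untreated rows gives the second series, so the two sets of points on the plot are each group's date means relative to its own base-date mean.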
                              Last edited by Taiba Chau; 22 May 2022, 13:43.

