  • Question regarding diff in diff

    Hello everyone, I have a problem regarding a study I'm trying to do.

    I have panel data on 158 neighborhoods, from January 2006 to December 2016 (132 months in total). I aim to show the effect of the creation of primary medical facilities in some of the neighborhoods (not all, around 50%) on their death rate from diabetes mellitus. The facilities started to be built in 2009, but it's an ongoing process, meaning that more are being created as I type this. I thought of using a diff-in-diff approach to separate the treated group from the untreated group, but my main problem is that the implementation was gradual: some facilities were created in February 2009, some in December, some in June 2010, and so on. Is it possible to do a "dynamic diff in diff" like this?

    Also, I tried a regular panel regression with fixed effects for month and neighborhood. It looks like this:
    Code:
    xtreg taxa_obito1 taxa_cf1 i.meseano2, fe
    ("taxa_obito1" is the death rate from diabetes, and "taxa_cf1" is the rate of primary medical facilities in the neighborhood in a specific month and year.) The results are what I expected (and wanted) them to be: more primary medical facilities have a negative effect on the death rate from diabetes (not saying anything about causality yet). However, when I run
    Code:
    xtgls taxa_obito1 taxa_cf1
    the effect is positive! It says that more medical facilities are positively correlated with deaths from diabetes, which seems weird to me. I should note that many values of taxa_cf1 are zero, since not all neighborhoods have medical facilities, and those that do didn't have them until a certain time period. Could that be affecting the model? When using this panel approach, should I just delete the neighborhoods that haven't received any facilities yet and focus on the ones that have?

    I should say that the diff-in-diff approach seems more thorough to me, although I don't know how to get around the "implementation in different periods for different neighborhoods" problem.

    Sorry if the post was sort of confusing... any help would be appreciated

  • #2
    Anybody?



    • #3
      Perhaps the problem was not clear, or not enough details were provided. I, for one, am not exactly sure what you mean by the "implementation in different periods for different neighborhoods" problem, which wouldn't be a problem if you know when the implementation occurred in each neighborhood.

      I'll assume that's the case, that the data are in long format with one row per neighborhood month/year, and that taxa_cf1 is coded 1 if that neighborhood had a medical facility in that particular month and year, and 0 otherwise. I will also assume that you xtset the data with neighborhood as the grouping factor. As you can see, a few assumptions here.

      If that's the case, I think your fixed-effects model is preferable. The fixed effect for neighborhood controls for persistent unobserved characteristics of each neighborhood, and the time dummies control for an overall time trend. The GLS model, as you specified it, does neither. In the fixed-effects model, identification comes from neighborhoods that changed status over time, so you are already focusing on those.



      • #4
        Originally posted by German Rodriguez (post #3 above, quoted in full)
        First of all, thank you for answering me and for your patience, I'll try to explain myself better.

        First, what I meant by "implementation in different periods for different neighborhoods" is that the treatment starts in different periods (year and month) in the treated neighborhoods: some received medical facilities in March 2010, some in May 2011, and so on. I'm not familiar with how to do a diff-in-diff in this case, with several treatment periods.

        Indeed I did
        Code:
        xtset numbairro meseano2
        where numbairro is the number that identifies the neighborhood and meseano2 is the month and year (from January 2006 to December 2016). taxa_cf1 is the number of medical facilities per capita in a neighborhood in a given time period, multiplied by 100,000, but I also have a variable like the one you described (coded 1 if the neighborhood had a medical facility in a particular month and year, and 0 otherwise).

        So basically I'm torn between a panel regression with fixed effects for time and neighborhood, which tries to capture the effect of an increase in the medical facilities rate per neighborhood on the death rate, and the diff-in-diff model with multiple time periods described above.

        Again, thank you for your patience, I'm new to Stata so still struggling with it...
        Last edited by Daniel Earp; 15 Dec 2017, 11:06.



        • #5
          This may be a question of terminology, because there are situations where diff-in-diff and fixed-effects models are exactly equivalent. Perhaps a simple example where they give the same answer will help? You may also consider providing your own sample data: check out dataex, available from SSC and built into Stata 15.1. This will make Statalist members more likely to respond.

          Consider the webuse data on the weight of pigs. We'll keep just two weeks, randomly pick some pigs to get a special diet on the second week, and make them gain a bit more weight:

          Code:
          webuse pig, clear
          keep if week < 3
          set seed 1234
          gen diet = runiform() > .5 & week == 2
          replace weight = weight + rnormal(10,15) if diet
          First I'll fit a fixed effects model with group and time effects like yours. No xtset, just making everything explicit:

          Code:
          . quietly xtreg weight i.week diet, i(id) fe
          
          . di _b[diet]
          8.9272067
          Looks like our pigs gained on average 9 units more when on the diet. Now let us compute a difference (weight gain), keep just one observation per pig, and run a simple regression:

          Code:
          . bysort id (week): gen diff = weight[2] - weight[1]
          
          . drop if week == 1
          (48 observations deleted)
          
          . quietly reg diff diet
          
          . di _b[diet]
          8.9272067
          We get exactly the same estimate. And this one is a diff-in-diff estimate because it is the difference in weight gain between those who went on the diet and those who didn't. It is also a fixed-effects estimator.

          The nice thing about the fixed-effects approach is that the data don't have to be balanced, and it can accommodate situations where the treatment starts at different times for different units, which I think is your concern. I am not sure how best to handle the number of facilities, but would be inclined to use a time-varying indicator of whether they had received facilities, analogous to the diet. But others may have different views.
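
          A minimal sketch of the staggered case, using simulated data rather than the neighborhood data. Every name and number below (id, month, start, treat, the true effect of -1) is made up for illustration. Each treated unit switches on in its own month, and the same kind of xtreg specification is used to estimate the effect:

          ```stata
          * Simulated staggered-adoption panel (hypothetical names and values)
          clear
          set seed 2017
          set obs 100                          // 100 made-up neighborhoods
          gen id = _n
          gen u = rnormal()                    // persistent neighborhood effect
          * About half are never treated (start missing); the rest start
          * in a random month between 13 and 48
          gen start = cond(runiform() < .5, ., 13 + floor(36*runiform()))
          expand 60                            // 60 months per neighborhood
          bysort id: gen month = _n
          * Missing start sorts above every month, so never-treated units stay 0
          gen treat = month >= start
          gen y = 5 + u + 0.02*month - 1*treat + rnormal()   // true effect is -1
          xtset id month
          * Unit and month fixed effects, standard errors clustered by unit
          xtreg y treat i.month, fe vce(cluster id)
          display _b[treat]
          ```

          The sketch assumes the effect is the same for every unit and does not change with time since adoption. If effects vary, the single treat coefficient is a weighted average that is harder to interpret.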



          • #6
            Thanks for the answer! I have more questions if I could be so bold...

            So do you recommend that I run
            Code:
            xtreg taxa_obito1 treat i.meseano2, fe
            where treat = 1 if the neighborhood had a medical facility in a particular month and 0 otherwise? Will that give me results as if I were doing a diff-in-diff with multiple treatment periods?

            Like I said, taxa_obito1 is the death rate from diabetes (per capita, multiplied by 100,000) in a specific neighborhood in a given month and year, numbairro is the id number of each neighborhood, and meseano2 is the month and year (e.g. March 2007). Since it runs from Jan 2006 to Dec 2016, there are 132 time observations. They look like this:
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input int numbairro double taxa_obito1 float meseano2 byte treat
            1          0 16802 0
            1          0 16833 0
            1          0 16861 0
            1          0 16892 0
            1          0 16922 0
            1          0 16953 0
            1 .015821668 16983 0
            1          0 17014 0
            1          0 17045 0
            1          0 17075 0
            1          0 17106 0
            1 .015821668 17136 0
            1          0 17167 0
            1          0 17198 0
            1 .031643337 17226 0
            1 .015821668 17257 0
            1 .015821668 17287 0
            1          0 17318 0
            1          0 17348 0
            1 .015821668 17379 0
            1          0 17410 0
            1          0 17440 0
            1 .015821668 17471 0
            1          0 17501 0
            1          0 17532 0
            1          0 17563 0
            1 .015821668 17592 0
            1          0 17623 0
            1          0 17653 0
            1          0 17684 0
            1 .015821668 17714 0
            1          0 17745 0
            1          0 17776 0
            1          0 17806 0
            1 .015821668 17837 0
            1          0 17867 0
            1 .047465005 17898 0
            1          0 17929 0
            1          0 17957 0
            1          0 17988 0
            1          0 18018 0
            1          0 18049 0
            1          0 18079 0
            1 .031643337 18110 0
            1          0 18141 0
            1 .015821668 18171 0
            1          0 18202 0
            1          0 18232 0
            1          0 18263 0
            1          0 18294 0
            1 .015821668 18322 0
            1          0 18353 0
            1          0 18383 0
            1          0 18414 0
            1 .015821668 18444 0
            1 .015821668 18475 0
            1          0 18506 0
            1          0 18536 0
            1          0 18567 0
            1 .015821668 18597 0
            1          0 18628 0
            1 .015821668 18659 0
            1          0 18687 0
            1 .015821668 18718 0
            1          0 18748 0
            1          0 18779 0
            1          0 18809 0
            1 .015821668 18840 0
            1          0 18871 0
            1          0 18901 0
            1          0 18932 0
            1 .015821668 18962 0
            1          0 18993 0
            1          0 19024 0
            1          0 19053 0
            1          0 19084 0
            1          0 19114 0
            1          0 19145 0
            1          0 19175 0
            1          0 19206 0
            1          0 19237 0
            1          0 19267 0
            1          0 19298 0
            1          0 19328 0
            1          0 19359 0
            1          0 19390 0
            1 .015821668 19418 0
            1          0 19449 0
            1          0 19479 0
            1          0 19510 0
            1          0 19540 0
            1          0 19571 0
            1          0 19602 0
            1          0 19632 0
            1          0 19663 0
            1          0 19693 0
            1 .015821668 19724 0
            1          0 19755 0
            1          0 19783 0
            1          0 19814 0
            end
            format %tm meseano2

            This preview only shows the first neighborhood, but the data keep going. This particular neighborhood (number 1) is very small and has very few deaths, hence the number of zeroes, but this is not the rule: many neighborhoods have a lot more deaths in any given period, with more variability.
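
            One detail in the listing: the meseano2 values (16802, 16833, 16861, ...) are consistent with daily date codes for the first day of each month (16802 is 01jan2006) rather than monthly codes, even though the variable carries a %tm format. This does not change the i.meseano2 dummies, but it matters for readable labels and for time-series operators such as L. after xtset. A small sketch of the conversion, assuming that reading is right; mes is a hypothetical new variable:

            ```stata
            * Three rows copied from the listing, treated as daily date codes
            clear
            input int numbairro float meseano2
            1 16802
            1 16833
            1 16861
            end
            format %td meseano2          // display as daily dates: 01jan2006, ...
            gen mes = mofd(meseano2)     // monthly codes: 552, 553, 554
            format %tm mes               // display as 2006m1, 2006m2, 2006m3
            list, noobs
            xtset numbairro mes          // consecutive months are now one period apart
            ```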

            Again, thanks for taking your time to help me!



            • #7
              I found this answer on Statalist by Jeff Wooldridge to a problem I think was similar to mine, and attempted to do what he said. I ran
              Code:
              xtreg taxa_obito1 treat i.meseano2, fe
              Here is part of the output:
              Code:
                xtreg taxa_obito1 treat i.meseano2, fe
              
              Fixed-effects (within) regression               Number of obs     =     20,856
              Group variable: numbairro                       Number of groups  =        158
              
              R-sq:                                           Obs per group:
                   within  = 0.0203                                         min =        132
                   between = 0.1188                                         avg =      132.0
                   overall = 0.0031                                         max =        132
              
                                                              F(132,20566)      =       3.22
              corr(u_i, Xb)  = -0.0323                        Prob > F          =     0.0000
              
              ------------------------------------------------------------------------------
               taxa_obito1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                     treat |  -.0010505    .000554    -1.90   0.058    -.0021364    .0000355
                           |
                  meseano2 |
                    16833  |  -.0011015   .0021561    -0.51   0.609    -.0053276    .0031245
                    16861  |  -.0024033   .0021561    -1.11   0.265    -.0066293    .0018228
                    16892  |  -.0018025   .0021561    -0.84   0.403    -.0060285    .0024236
                    16922  |   .0021029   .0021561     0.98   0.329    -.0021232    .0063289
                    16953  |   .0008011   .0021561     0.37   0.710     -.003425    .0050271
              I ran testparm i.meseano2 and it shows Prob > F = 0.0000, so I'm guessing I was correct to include i.meseano2 in the xtreg.

              However, when I run the same regression with clustered standard errors, this is what happens:
              Code:
              xtreg taxa_obito1 treat i.meseano2, fe cluster(numbairro)
              
              Fixed-effects (within) regression               Number of obs     =     20,856
              Group variable: numbairro                       Number of groups  =        158
              
              R-sq:                                           Obs per group:
                   within  = 0.0203                                         min =        132
                   between = 0.1188                                         avg =      132.0
                   overall = 0.0031                                         max =        132
              
                                                              F(132,157)        =      14.01
              corr(u_i, Xb)  = -0.0323                        Prob > F          =     0.0000
              
                                          (Std. Err. adjusted for 158 clusters in numbairro)
              ------------------------------------------------------------------------------
                           |               Robust
               taxa_obito1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                     treat |  -.0010505   .0011209    -0.94   0.350    -.0032645    .0011635
                           |
                  meseano2 |
                    16833  |  -.0011015   .0025178    -0.44   0.662    -.0060747    .0038717
                    16861  |  -.0024033   .0021528    -1.12   0.266    -.0066555    .0018489
                    16892  |  -.0018025   .0018852    -0.96   0.340    -.0055261    .0019211
                    16922  |   .0021029   .0023681     0.89   0.376    -.0025746    .0067804
                    16953  |   .0008011   .0021368     0.37   0.708    -.0034195    .0050217
                    16983  |   .0036049   .0021562     1.67   0.097     -.000654    .0078638
              The coefficient remains the same, but the p-value changed from a barely acceptable one to a high one. What is the main difference between these two models? Am I on the right path here?



              • #8
                Seems to be the same advice I gave you yesterday: a fixed-effects model that is equivalent to diff-in-diff with multiple time periods. The choice of predictor depends on whether you view the treatment as adding a medical facility or as increasing the ratio of facilities to population. Clustered standard errors come from a variance estimator that is robust to heteroskedasticity and to correlation over time within a neighborhood, so I am not surprised they are different. Something else you might consider, if you have the actual counts of deaths, is a fixed-effects Poisson model with the death counts as outcome and the population as exposure, a standard approach to count data.
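
                To illustrate the point about standard errors, here is a sketch with simulated data in which the errors are correlated over time within each unit. All names and numbers (id, t, treat, the 0.8 autocorrelation, the effect of 0.5) are made up. The point estimate is identical in both fits; only the standard error changes. Note that vce(cluster id) is the documented spelling of the older cluster(id) option.

                ```stata
                * Simulated panel with serially correlated errors (hypothetical values)
                clear
                set seed 42
                set obs 50
                gen id = _n
                gen u = rnormal()                 // unit effect
                expand 40
                bysort id: gen t = _n
                * Treatment switches on at t = 21 for the first 25 units
                gen treat = id <= 25 & t > 20
                * AR(1)-type errors: replace runs in order, so e[_n-1] is already updated
                gen e = rnormal()
                bysort id (t): replace e = 0.8*e[_n-1] + e if _n > 1
                gen y = u + 0.5*treat + e
                xtset id t
                * Default (conventional) standard errors
                xtreg y treat i.t, fe
                scalar b_def  = _b[treat]
                scalar se_def = _se[treat]
                * Standard errors clustered by unit
                xtreg y treat i.t, fe vce(cluster id)
                scalar b_clu  = _b[treat]
                scalar se_clu = _se[treat]
                display "coef: " b_def " vs " b_clu
                display "se:   " se_def " vs " se_clu
                ```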



                • #9
                  Thanks for the answer. This is the model I came up with, and I would very much appreciate opinions on it. I defined the dummy variables trat1, trat2, trat3, ..., trat9 so that trat1 = 1 if the given neighborhood received its first medical facility in a certain time period and 0 otherwise, and so on; i.e., trati = 1 if the given neighborhood received its ith medical facility in a certain time period and 0 otherwise. In some cases, more than one facility was installed in the same time period, which means that, for instance, a neighborhood could have trat3 = 1 and trat4 = 1 in the same time period. From what I can tell, I'm basically distinguishing the treatments by saying that having another facility is a different type of treatment. My goal was to capture the effect of extra facilities instead of the "has facility / doesn't have facility" logic. Is this reasonable?

                  I changed to yearly dates to simplify my reasoning and because I don't expect new medical facilities to have an immediate impact in the following months.

                  Code:
                  xtreg taxa_obito1 trat1 trat2 trat3 trat4 trat5 trat6 trat7 trat8 trat9 i.ano, fe
                  
                  Fixed-effects (within) regression               Number of obs     =      1,738
                  Group variable: numbairro                       Number of groups  =        158
                  
                  R-sq:                                           Obs per group:
                       within  = 0.0910                                         min =         11
                       between = 0.4553                                         avg =       11.0
                       overall = 0.0138                                         max =         11
                  
                                                                  F(19,1561)        =       8.23
                  corr(u_i, Xb)  = -0.1929                        Prob > F          =     0.0000
                  
                  ------------------------------------------------------------------------------
                   taxa_obito1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                         trat1 |  -.0101113   .0106687    -0.95   0.343    -.0310378    .0108152
                         trat2 |  -.0006348   .0188673    -0.03   0.973    -.0376428    .0363731
                         trat3 |  -.0297648   .0405268    -0.73   0.463    -.1092575    .0497279
                         trat4 |  -.0830819   .0497743    -1.67   0.095    -.1807135    .0145498
                         trat5 |  -.1731677   .0585419    -2.96   0.003    -.2879967   -.0583386
                         trat6 |  -.0021817   .0616618    -0.04   0.972    -.1231304     .118767
                         trat7 |   -.007754   .0840263    -0.09   0.926    -.1725704    .1570623
                         trat8 |  -.1134312   .0840175    -1.35   0.177    -.2782303     .051368
                         trat9 |   -.394849   .0840684    -4.70   0.000    -.5597479   -.2299502
                               |
                           ano |
                         2007  |   .0142195   .0088531     1.61   0.108    -.0031457    .0315846
                         2008  |    .022631   .0088531     2.56   0.011     .0052658    .0399962
                         2009  |   .0309341   .0088554     3.49   0.000     .0135644    .0483038
                         2010  |   .0561215   .0089108     6.30   0.000      .038643       .0736
                         2011  |   .0294726   .0089858     3.28   0.001     .0118471    .0470981
                         2012  |   .0085764   .0089047     0.96   0.336    -.0088901    .0260428
                         2013  |   .0009944   .0088596     0.11   0.911    -.0163836    .0183724
                         2014  |  -.0092809   .0088545    -1.05   0.295    -.0266488    .0080871
                         2015  |  -.0068273   .0088772    -0.77   0.442    -.0242399    .0105852
                         2016  |   .0052042    .009007     0.58   0.563    -.0124629    .0228714
                               |
                         _cons |   .2406295   .0062601    38.44   0.000     .2283505    .2529086
                  -------------+----------------------------------------------------------------
                       sigma_u |  .33445532
                       sigma_e |  .07868779
                           rho |  .94755056   (fraction of variance due to u_i)
                  ------------------------------------------------------------------------------
                  F test that all u_i=0: F(157, 1561) = 174.91                 Prob > F = 0.0000
                  Does this model make sense?



                  • #10
                    I would recommend using a fixed-effects Poisson model with monthly death counts as outcome and population as exposure, as noted earlier. Your predictor could be the number of facilities in existence each month treated as a factor variable. This gives you diff-in-diff estimates. Your latest proposal appears to aggregate the data by combining months with different conditions, which may dilute effects. It is not clear to me how your treatment variables are coded and how they change over time. So I don't see the advantages of this approach.
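
                    A minimal sketch of that suggestion on simulated data. The variable names (deaths, pop, nfac, month) and all numbers are hypothetical stand-ins, not the actual neighborhood data, and the true effect is set to -0.2 on the log rate per facility:

                    ```stata
                    * Simulated monthly death counts (hypothetical names and values)
                    clear
                    set seed 123
                    set obs 80
                    gen id = _n
                    gen pop = 20000 + floor(80000*runiform())   // made-up population
                    gen a = rnormal(0, .3)                      // neighborhood effect
                    * About half never get a facility; the rest start in months 13 to 48
                    gen start = cond(runiform() < .5, ., 13 + floor(36*runiform()))
                    expand 60
                    bysort id: gen month = _n
                    * Facilities: 0 before start, 1 from start, 2 from twelve months later
                    gen nfac = (month >= start) + (month >= start + 12)
                    gen mu = pop * exp(-9 + a - 0.2*nfac)       // expected deaths
                    gen deaths = rpoisson(mu)
                    xtset id month
                    * Fixed-effects Poisson: counts as outcome, population as exposure,
                    * number of facilities as a factor variable, month dummies
                    xtpoisson deaths i.nfac i.month, fe exposure(pop) vce(robust)
                    * Redisplay the results as incidence-rate ratios
                    xtpoisson, irr
                    ```

                    Each i.nfac coefficient compares the log death rate with that many facilities against the rate with none, net of neighborhood and month effects.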

