
  • Difference in Difference model with multiple treatment periods

    Dear Statalist community,

    I want to execute a Diff-in-diff estimation.

    My data is on the municipal-level and comprises the years 1936-2011.
    Municipal-year-level observations;
    Municipality:= 1,..,1442;
    year:= 1936,..,2011.


    My treatment:
    around 450 Municipalities were struck by a seismic event of major force:
    I have identified 11 earthquakes in this period that struck different municipalities at different times.
    I expect the municipalities affected by such an event to display lower population density growth in the long-run.


    I coded various dummy variables to mark the treatment period and groups:
    One equal to 1 if the Municipality is part of the treatment group, 0 if part of the control group
    Another one equal to 1 in each treated Municipality solely for the year in which the earthquake struck
    And a third one equal to 1 for each affected Municipality FROM the year in which the earthquake struck onwards, let's call this one = EQ

    I then multiplied EQ * Year Variable to create the interaction


    So I tried to code the regression this way:


    Code:
    xi: areg popdensity post_treatment i.Year, a(code) cluster (code)

    Questions:

    Which of the treatment variables should I use?
    Is the way I created the interaction term correct?
    How do I correctly code my regression?
    And how do I add lags and leads into my equation?


    Thank you very much for your support/help in advance!

    Kind regards,
    Loris

  • #2
    Loris,

    Welcome to the forum. For a (basic) DiD analysis, you need two dummy variables. The first identifies the treatment group (1 if the unit belongs to the treatment group, 0 otherwise). I call this variable treatment below. The second dummy identifies the post-treatment period (1 after the event, 0 before). This variable I call post below. Then, you interact the two variables with each other. Since multiplying anything by 0 always yields 0, the interaction equals 1 only for treated units after the event.

    An easy way to do this is:

    Code:
    reg popdensity i.treatment i.post i.treatment#i.post
    Sometimes you will also see the following:

    Code:
    reg popdensity i.treatment##i.post
    This is completely identical.

    It's possible to include additional covariates in the DiD estimation (e.g. years). After declaring that your dataset is a panel (see -xtset-), you may also use -xtreg, fe- instead of -reg- to include (individual) fixed effects. I'm guessing that this is what you wanted with using the -absorb- option of the -areg- command. In this case, no coefficient for i.treatment can be estimated (this is no problem, though).

    Assuming that you have used -xtset- before, you can include lags by using the prefixes l1., l2. (and so on). For example, l1.popdensity includes the population density of the previous time period (in your case, the previous year). However, I'm not saying that this specification makes sense. For leads you simply use f1., f2. and so on, instead.
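
    As a minimal sketch of the steps above, assuming code is a numeric municipality identifier and Year is the calendar year (both names are taken from elsewhere in this thread), and that treatment and post are the two dummies described in this post:

    ```stata
    * Declare the panel structure; required for -xtreg, fe- and for L./F. operators
    xtset code Year

    * Basic DiD with municipality fixed effects; 1.treatment is absorbed by the
    * fixed effects and reported as omitted, which is expected
    xtreg popdensity i.treatment##i.post, fe vce(cluster code)

    * Time-series operators: previous-year and next-year values of a variable
    generate popdensity_lag1  = L1.popdensity
    generate popdensity_lead1 = F1.popdensity
    ```

    The variable names popdensity_lag1 and popdensity_lead1 are illustrative only. Whether a lagged outcome belongs in the model is a separate question that this sketch does not settle.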

    Last edited by Sebastian Geiger; 24 Mar 2017, 14:10.



    • #3
      Using xtreg, which is available after you xtset your data, you might specify your regression like:
      xtreg popdensity i.EQ i.year, fe cluster(code)

      This is a DiD specification when the treatment variable (here called "EQ") is coded as you described.
      And a third one displaying value 1 for each affected Municipality FROM the year on in which the earthquake struck, lets call this one = EQ
      You do not need an interaction term in this case, because the interaction is implicit in the treatment variable. The treatment variable equals 1 only in municipalities struck by an earthquake, and only in the year of the earthquake and afterwards. That is the equivalent of treatment#post. Note that this specification assumes the treatment shifts the dependent variable by a constant amount from the time the treatment was implemented (i.e. a single change in intercept).

      You could use the same specification with the treatment variable equal to 1 only in the year the earthquake hits, or with some other variable. Note that you assume the earthquake only has an effect when the treatment variable equals 1, so it might make more sense to make the treatment variable equal to 1 starting the year after an earthquake (or something like that). I say that because I'm from the Pacific Northwest and we get major earthquakes every 30 years or so. It could make sense to try to model that kind of effect. The important thing is that the treatment variable is turned on for observations when the causal mechanism is affecting the dependent variable, whatever that means.
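
      A rough sketch of how such a treatment variable could be built and used. It assumes a hypothetical variable eq_year holding the year of the (first) earthquake in each municipality, missing for municipalities that were never hit; code and Year are the identifiers used elsewhere in this thread:

      ```stata
      * 1 from the earthquake year onwards in struck municipalities, 0 otherwise
      generate byte EQ = !missing(eq_year) & Year >= eq_year

      * Variant that switches on only from the year after the earthquake
      generate byte EQ_delayed = !missing(eq_year) & Year > eq_year

      xtset code Year
      xtreg popdensity i.EQ i.Year, fe vce(cluster code)
      ```

      The !missing() guard keeps never-treated municipalities at 0 explicitly rather than relying on how Stata orders missing values.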
      Last edited by Kris Bitney; 24 Mar 2017, 17:05.



      • #4
        Kris,
        you probably forgot to include the dependent variable in the -xtreg- command. ;-)


        Loris,

        What are the values of your -year- variable? Is it 1 after the earthquake and 0 before, or does it reflect the actual year of the observations? In the latter case, you need to add the -post- variable I described above. In the former case, the -year- variable is identical to the -post- variable.

        I usually prefer using Stata's factor syntax to include interactions in a regression rather than generating a new variable, because then Stata knows that the interaction term and its separate components "belong together". This may be important for some post-estimation commands.



        • #5
          Originally posted by Sebastian Geiger
          Kris,
          you probably forgot to include the dependent variable in the -xtreg- command. ;-)
          Yes, thanks.



          • #6
            Kris, Sebastian,

            I can't thank you enough for your great help.

            I coded every variable just as you said.
            Now I'm left with one issue: the interaction between my treatment and post variables
            (i.treatment#i.post) is omitted because of collinearity... would you know why this happens? How do I solve this problem?



            • #7
              Trying later with Kris' equation

              xtreg popdensity i.EQ i.year, fe cluster(code)

              Where EQ=post variable explained by Sebastian and year is a variable reflecting the actual year of the observations, I receive a negative significant coefficient for EQ(post). Can I really just interpret this coefficient? (It would make sense as I was just interacting two dummies that would have yielded such a variable anyways )

              And yes Kris, I'll try to check for a delayed effect, but I'm analyzing the Italian response in demographic mobility to earthquakes and we tend to be screwed right away.
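
              One possible way to look at delayed effects, and at the leads and lags asked about in the first post, is to replace the single EQ dummy with dummies for years relative to the earthquake. This is only a sketch: eq_year (the year of the first earthquake, missing if never hit) is a hypothetical variable, and the five-year window is an arbitrary choice.

              ```stata
              * Years since the earthquake; missing for never-treated municipalities
              generate rel = Year - eq_year

              * Leads 2 to 5 years before the event; the year before (rel == -1) is the reference
              forvalues k = 2/5 {
                  generate byte lead`k' = (rel == -`k')
              }
              * Lags 0 to 5 years after the event
              forvalues k = 0/5 {
                  generate byte lag`k' = (rel == `k')
              }
              * Bin the endpoints so that earlier and later years are not left in the reference group
              replace lead5 = 1 if rel < -5
              replace lag5  = 1 if rel > 5 & !missing(rel)

              xtset code Year
              xtreg popdensity lead* lag* i.Year, fe vce(cluster code)

              * Joint test of the lead coefficients (pre-earthquake differences)
              testparm lead*
              ```

              Lead coefficients close to zero would be consistent with treated and control municipalities moving in parallel before the earthquake; the lag coefficients trace out how the effect evolves afterwards.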



              • #8
                Now I'm left with one issue: the interaction between my treatment and post variables
                (i.treatment#i.post) is omitted because of collinearity... would you know why this happens? How do I solve this problem?
                No, I would not expect this to happen. In fact, the coefficient of this interaction is the coefficient of interest. It shows (at least it should) the effect of earthquakes on the population density. If the model drops this interaction, something is wrong and the entire DiD model does not make any sense.

                It's hard to tell what the problem is without seeing your actual data and what exactly you typed. If your data are not classified or something, it would be helpful if you could use the user-written command -dataex- (available by typing -ssc install dataex-) to post an excerpt from the dataset here. In addition, you should post the command followed by the output you received. That would allow us to help you more efficiently.



                • #9
                  Here is the excerpt

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input float(popdensity treatment_ post) int Year str34 Municipality
                   143 0 0 1972 "ABBADIA SAN SALVATORE"
                    31 0 0 1972 "ABBATEGGIO"           
                    26 0 0 1972 "ABETONE"              
                    29 1 1 1972 "ACCIANO"              
                    14 1 1 1972 "ACCUMOLI"             
                     6 1 1 1972 "ACQUACANINA"          
                    17 1 1 1972 "ACQUAFONDATA"         
                    67 0 0 1972 "ACQUALAGNA"           
                    46 1 1 1972 "ACQUAPENDENTE"        
                    35 1 1 1972 "ACQUASANTA TERME"     
                    59 0 0 1972 "ACQUASPARTA"          
                    40 0 0 1972 "ACQUAVIVA COLLECROCE" 
                    46 1 0 1972 "ACQUAVIVA D'ISERNIA"  
                   128 0 0 1972 "ACQUAVIVA PICENA"     
                   132 0 0 1972 "ACUTO"                
                   108 0 0 1972 "AFFILE"               
                  1090 0 0 1972 "AGLIANA"              
                    69 1 0 1972 "AGNONE"               
                   134 0 0 1972 "AGOSTA"               
                   123 1 1 1972 "AGUGLIANO"            
                    44 1 0 1972 "AIELLI"               
                   116 0 0 1972 "ALANNO"               
                   207 0 0 1972 "ALATRI"               
                   725 0 0 1972 "ALBA ADRIATICA"       
                  1041 0 0 1972 "ALBANO LAZIALE"       
                    23 1 1 1972 "ALFEDENA"             
                    20 0 0 1972 "ALLERONA"             
                    43 1 1 1972 "ALLUMIERE"            
                   116 0 0 1972 "ALTIDONA"             
                   157 0 0 1972 "ALTINO"               
                   305 0 0 1972 "ALTOPASCIO"           
                    62 0 0 1972 "ALVIANO"              
                    62 1 1 1972 "ALVITO"               
                    62 1 1 1972 "AMANDOLA"             
                    49 0 0 1972 "AMASENO"              
                    20 1 1 1972 "AMATRICE"             
                    81 0 0 1972 "AMELIA"               
                   143 0 0 1972 "ANAGNI"               
                   103 1 1 1972 "ANCARANO"             
                    46 0 0 1972 "ANGHIARI"             
                    62 0 0 1972 "ANGUILLARA SABAZIA"   
                    60 0 0 1972 "ANTICOLI CORRADO"     
                    50 1 1 1972 "ANTRODOCO"            
                    21 0 0 1972 "ANVERSA DEGLI ABRUZZI"
                   537 0 0 1972 "ANZIO"                
                    27 0 0 1972 "APECCHIO"             
                    57 0 0 1972 "APIRO"                
                   141 1 1 1972 "APPIGNANO"            
                    90 1 1 1972 "APPIGNANO DEL TRONTO" 
                   164 0 0 1972 "APRILIA"              
                   194 1 1 1972 "AQUINO"               
                   153 1 0 1972 "ARCE"                 
                    54 0 0 1972 "ARCEVIA"              
                    88 0 0 1972 "ARCHI"                
                    53 0 0 1972 "ARCIDOSSO"            
                    54 0 0 1972 "ARCINAZZO ROMANO"     
                   129 0 0 1972 "ARDEA"                
                   228 0 0 1972 "AREZZO"               
                   149 0 0 1972 "ARI"                  
                   608 0 0 1972 "ARICCIA"              
                   109 0 0 1972 "ARIELLI"              
                    38 1 0 1972 "ARLENA DI CASTRO"     
                   192 0 0 1972 "ARNARA"               
                   136 1 0 1972 "ARPINO"               
                    26 1 1 1972 "ARQUATA DEL TRONTO"   
                    64 1 1 1972 "ARRONE"               
                    45 1 1 1972 "ARSITA"               
                   131 0 0 1972 "ARSOLI"               
                   163 0 0 1972 "ARTENA"               
                    27 0 0 1972 "ASCIANO"              
                   348 1 1 1972 "ASCOLI PICENO"        
                    28 1 0 1972 "ASCREA"               
                   128 1 0 1972 "ASSISI"               
                    42 1 0 1972 "ATELETA"              
                    85 0 0 1972 "ATESSA"               
                   153 1 0 1972 "ATINA"                
                   125 1 1 1972 "ATRI"                 
                   157 0 0 1972 "ATTIGLIANO"           
                    65 0 0 1972 "AUDITORE"             
                   172 0 0 1972 "AULLA"                
                   135 1 1 1972 "AUSONIA"              
                   309 1 0 1972 "AVEZZANO"             
                    43 0 0 1972 "AVIGLIANO UMBRO"      
                    15 0 0 1972 "BADIA TEDALDA"        
                    49 0 0 1972 "BAGNI DI LUCCA"       
                   305 0 0 1972 "BAGNO A RIPOLI"       
                    49 0 0 1972 "BAGNOLI DEL TRIGNO"   
                    42 0 0 1972 "BAGNONE"              
                    55 1 1 1972 "BAGNOREGIO"           
                    69 1 0 1972 "BALSORANO"            
                   123 0 0 1972 "BARANELLO"            
                   119 0 0 1972 "BARBARA"              
                    24 1 1 1972 "BARBARANO ROMANO"     
                    56 0 0 1972 "BARBERINO DI MUGELLO" 
                    53 0 0 1972 "BARBERINO VAL D'ELSA" 
                    68 0 0 1972 "BARCHI"               
                    27 1 0 1972 "BARETE"               
                   164 0 0 1972 "BARGA"                
                    21 1 0 1972 "BARISCIANO"           
                    11 1 0 1972 "BARREA"               
                  end
                  format %ty Year

                  And here the command I used

                  Code:
                  reg popdensity i.treatment_ i.post i.treatment_#i.post
                  and here the result

                  Code:
                  note: 0b.treatment_#1.post identifies no observations in the sample
                  note: 1.treatment_#1.post omitted because of collinearity
                  
                        Source |       SS           df       MS      Number of obs   =   108,594
                  -------------+----------------------------------   F(2, 108591)    =   1126.37
                         Model |  90193775.3         2  45096887.6   Prob > F        =    0.0000
                      Residual |  4.3477e+09   108,591  40037.1946   R-squared       =    0.0203
                  -------------+----------------------------------   Adj R-squared   =    0.0203
                         Total |  4.4379e+09   108,593  40867.0243   Root MSE        =    200.09
                  
                  ---------------------------------------------------------------------------------
                       popdensity |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  ----------------+----------------------------------------------------------------
                     1.treatment_ |   -57.3079   1.472293   -38.92   0.000    -60.19357   -54.42222
                           1.post |  -10.90045   2.256165    -4.83   0.000     -15.3225   -6.478399
                                  |
                  treatment_#post |
                             0 1  |          0  (empty)
                             1 1  |          0  (omitted)
                                  |
                            _cons |   156.9247   .7443288   210.83   0.000     155.4658    158.3835
                  ---------------------------------------------------------------------------------
                  I'm afraid the interaction between post and treatment_, when observations are treated from different periods on, is equal to the post variable. That could be the reason for omitting it...

                  Again thanks for the help



                  • #10
                    Your data are not suitable for a DID analysis. You have no observations with both treatment_ = 0 and post = 1:
                    Code:
                    . tab treatment_ post
                    
                               |         post
                    treatment_ |         0          1 |     Total
                    -----------+----------------------+----------
                             0 |        60          0 |        60
                             1 |        15         25 |        40
                    -----------+----------------------+----------
                         Total |        75         25 |       100
                    That is the reason the output shows (empty) for the 0 1 combination. So with only three of the four possible treatment_#post combinations present in the data, treatment_#post is perfectly collinear with post: it is always exactly equal to post.

                    In order to do a DID analysis you must have both pre- and post- observations in both the treatment and control groups. In the classic DID analysis this is simple: there is a single start time (date, year, whatever) at which treatment begins for everyone in the treatment group. That same start time then defines the post variable for all observations: -gen post = time > start_time-.
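
                    For illustration only, the classic single-start-date case might look like the sketch below. The start year 1980 is an arbitrary placeholder, and post_common is a hypothetical name chosen to avoid overwriting the existing post variable; code is the municipality identifier from the first post.

                    ```stata
                    * Common start year for all municipalities (placeholder value)
                    local start_time = 1980
                    generate byte post_common = Year > `start_time'

                    * All four cells should now be populated
                    tabulate treatment_ post_common

                    * Classic 2x2 DiD; the interaction coefficient is the DiD estimate
                    regress popdensity i.treatment_##i.post_common, vce(cluster code)
                    ```

                    With a common start year, control municipalities have both pre and post observations, so the (empty) cell in the tabulation disappears.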

                    Your situation is more complicated. The earthquakes that occurred in the treatment cities would have, I imagine, occurred in various different years. So there is no obvious year to use to distinguish pre- and post- for the control group. The best solution here would be to form matched treatment-control pairs. You should try to match them on variables you have that are predictive of your popdensity outcome. The best way to do this depends on the details of what variables are available to you and how they relate to each other and to popdensity, so you may need to get advice from a colleague in your field who is knowledgeable about these matters.

                    Anyway, once you have created your matched treatment-control pairs, then you impute to the control municipality the same start date for the post-earthquake era as the actual earthquake year of its matched treatment municipality. The quick way to do that in code is:

                    Code:
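                    by pair Year (treatment_), sort: replace post = post[_N]
                    where pair is a variable that identifies the matched pairs. Within each pair and year, the treated municipality sorts last, so its value of post is copied to its matched control. This assures that both members of the pair have the same values of post in any year.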

                    Once you do this, you will have treatment = 0 post = 1 observations, and your interaction term will no longer disappear. You will be able to do your analysis.
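
                    After the imputation step, a quick check and a possible estimation command could look like this sketch. It assumes pair has been created by some matching procedure not shown here, and that code is the numeric municipality identifier from the first post.

                    ```stata
                    * Both members of each pair should share the same post value in every year
                    by pair Year, sort: assert post[1] == post[_N]

                    * The treatment_ = 0, post = 1 cell should no longer be empty
                    tabulate treatment_ post

                    * DiD with municipality and year fixed effects; 1.treatment_ is absorbed
                    xtset code Year
                    xtreg popdensity i.treatment_##i.post i.Year, fe vce(cluster code)
                    ```

                    If the -assert- fails, the imputation did not line up post within pairs and the interaction term should not be interpreted yet.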



                    • #11
                      Oh Jesus... Well that requires "some" more work. But at least now everything is clear. Can't thank you all enough for your help.



                      • #12
                        Originally posted by Clyde Schechter
                        Your data are not suitable for a DID analysis. You have no observations with both treatment_ = 0 and post = 1:
                        Code:
                        . tab treatment_ post
                        
                                   |         post
                        treatment_ |         0          1 |     Total
                        -----------+----------------------+----------
                                 0 |        60          0 |        60
                                 1 |        15         25 |        40
                        -----------+----------------------+----------
                             Total |        75         25 |       100
                        That is the reason the output shows (empty) for the 0 1 combination. So with only three of the four possible treatment_#post combinations present in the data, treatment_#post is perfectly collinear with post: it is always exactly equal to post.

                        In order to do a DID analysis you must have both pre- and post- observations in both the treatment and control groups. In the classic DID analysis this is simple: there is a single start time (date, year, whatever) at which treatment begins for everyone in the treatment group. That same start time then defines the post variable for all observations: -gen post = time > start_time-.

                        Your situation is more complicated. The earthquakes that occurred in the treatment cities would have, I imagine, occurred in various different years. So there is no obvious year to use to distinguish pre- and post- for the control group. The best solution here would be to form matched treatment-control pairs. You should try to match them on variables you have that are predictive of your popdensity outcome. The best way to do this depends on the details of what variables are available to you and how they relate to each other and to popdensity, so you may need to get advice from a colleague in your field who is knowledgeable about these matters.

                        Anyway, once you have created your matched treatment-control pairs, then you impute to the control municipality the same start date for the post-earthquake era as the actual earthquake year of its matched treatment municipality. The quick way to do that in code is:

                        Code:
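                        by pair Year (treatment_), sort: replace post = post[_N]
                        where pair is a variable that identifies the matched pairs. Within each pair and year, the treated municipality sorts last, so its value of post is copied to its matched control. This assures that both members of the pair have the same values of post in any year.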

                        Once you do this, you will have treatment = 0 post = 1 observations, and your interaction term will no longer disappear. You will be able to do your analysis.


                        Very thoughtful commentary as always, Clyde. I have a question that I would like to follow up on here. It looks like you recommended the matched-pair approach, but given their data, couldn't the OP have just used the generalized diff-in-diff approach as suggested by Kris? i.e., xtreg popdensity i.EQ i.year, fe cluster(code)

                        Thanks

