  • xtdidregress fails to include full treatment time period

    Good morning everyone,

    This is my first post on Statalist, so I hope I am doing this right. I am running into some problems with xtdidregress. I have seen other posts on this specific question and tried to incorporate the feedback given there, but unfortunately I still run into problems. I am trying to carry out a DiD analysis with homogeneous treatment. My dependent variable is patent data from 1996 to 2021 for a sample of 13 control countries and 20 treated countries. The treatment runs from 2005 to 2021. The treatment indicator is an interaction dummy called 'did' in my dataset; it equals 1 for countries in the treatment group in years >= 2005.
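
    A minimal sketch of one way to build such a dummy in Stata, assuming a 0/1 variable `treatment` that marks treated countries (as in the dataex below). The two toy countries are made up for illustration only.

```stata
* Hypothetical toy rows: country 1 is treated, country 2 is a control
clear
input byte Country_ID int year byte treatment
1 2004 1
1 2005 1
2 2004 0
2 2005 0
end

* time = 1 from 2005 onwards; the !missing() guard keeps a missing year
* (which Stata treats as larger than any number) from being coded 1
generate byte time = year >= 2005 if !missing(year)

* did = 1 only for treated countries in treated years
generate byte did = treatment * time
```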

    Here is a snippet of the data I am using:

    . dataex
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte Country_ID str20 Country_name int year byte treatment double patents float(time did phase_3)
    1 "Austria" 1996 1 161.666666666664 0 0 0
    1 "Austria" 1997 1 190.999999999998 0 0 0
    1 "Austria" 1998 1 161.616666666665 0 0 0
    1 "Austria" 1999 1 190.849999999998 0 0 0
    1 "Austria" 2000 1 214.149999999997 0 0 0
    1 "Austria" 2001 1 207.249999999996 0 0 0
    1 "Austria" 2002 1 184.199999999995 0 0 0
    1 "Austria" 2003 1 205.983333333329 0 0 0
    1 "Austria" 2004 1 221.883333333327 0 0 0
    1 "Austria" 2005 1            232.5 1 1 0
    1 "Austria" 2006 1           269.33 1 1 0
    1 "Austria" 2007 1           279.03 1 1 0
    1 "Austria" 2008 1           309.18 1 1 0
    1 "Austria" 2009 1            317.1 1 1 0
    1 "Austria" 2010 1           397.06 1 1 0
    1 "Austria" 2011 1           405.02 1 1 0
    1 "Austria" 2012 1            410.1 1 1 0
    1 "Austria" 2013 1           331.73 1 1 1
    1 "Austria" 2014 1           366.46 1 1 1
    1 "Austria" 2015 1           363.02 1 1 1
    1 "Austria" 2016 1           338.28 1 1 1
    1 "Austria" 2017 1           369.49 1 1 1
    1 "Austria" 2018 1           380.68 1 1 1
    1 "Austria" 2019 1           433.24 1 1 1
    1 "Austria" 2020 1           404.01 1 1 1
    1 "Austria" 2021 1           263.73 1 1 0
    2 "Belgium" 1996 1   55.83333333333 0 0 0
    2 "Belgium" 1997 1  73.666666666661 0 0 0
    2 "Belgium" 1998 1  87.166666666663 0 0 0
    2 "Belgium" 1999 1  89.183333333327 0 0 0
    2 "Belgium" 2000 1  109.74285714285 0 0 0
    2 "Belgium" 2001 1 111.316666666655 0 0 0
    2 "Belgium" 2002 1  108.41666666666 0 0 0
    2 "Belgium" 2003 1 114.033333333327 0 0 0
    2 "Belgium" 2004 1 128.366666666658 0 0 0
    2 "Belgium" 2005 1           146.77 1 1 0
    2 "Belgium" 2006 1           143.41 1 1 0
    2 "Belgium" 2007 1           175.93 1 1 0
    2 "Belgium" 2008 1           169.35 1 1 0
    2 "Belgium" 2009 1           212.03 1 1 0
    2 "Belgium" 2010 1           227.18 1 1 0
    2 "Belgium" 2011 1           192.43 1 1 0
    2 "Belgium" 2012 1           201.03 1 1 0
    2 "Belgium" 2013 1           217.05 1 1 1
    2 "Belgium" 2014 1           232.95 1 1 1
    2 "Belgium" 2015 1            229.2 1 1 1
    2 "Belgium" 2016 1           247.02 1 1 1
    2 "Belgium" 2017 1           236.92 1 1 1
    2 "Belgium" 2018 1           268.62 1 1 1
    2 "Belgium" 2019 1           219.36 1 1 1
    2 "Belgium" 2020 1            229.6 1 1 1
    2 "Belgium" 2021 1           229.58 1 1 0
    3 "Czechia" 1996 1  11.666666666666 0 0 0
    3 "Czechia" 1997 1  10.833333333332 0 0 0
    3 "Czechia" 1998 1  20.833333333333 0 0 0
    3 "Czechia" 1999 1  21.166666666666 0 0 0
    3 "Czechia" 2000 1  12.666666666665 0 0 0
    3 "Czechia" 2001 1  16.833333333332 0 0 0
    3 "Czechia" 2002 1  17.226190476186 0 0 0
    3 "Czechia" 2003 1  24.666666666666 0 0 0
    3 "Czechia" 2004 1  30.449999999999 0 0 0
    3 "Czechia" 2005 1             24.9 1 1 0
    3 "Czechia" 2006 1            32.05 1 1 0
    3 "Czechia" 2007 1             56.2 1 1 0
    3 "Czechia" 2008 1            46.12 1 1 0
    3 "Czechia" 2009 1            44.33 1 1 0
    3 "Czechia" 2010 1            44.81 1 1 0
    3 "Czechia" 2011 1            51.57 1 1 0
    3 "Czechia" 2012 1            45.67 1 1 0
    3 "Czechia" 2013 1            55.75 1 1 1
    3 "Czechia" 2014 1            56.67 1 1 1
    3 "Czechia" 2015 1            43.75 1 1 1
    3 "Czechia" 2016 1            65.92 1 1 1
    3 "Czechia" 2017 1            67.39 1 1 1
    3 "Czechia" 2018 1            69.06 1 1 1
    3 "Czechia" 2019 1            57.89 1 1 1
    3 "Czechia" 2020 1            79.96 1 1 1
    3 "Czechia" 2021 1            52.33 1 1 0
    4 "Denmark" 1996 1  59.333333333331 0 0 0
    4 "Denmark" 1997 1  78.666666666665 0 0 0
    4 "Denmark" 1998 1 102.833333333333 0 0 0
    4 "Denmark" 1999 1  90.866666666666 0 0 0
    4 "Denmark" 2000 1 105.333333333333 0 0 0
    4 "Denmark" 2001 1 115.583333333329 0 0 0
    4 "Denmark" 2002 1 116.033333333332 0 0 0
    4 "Denmark" 2003 1 127.266666666664 0 0 0
    4 "Denmark" 2004 1 126.199999999998 0 0 0
    4 "Denmark" 2005 1           181.37 1 1 0
    4 "Denmark" 2006 1           193.75 1 1 0
    4 "Denmark" 2007 1            314.1 1 1 0
    4 "Denmark" 2008 1           403.98 1 1 0
    4 "Denmark" 2009 1           384.52 1 1 0
    4 "Denmark" 2010 1           479.57 1 1 0
    4 "Denmark" 2011 1            511.2 1 1 0
    4 "Denmark" 2012 1           425.93 1 1 0
    4 "Denmark" 2013 1            350.8 1 1 1
    4 "Denmark" 2014 1           379.98 1 1 1
    4 "Denmark" 2015 1           344.75 1 1 1
    4 "Denmark" 2016 1           428.42 1 1 1
    4 "Denmark" 2017 1           388.75 1 1 1
    end
    The snippet above only includes countries from my treated sample, but it should be enough to illustrate the problem. After I run xtdidregress:

    . xtdidregress (patents) (did), group(Country_ID) time(year)
    Code:
    Number of groups and treatment time
    
    Time variable: year
    Control:       did = 0
    Treatment:     did = 1
    -----------------------------------
                 |   Control  Treatment
    -------------+---------------------
    Group        |
      Country_ID |        13         20
    -------------+---------------------
    Time         |
         Minimum |      1996       2005
         Maximum |      1996       2005
    -----------------------------------
    
    Difference-in-differences regression                       Number of obs = 858
    Data type: Longitudinal
    
                                (Std. err. adjusted for 33 clusters in Country_ID)
    ------------------------------------------------------------------------------
                 |               Robust
         patents | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    ATET         |
             did |
       (1 vs 0)  |  -725.1259   487.5723    -1.49   0.147    -1718.278    268.0265
    ------------------------------------------------------------------------------
    Note: ATET estimate adjusted for panel effects and time effects.
    The table suggests that no observation after 2005 is taken into account, and I do not understand why. I have already excluded all countries with missing values, even those with only one missing value over the 26 years, but this does not help much. I might have done something very basic wrong, but I do not understand why no data after 2005 are used. Stata does allow me to run estat trendplots, but I do not think that is really appropriate, since something is clearly going wrong in the previous steps. Any help on this topic would be much appreciated!

  • #2
    Dear Marjolijn,

    If you add the -aeq- option, you will see all the covariates used in estimation, which include all time periods. That is, type:

    xtdidregress (patents) (did), group(Country_ID) time(year) aeq

    The group and treatment-time table summarizes the first time that the different treated and control groups appear in the data. This helps you see variation in treatment timing.
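
    A minimal sketch of another way to confirm which years enter the estimation, using a small made-up panel (the data, the seed and the outcome formula are hypothetical; only xtset, xtdidregress, tabulate and the e(sample) function are standard Stata). After any estimation command, e(sample) flags the observations that were actually used, so tabulating year over it lists every year in the model rather than only the first treated year shown in the header table.

```stata
* Hypothetical toy panel: 4 countries observed 2002-2007; countries 1 and 2
* are treated from 2005 onwards, countries 3 and 4 never are
clear
set obs 24
generate byte Country_ID = ceil(_n / 6)
bysort Country_ID: generate int year = 2001 + _n
generate byte did = (Country_ID <= 2) & (year >= 2005)

* Made-up outcome: country and year components, an effect of 3 for
* treated observations, plus random noise so the fit is not exact
set seed 12345
generate double patents = 10*Country_ID + (year - 2000) + 3*did + rnormal()

xtset Country_ID year
xtdidregress (patents) (did), group(Country_ID) time(year)

* Every year 2002-2007 should appear here, not just the first treated year
tabulate year if e(sample)
```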

    • #3
      Hello Enrique,

      Thanks a lot for your response! If I run the code you gave me, I see that all time units are included as controls:

      . xtdidregress (patents) (did), group(Country_ID) time(year) aeq

      Code:
      Number of groups and treatment time
      
      Time variable: year
      Control:       did = 0
      Treatment:     did = 1
      -----------------------------------
                   |   Control  Treatment
      -------------+---------------------
      Group        |
        Country_ID |        17         25
      -------------+---------------------
      Time         |
           Minimum |      1996       2005
           Maximum |      2000       2006
      -----------------------------------
      
      Difference-in-differences regression                     Number of obs = 1,071
      Data type: Longitudinal
      
                                  (Std. err. adjusted for 42 clusters in Country_ID)
      ------------------------------------------------------------------------------
                   |               Robust
           patents | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      ATET         |
               did |
         (1 vs 0)  |  -560.3104   394.0647    -1.42   0.163     -1356.14    235.5195
      -------------+----------------------------------------------------------------
      Controls     |
              year |
             1997  |    38.9574   31.10985     1.25   0.218    -23.87023     101.785
             1998  |   21.26645   30.50862     0.70   0.490    -40.34696    82.87987
             1999  |   102.9377   50.85819     2.02   0.050     .2275511    205.6479
             2000  |   181.9918   80.86527     2.25   0.030     18.68109    345.3025
             2001  |   204.2827   90.25791     2.26   0.029     22.00314    386.5622
             2002  |   208.3914   95.07016     2.19   0.034     16.39332    400.3895
             2003  |    245.152   105.6606     2.32   0.025     31.76609     458.538
             2004  |   293.9861   123.5064     2.38   0.022     44.55995    543.4123
             2005  |   658.8721   348.3404     1.89   0.066    -44.61561     1362.36
             2006  |   713.7304    364.421     1.96   0.057    -22.23283    1449.694
             2007  |   765.5252   382.7264     2.00   0.052    -7.406528    1538.457
             2008  |   784.5547   384.1551     2.04   0.048     8.737662    1560.372
             2009  |    854.784   406.5229     2.10   0.042      33.7944    1675.774
             2010  |   959.8752   448.4864     2.14   0.038     54.13851    1865.612
             2011  |   1010.234   467.3273     2.16   0.037     66.44778    1954.021
             2012  |   1025.471    481.031     2.13   0.039     54.00957    1996.933
             2013  |   1021.207   490.7985     2.08   0.044     30.01934    2012.395
             2014  |   1001.812   475.4729     2.11   0.041      41.5755    1962.049
             2015  |   977.5234   467.7857     2.09   0.043     32.81097    1922.236
             2016  |   993.4268   476.2612     2.09   0.043     31.59777    1955.256
             2017  |   1002.618   475.2634     2.11   0.041     42.80405    1962.432
             2018  |   1013.682   474.7315     2.14   0.039     54.94193    1972.421
             2019  |   954.3962    444.499     2.15   0.038     56.71219     1852.08
             2020  |   956.8411   444.6878     2.15   0.037     58.77583    1854.906
             2021  |   822.9334   423.5714     1.94   0.059    -32.48635    1678.353
                   |
             _cons |   228.0627   165.5065     1.38   0.176    -106.1845    562.3099
      ------------------------------------------------------------------------------
      Note: ATET estimate adjusted for panel effects and time effects.
      However, I still do not understand why the treatment-time table does not cover my entire time range. I have now run xtdidregress on my whole dataset, this time including some countries with missing values on the outcome variable. This is why the number of observations is higher than in my previous post and why the treatment-time table is somewhat different. However, the few missing values in my dataset are spread out over the whole period, and there are certainly enough countries with non-missing observations from 2006 onwards, yet the treatment-time table does not reflect this. Do you have any idea why the treatment-time table only goes up to 2006? I would have expected the Maximum to be 2021, as the treatment runs from 2005 to 2021 (as can also be seen in the data snippet I posted above).

      Thanks again so much for helping me out; I look forward to your reply!

      • #4
        Hi Marjolijn,

        The treatment-time table reports the FIRST time a country is observed as either treated or control. For the treated countries, for instance, some are first observed as treated in 2005 and some in 2006, so there is some heterogeneity in treatment timing. The first example in the -didregress- documentation discusses this output and what it means in more detail.

        https://www.stata.com/manuals/causaldidregress.pdf
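
        A minimal sketch of how the "Time" rows can be checked by hand, on a small hypothetical panel. It assumes that the table summarizes, over the estimation sample, the first year each treated country is observed treated and the first year each control country is observed at all, and that observations with a missing outcome are dropped from that sample, as Stata estimation commands do. Under those assumptions, a treated country whose 2005 value is missing is first seen as treated in 2006, and a control country with missing early values is first seen later than 1996. The helper variables usable, ever_treated, first_treat, first_seen and tagged are hypothetical names.

```stata
* Hypothetical toy panel: country 2 is treated from 2005 but has no value
* for 2005; country 3 is a control with no value for 2003
clear
input byte Country_ID int year byte did double patents
1 2003 0 10
1 2004 0 12
1 2005 1 15
1 2006 1 16
2 2003 0 20
2 2004 0 22
2 2005 1 .
2 2006 1 27
3 2003 0 .
3 2004 0 31
3 2005 0 33
3 2006 0 34
end

* Observations with a missing outcome (or treatment) cannot be used
generate byte usable = !missing(patents, did)

* 1 if the country is treated in at least one usable year
bysort Country_ID: egen byte ever_treated = max(did * usable)

* First usable year in which the country is treated (missing for controls)
bysort Country_ID: egen int first_treat = min(cond(did == 1 & usable, year, .))

* First usable year in which the country appears at all
bysort Country_ID: egen int first_seen = min(cond(usable, year, .))

* One row per country, then summarize as the table's Time rows do
egen byte tagged = tag(Country_ID)
summarize first_treat if tagged & ever_treated
summarize first_seen if tagged & !ever_treated
```

        With this toy data, the first summarize reports a minimum of 2005 and a maximum of 2006 for the treated countries, and the second reports 2004 for the single control country: the same kind of spread as in the table in post #3.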

        • #5
          Hi Enrique,

          I understand now; I had misunderstood the output at first. Thank you so much for the explanation!
