  • xtdidregress and treatment timing problem

    Good morning everyone,

    It's my first time posting on Statalist, so please forgive any mistakes. I'm trying to carry out a DiD analysis with homogeneous treatment using the command xtdidregress in Stata 17. All the countries in the treatment group receive the treatment in 1996, so I created the dummy variable "treatment" as the interaction of a dummy for participating countries and a dummy for year >= 1996. It therefore takes the value 1 for participating countries in years >= 1996, and 0 otherwise. I want to estimate the impact of the treatment on the outcome variable revenues.
    This is an example of the data I'm using:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str34 country float id int year float treatment double revenues
    "Angola"   3 1980 0                  .
    "Angola"   3 1981 0                  .
    "Angola"   3 1982 0                  .
    "Angola"   3 1983 0                  .
    "Angola"   3 1984 0                  .
    "Angola"   3 1985 0   38.5590223893859
    "Angola"   3 1986 0 37.440774273166085
    "Angola"   3 1987 0  28.15715496592235
    "Angola"   3 1988 0 26.318105068155557
    "Angola"   3 1989 0 26.678764600008403
    "Angola"   3 1990 0 25.603858087712762
    "Angola"   3 1991 0 18.286622438623453
    "Angola"   3 1992 0  35.59674673728012
    "Angola"   3 1993 0  38.97868309765103
    "Angola"   3 1994 0               26.1
    "Angola"   3 1995 0 24.211764705882352
    "Angola"   3 1996 0 36.816046966731896
    "Angola"   3 1997 0 29.767382174521696
    "Angola"   3 1998 0  28.79078694817658
    "Angola"   3 1999 0  41.42462044091039
    "Angola"   3 2000 0 42.812036507072904
    "Angola"   3 2001 0 38.823995894445176
    "Angola"   3 2002 0  28.66809284136252
    "Angola"   3 2003 0  29.71193899979683
    "Angola"   3 2004 0  30.98746169132483
    "Angola"   3 2005 0 33.695143292354956
    "Angola"   3 2006 0                  .
    "Angola"   3 2007 0                  .
    "Angola"   3 2008 0                  .
    "Angola"   3 2009 0  36.82759164763131
    "Angola"   3 2010 0  41.79623846838015
    "Angola"   3 2011 0  44.62457646648777
    "Angola"   3 2012 0  40.46642317555491
    "Angola"   3 2013 0  35.83175445244411
    "Angola"   3 2014 0 30.131542754536834
    "Angola"   3 2015 0 23.053284196959346
    "Angola"   3 2016 0  16.56236385597934
    "Angola"   3 2017 0 16.666419902972514
    "Angola"   3 2018 0 21.159166826250452
    "Angola"   3 2019 0 19.024422476166702
    "Angola"   3 2020 0  17.32964490984264
    "Burundi" 13 1980 0 17.222313141467748
    "Burundi" 13 1981 0 17.256408528228977
    "Burundi" 13 1982 0 18.760568772799303
    "Burundi" 13 1983 0 16.115698460401287
    "Burundi" 13 1984 0 18.119885348730943
    "Burundi" 13 1985 0   16.3402857426051
    "Burundi" 13 1986 0 19.070355686291226
    "Burundi" 13 1987 0  17.08156779661017
    "Burundi" 13 1988 0  17.34970353015092
    "Burundi" 13 1989 0   24.4175677182031
    "Burundi" 13 1990 0 23.510032943995206
    "Burundi" 13 1991 0  25.00837860929247
    "Burundi" 13 1992 0   25.5501488207453
    "Burundi" 13 1993 0 26.565614873231517
    "Burundi" 13 1994 0  20.50917560821052
    "Burundi" 13 1995 0  21.30667595829979
    "Burundi" 13 1996 1 18.410532041553026
    "Burundi" 13 1997 1  16.58190820395747
    "Burundi" 13 1998 1  18.00535659358659
    "Burundi" 13 1999 1 17.442813806673772
    "Burundi" 13 2000 1 18.206108452289474
    "Burundi" 13 2001 1  16.69040871582759
    "Burundi" 13 2002 1 18.694387127430367
    "Burundi" 13 2003 1 21.721675288621608
    "Burundi" 13 2004 1 24.278108411778952
    "Burundi" 13 2005 1 19.176590158941405
    "Burundi" 13 2006 1  26.55164516375296
    "Burundi" 13 2007 1                  .
    "Burundi" 13 2008 1                  .
    "Burundi" 13 2009 1 32.817532645019156
    "Burundi" 13 2010 1  37.20981652883773
    "Burundi" 13 2011 1  33.64481506518453
    "Burundi" 13 2012 1 33.777901901771045
    "Burundi" 13 2013 1  32.93114754098361
    "Burundi" 13 2014 1 30.516129032258064
    "Burundi" 13 2015 1                  .
    "Burundi" 13 2016 1                  .
    "Burundi" 13 2017 1                  .
    "Burundi" 13 2018 1                  .
    "Burundi" 13 2019 1                  .
    "Burundi" 13 2020 1                  .
    "Benin"   15 1980 0  8.777399210478057
    "Benin"   15 1981 0 15.576651109378387
    "Benin"   15 1982 0  16.31812237434467
    "Benin"   15 1983 0 12.732701088865472
    "Benin"   15 1984 0  11.06599830551901
    "Benin"   15 1985 0 11.847020943421677
    "Benin"   15 1986 0 10.955432223593863
    "Benin"   15 1987 0 11.407167050753774
    "Benin"   15 1988 0 11.803766330529346
    "Benin"   15 1989 0  9.559569630368483
    "Benin"   15 1990 0  8.486426598875077
    "Benin"   15 1991 0  9.880866576833485
    "Benin"   15 1992 0  11.46952743208144
    "Benin"   15 1993 0 11.569943730551936
    "Benin"   15 1994 0  12.39768792228859
    "Benin"   15 1995 0 12.849657614075152
    "Benin"   15 1996 1 13.104247932252141
    "Benin"   15 1997 1 13.612375047184594
    end
    However, when I use xtdidregress (after xtset id year, yearly), I obtain the following results:

    Code:
    xtdidregress (revenues) (treatment), group(id) time(year)
    
    Number of groups and treatment time
    
    Time variable: year
    Control:       treatment = 0
    Treatment:     treatment = 1
    -----------------------------------
                 |   Control  Treatment
    -------------+---------------------
    Group        |
              id |        24         24
    -------------+---------------------
    Time         |
         Minimum |      1980       1996
         Maximum |      2013       2000
    -----------------------------------
    
    Difference-in-differences regression                     Number of obs = 1,566
    Data type: Longitudinal
    
                                        (Std. err. adjusted for 48 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
        revenues | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    ATET         |
       treatment |
       (1 vs 0)  |   6.413122   1.517473     4.23   0.000      3.36036    9.465884
    ------------------------------------------------------------------------------
    Note: ATET estimate adjusted for panel effects and time effects.
    As you can see from the table, the treatment time seems to vary (it ranges from 1996 to 2000). In fact, when I try to plot the parallel trends, I get this message:

    Code:
    estat trendplot
    treatment assignment times vary; not allowed with estat trendplots
    However, my treatment timing shouldn't vary, since all the countries in the treatment group receive the treatment in 1996. I already checked my whole dataset, and the treatment dummy is always 1 for years >= 1996 for treated countries only, and 0 otherwise. What am I getting wrong? I searched for previous questions but didn't find an answer.
    Thank you all very much!

  • #2
    Hey what's up Sara. Thanks so much for showing your data and code on your first post. Show me the results of
    Code:
    su year if treatment==1
    Additionally, your results say 2000 is the maximum value for treatment. Shouldn't it go to 2013?
    Last edited by Jared Greathouse; 26 Jun 2022, 04:25.


    • #3
      Are revenues missing for some treated countries over the years 1996-2000? Show the result of

      Code:
      bys country (year): g tag= treatment & missing(revenues) & inrange(year, 1996, 2000)
      list if tag, sepby(country)


      • #4
        Thank you both for helping!

        Hi Jared, this summarizes my year variable:

        Code:
        su year if treatment==1
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
                year |        600        2008    7.217119       1996       2020
        The maximum year for treatment is 2020, but not because the countries receive it every year. The treatment is an international initiative (HIPC debt relief) that the treatment group received in 1996; the dummy = 1 for 1996 and all later years.


        • #5
          Hi Andrew! Yes, there are some missing values for some treated countries in the years considered:

          Code:
           bys country (year): g tag= treatment & missing(revenues) & inrange(year, 1996, 2000)
           list if tag, sepby(country)
          
                +-----------------------------------------------------+
                |    country    id   year   treatm~t   revenues   tag |
                |-----------------------------------------------------|
          1042. | Madagascar   112   1996          1          .     1 |
          1043. | Madagascar   112   1997          1          .     1 |
          1044. | Madagascar   112   1998          1          .     1 |
          1045. | Madagascar   112   1999          1          .     1 |
                +-----------------------------------------------------+


          • #6
            Perhaps.... drop Madagascar and see what happens? It's likely the missing data that's causing the issue. That is,

            Code:
            xtdidregress (revenues) (treatment) if country ~="Madagascar", group(id) time(year)
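            To check whether missing outcomes are what shifts the treatment time, here is a minimal sketch (first_treat and onecountry are made-up variable names). It assumes that xtdidregress leaves out observations with missing revenues, so that a country's first treated year in the estimation sample is the first year with treatment == 1 and nonmissing revenues:

            Code:
            * first year with treatment == 1 and nonmissing revenues, by country
            egen first_treat = min(cond(treatment == 1 & !missing(revenues), year, .)), by(id)
            * flag one observation per country so each country is counted once
            egen onecountry = tag(id)
            tabulate first_treat if onecountry
            * treated countries whose first usable treated year is after 1996
            list country first_treat if onecountry & first_treat > 1996 & !missing(first_treat)
            If more than one year shows up in the tabulation, the listed countries are the likely reason the treatment time looks staggered.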
            Last edited by Jared Greathouse; 26 Jun 2022, 04:47.


            • #7
              Dropping only Madagascar didn't help much, but after inspecting the missing data and dropping half of the countries I had considered, I obtained these results:

              Code:
              xtdidregress (revenues) (treatment), group(id) time(year)
              
              Number of groups and treatment time
              
              Time variable: year
              Control:       treatment = 0
              Treatment:     treatment = 1
              -----------------------------------
                           |   Control  Treatment
              -------------+---------------------
              Group        |
                        id |        10         14
              -------------+---------------------
              Time         |
                   Minimum |      1980       1996
                   Maximum |      1980       1996
              -----------------------------------
              
              Difference-in-differences regression                       Number of obs = 937
              Data type: Longitudinal
              
                                                  (Std. err. adjusted for 24 clusters in id)
              ------------------------------------------------------------------------------
                           |               Robust
                  revenues | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
              -------------+----------------------------------------------------------------
              ATET         |
                 treatment |
                 (1 vs 0)  |    5.34417   1.905843     2.80   0.010     1.401634    9.286706
              ------------------------------------------------------------------------------
              Note: ATET estimate adjusted for panel effects and time effects.
              Now the treatment year appears to be correct for all the countries, but I had to reduce my dataset substantially (some countries had missing data for 10+ consecutive years, which I hadn't noticed, but others had only one or two years of missing data). I don't really understand why this happens. Isn't there a way to make Stata "tolerate" these missing values and carry out the analysis anyway?


              • #8
                So now I have a few more questions. First, what is the objective here? Do you wanna do causal inference? Presumably yes.

                Second, how many units in your sample are ever treated? Third, how many units have missing outcome data?


                • #9
                  Yes, the objective is to establish a causal relationship. 24 of the 48 countries are treated. There are 402 missing outcome values out of 1,968, which I realize is a high number, spread across 41 of the countries I considered. Unfortunately, 11 countries have long runs of missing data, which might inevitably bias my analysis. I wasn't able to find a more complete data source.

                  I was thinking I could also simply perform the DiD analysis manually; the results I get are similar for now. However, that way I couldn't use estat trendplots to check the parallel trends assumption.


                  • #10
                    If not too many values are missing, you can consider interpolation or extrapolation and make this explicit in your description. See

                    Code:
                    help ipolate
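                    A minimal sketch of how that could look with the variables in this thread (revenues_ip and imputed are made-up names for the new variables):

                    Code:
                    * linear interpolation of revenues on year, separately for each country
                    bysort id (year): ipolate revenues year, generate(revenues_ip)
                    * flag the filled-in values so they can be counted and reported
                    generate byte imputed = missing(revenues) & !missing(revenues_ip)
                    tabulate country imputed
                    Without the epolate option, ipolate fills only gaps that lie between two observed values; leading and trailing runs of missing values stay missing.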


                    • #11
                      Good evening @GMT,

                      I am working on a survey dataset covering 1971-2011 with over 1,700 observations, using Stata 16.0. It is balanced when I set it as a panel. I used preserve and keep to select the variables for my regression. But when I run xtreg with the variables that are stored as "str", it returns the results below, followed by the output of describe:


                      Code:
                       xtreg countrycode countryname country_id region
                      no observations
                      r(2000);
                      
                      . xtreg gdpdef ind_cb reer lcusdr
                      insufficient observations
                      r(2001);
                      
                      . xtreg gdpdef ind_cb reer lcusdr sav_gdp gnipc gdp_lcu
                      insufficient observations
                      r(2001);
                      
                      . xtreg invest source_of_info Liabilities
                      no observations
                      r(2000);
                      Code:
                      describe
                      
                      Contains data from lic_domestic_public_debt_dataset.dta
                        obs:         1,722                          
                       vars:            28                          5 Mar 2013 09:23
                      -------------------------------------------------------------------------------------
                                    storage   display    value
                      variable name   type    format     label      variable label
                      -------------------------------------------------------------------------------------
                      year            int     %8.0g                 
                      countrycode     str3    %9s                   CountryCode
                      countryname     str37   %37s                  Country name
                      country_id      float   %9.0g                 group(countryname)
                      region          str26   %26s                  Region
                      currency        str42   %42s                  Currency Unit
                      gdpdef          float   %9.0g                 GDP Deflator_ WDI_NY.GDP.DEFL.KD.ZG
                      ind_cb          float   %9.0g                 Indipendence of the Central Bank
                      reer            float   %9.0g                 Real effective exchange rate
                      lcusdr          float   %9.0g                 Exchange rate LCU/SDR
                      sav_gdp         float   %9.0g                 Saving over Gdp
                      gnipc           float   %9.0g                 Gni per capita
                      gdp_lcu         float   %9.0g                 Gdp local currency
                      exchangerate    float   %9.0g                 Official exchange rate (LCU per US$,
                                                                      period average)_WDI
                      defcgcons       float   %9.0g                 Deficit_ Central Government
                                                                      consolidated_LCU
                      defit_ifs       float   %9.0g                 Deficit_LCU
                      deficit_weo     float   %9.0g                 Deficit_General Government
                      However,
                      Code:
                       xtdes
                      
                          year:  1971, 1972, ..., 2011                             n =         41
                      country_id:  1, 2, ..., 43                                   T =         42
                                 Delta(country_id) = 1 unit
                                 Span(country_id)  = 43 periods
                                 (year*country_id uniquely identifies each observation)
                      
                      Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                                              42      42      42        42        42      42      42
                      
                           Freq.  Percent    Cum. |  Pattern
                       ---------------------------+---------------------------------------------
                             41    100.00  100.00 |  1111111111111111111.11111111111111111111111
                      
                      
                       ---------------------------+---------------------------------------------
                             41    100.00         |  XXXXXXXXXXXXXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXX

                      I also tried to destring all the string variables; this is the result:

                      Code:
                       destring,replace
                      year already numeric; no replace
                      countrycode: contains nonnumeric characters; no replace
                      countryname: contains nonnumeric characters; no replace
                      country_id already numeric; no replace
                      region: contains nonnumeric characters; no replace
                      currency: contains nonnumeric characters; no replace
                      I need some help with the following:
                      1. Reducing the data to the specific variables of interest
                      2. Running a model to describe the features of the dataset
                      3. Running a panel IV estimation
                      4. Fixed and random effects
                      5. A dynamic panel model
                      Before that, please help me clean the dataset and set it up with the necessary xt commands to run these models.

                      Khalid Atinoaga.....


                      • #12
                        Khalid Atinoaga Compaore Hey Khalid, please start a new thread, and format your query according to the FAQ.

                        Sara Cucaro I agree that ipolate might be a good way to interpolate missing values, though of course other people like Carlo Lazzaro may have different opinions than me or Andrew. Really it depends on how much missing data there are in a given unit.
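                        On the manual route mentioned in #9, here is a rough sketch, not a replacement for estat trendplots: a two-way fixed-effects regression plus a plot of yearly group means. It assumes the data are already xtset id year; ever_treated is a made-up variable name.

                        Code:
                        * two-way fixed effects DiD: country effects via fe, year effects via i.year
                        xtreg revenues i.treatment i.year, fe vce(cluster id)
                        * time-invariant indicator for countries that are ever treated
                        egen ever_treated = max(treatment), by(id)
                        * plot mean revenues by year for each group, with a line at 1996
                        preserve
                        collapse (mean) revenues, by(ever_treated year)
                        twoway (line revenues year if ever_treated == 0) ///
                               (line revenues year if ever_treated == 1), ///
                               xline(1996) legend(order(1 "Control" 2 "Treated"))
                        restore
                        The group means use whichever observations are nonmissing in each year, so the set of countries behind each line changes wherever data are missing.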


                        • #13
                          Sara:
                          as an aside to Andrew and Jared's excellent advice, most of the strategy depends on the mechanism underlying the missingness of your data (MCAR, MAR or MNAR).
                          That said, I would:
                          1) discuss the issue with your supervisor, just to avoid problems when the deadline is approaching;
                          2) see what others in your research field usually do when dealing with missing data.
                          Kind regards,
                          Carlo
                          (Stata 19.0)


                          • #14
                            Thank you all for the very helpful advice, I will definitely do some research and discuss these issues further!


                            • #15
                              @Jared Greathouse: Thanks for the response, and sorry for the late acknowledgement. I was offline.
