xtdidregress and treatment timing problem

Sara Cucaro

Join Date: Feb 2022
Posts: 9

26 Jun 2022, 03:48

Good morning everyone,

It's my first time posting on Statalist, so please forgive any mistakes. I'm trying to carry out a DiD analysis with homogeneous treatment using the command xtdidregress in Stata 17. All the countries in the treatment group receive the treatment in 1996, so I created the dummy variable "treatment" as the interaction of a dummy for participating countries and a dummy for year >= 1996. It therefore equals 1 for participating countries in years >= 1996 and 0 otherwise. I want to estimate the impact of the treatment on the outcome variable revenues.
This is an example of the data I'm using:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str34 country float id int year float treatment double revenues
"Angola"   3 1980 0                  .
"Angola"   3 1981 0                  .
"Angola"   3 1982 0                  .
"Angola"   3 1983 0                  .
"Angola"   3 1984 0                  .
"Angola"   3 1985 0   38.5590223893859
"Angola"   3 1986 0 37.440774273166085
"Angola"   3 1987 0  28.15715496592235
"Angola"   3 1988 0 26.318105068155557
"Angola"   3 1989 0 26.678764600008403
"Angola"   3 1990 0 25.603858087712762
"Angola"   3 1991 0 18.286622438623453
"Angola"   3 1992 0  35.59674673728012
"Angola"   3 1993 0  38.97868309765103
"Angola"   3 1994 0               26.1
"Angola"   3 1995 0 24.211764705882352
"Angola"   3 1996 0 36.816046966731896
"Angola"   3 1997 0 29.767382174521696
"Angola"   3 1998 0  28.79078694817658
"Angola"   3 1999 0  41.42462044091039
"Angola"   3 2000 0 42.812036507072904
"Angola"   3 2001 0 38.823995894445176
"Angola"   3 2002 0  28.66809284136252
"Angola"   3 2003 0  29.71193899979683
"Angola"   3 2004 0  30.98746169132483
"Angola"   3 2005 0 33.695143292354956
"Angola"   3 2006 0                  .
"Angola"   3 2007 0                  .
"Angola"   3 2008 0                  .
"Angola"   3 2009 0  36.82759164763131
"Angola"   3 2010 0  41.79623846838015
"Angola"   3 2011 0  44.62457646648777
"Angola"   3 2012 0  40.46642317555491
"Angola"   3 2013 0  35.83175445244411
"Angola"   3 2014 0 30.131542754536834
"Angola"   3 2015 0 23.053284196959346
"Angola"   3 2016 0  16.56236385597934
"Angola"   3 2017 0 16.666419902972514
"Angola"   3 2018 0 21.159166826250452
"Angola"   3 2019 0 19.024422476166702
"Angola"   3 2020 0  17.32964490984264
"Burundi" 13 1980 0 17.222313141467748
"Burundi" 13 1981 0 17.256408528228977
"Burundi" 13 1982 0 18.760568772799303
"Burundi" 13 1983 0 16.115698460401287
"Burundi" 13 1984 0 18.119885348730943
"Burundi" 13 1985 0   16.3402857426051
"Burundi" 13 1986 0 19.070355686291226
"Burundi" 13 1987 0  17.08156779661017
"Burundi" 13 1988 0  17.34970353015092
"Burundi" 13 1989 0   24.4175677182031
"Burundi" 13 1990 0 23.510032943995206
"Burundi" 13 1991 0  25.00837860929247
"Burundi" 13 1992 0   25.5501488207453
"Burundi" 13 1993 0 26.565614873231517
"Burundi" 13 1994 0  20.50917560821052
"Burundi" 13 1995 0  21.30667595829979
"Burundi" 13 1996 1 18.410532041553026
"Burundi" 13 1997 1  16.58190820395747
"Burundi" 13 1998 1  18.00535659358659
"Burundi" 13 1999 1 17.442813806673772
"Burundi" 13 2000 1 18.206108452289474
"Burundi" 13 2001 1  16.69040871582759
"Burundi" 13 2002 1 18.694387127430367
"Burundi" 13 2003 1 21.721675288621608
"Burundi" 13 2004 1 24.278108411778952
"Burundi" 13 2005 1 19.176590158941405
"Burundi" 13 2006 1  26.55164516375296
"Burundi" 13 2007 1                  .
"Burundi" 13 2008 1                  .
"Burundi" 13 2009 1 32.817532645019156
"Burundi" 13 2010 1  37.20981652883773
"Burundi" 13 2011 1  33.64481506518453
"Burundi" 13 2012 1 33.777901901771045
"Burundi" 13 2013 1  32.93114754098361
"Burundi" 13 2014 1 30.516129032258064
"Burundi" 13 2015 1                  .
"Burundi" 13 2016 1                  .
"Burundi" 13 2017 1                  .
"Burundi" 13 2018 1                  .
"Burundi" 13 2019 1                  .
"Burundi" 13 2020 1                  .
"Benin"   15 1980 0  8.777399210478057
"Benin"   15 1981 0 15.576651109378387
"Benin"   15 1982 0  16.31812237434467
"Benin"   15 1983 0 12.732701088865472
"Benin"   15 1984 0  11.06599830551901
"Benin"   15 1985 0 11.847020943421677
"Benin"   15 1986 0 10.955432223593863
"Benin"   15 1987 0 11.407167050753774
"Benin"   15 1988 0 11.803766330529346
"Benin"   15 1989 0  9.559569630368483
"Benin"   15 1990 0  8.486426598875077
"Benin"   15 1991 0  9.880866576833485
"Benin"   15 1992 0  11.46952743208144
"Benin"   15 1993 0 11.569943730551936
"Benin"   15 1994 0  12.39768792228859
"Benin"   15 1995 0 12.849657614075152
"Benin"   15 1996 1 13.104247932252141
"Benin"   15 1997 1 13.612375047184594
end

However, when I use xtdidregress (after xtset id year, yearly), I obtain the following results:

Code:

xtdidregress (revenues) (treatment), group(id) time(year)

Number of groups and treatment time

Time variable: year
Control:       treatment = 0
Treatment:     treatment = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
          id |        24         24
-------------+---------------------
Time         |
     Minimum |      1980       1996
     Maximum |      2013       2000
-----------------------------------

Difference-in-differences regression                     Number of obs = 1,566
Data type: Longitudinal

                                    (Std. err. adjusted for 48 clusters in id)
------------------------------------------------------------------------------
             |               Robust
    revenues | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
   treatment |
   (1 vs 0)  |   6.413122   1.517473     4.23   0.000      3.36036    9.465884
------------------------------------------------------------------------------
Note: ATET estimate adjusted for panel effects and time effects.

As you can see from the first table, the treatment time seems to vary (minimum 1996, maximum 2000). In fact, when I try to check parallel trends, I get this message:

Code:

estat trendplot
treatment assignment times vary; not allowed with estat trendplots

However, my treatment timing shouldn't vary, since all the countries in the treatment group receive the treatment in 1996. I have already checked my whole dataset: the treatment dummy is always 1 for years >= 1996 for treated countries only, and 0 otherwise. What am I getting wrong? I searched for previous questions but didn't find an answer.
Thank you all very much!


Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

26 Jun 2022, 04:21

Hey what's up Sara. Thanks so much for showing your data and code on your first post. Show me the results of

Code:

su year if treatment==1

Additionally, your results say 2000 is the maximum value for treatment. Shouldn't it go to 2013?

Last edited by Jared Greathouse; 26 Jun 2022, 04:25.
Andrew Musau

Join Date: Oct 2014

Posts: 10180
#3

26 Jun 2022, 04:23

Are revenues missing for some treated countries over the years 1996-2000? Show the result of

Code:

bys country (year): g tag= treatment & missing(revenues) & inrange(year, 1996, 2000)
list if tag, sepby(country)
Sara Cucaro

Join Date: Feb 2022

Posts: 9
#4

26 Jun 2022, 04:36

Thank you both for helping!

Hi Jared, this summarizes my year variable:

Code:

su year if treatment==1

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        year |        600        2008    7.217119       1996       2020

The maximum year for treatment is 2020, but not because they receive it every year. The treatment is an international initiative (HIPC debt relief) that the treatment group received in 1996; the dummy equals 1 for 1996 and all later years.

Sara Cucaro

Join Date: Feb 2022
Posts: 9

26 Jun 2022, 04:37

Hi Andrew! Yes, there are some missing values for some treated countries in the years considered:

Code:

 bys country (year): g tag= treatment & missing(revenues) & inrange(year, 1996, 2000)
 list if tag, sepby(country)

      +-----------------------------------------------------+
      |    country    id   year   treatm~t   revenues   tag |
      |-----------------------------------------------------|
1042. | Madagascar   112   1996          1          .     1 |
1043. | Madagascar   112   1997          1          .     1 |
1044. | Madagascar   112   1998          1          .     1 |
1045. | Madagascar   112   1999          1          .     1 |
      +-----------------------------------------------------+


Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#6

26 Jun 2022, 04:44

Perhaps.... drop Madagascar and see what happens? It's likely the missing data that's causing the issue. That is,

Code:

xtdidregress (revenues) (treatment) if country ~="Madagascar", group(id) time(year)
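
To see which countries might be behind the varying treatment times, one option is to look at the first treated year in which the outcome is actually observed. The reasoning, which is an assumption consistent with the output in this thread rather than documented behavior, is that observations with missing revenues drop out of the estimation sample, so a treated country with no revenues in 1996 first appears as treated in a later year. A minimal sketch, assuming the variable names from the dataex example (id, country, year, treatment, revenues):

```stata
* Sketch only: variable names follow the dataex example in the first post.
preserve
* Keep treated observations that can enter the estimation sample
keep if treatment == 1 & !missing(revenues)
* First treated year with an observed outcome, one row per country
bysort id (year): keep if _n == 1
tabulate year
* Countries whose first observed treated year is not 1996
list country id year if year != 1996, noobs
restore
```

If every country in the final list has a first observed treated year after 1996, those are the candidates to inspect before deciding what to drop or fill in.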

Last edited by Jared Greathouse; 26 Jun 2022, 04:47.

Sara Cucaro

Join Date: Feb 2022
Posts: 9

26 Jun 2022, 06:32

Dropping only Madagascar didn't help, but inspecting the missing data and dropping half of the countries led to these results:

Code:

xtdidregress (revenues) (treatment), group(id) time(year)

Number of groups and treatment time

Time variable: year
Control:       treatment = 0
Treatment:     treatment = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
          id |        10         14
-------------+---------------------
Time         |
     Minimum |      1980       1996
     Maximum |      1980       1996
-----------------------------------

Difference-in-differences regression                       Number of obs = 937
Data type: Longitudinal

                                    (Std. err. adjusted for 24 clusters in id)
------------------------------------------------------------------------------
             |               Robust
    revenues | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
   treatment |
   (1 vs 0)  |    5.34417   1.905843     2.80   0.010     1.401634    9.286706
------------------------------------------------------------------------------
Note: ATET estimate adjusted for panel effects and time effects.

Now the treatment year appears to be correct for all the countries, but I had to reduce my dataset substantially. Some countries had missing data for 10+ consecutive years, which I hadn't noticed, but others had only 1 or 2 years of missing data. I don't really understand why this happens. Isn't there a way to make Stata "tolerate" these missing values and carry out the analysis anyway?


Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#8

26 Jun 2022, 07:22

So now I have a few more questions. First, what is the objective here? Do you wanna do causal inference? Presumably yes.

Second, how many units in your sample are ever treated? Third, how many units have missing outcome data?
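
The two counts asked for here can be obtained along these lines. This is only a sketch: it assumes the variable names from the dataex example, and ever_treated, one_per_id and n_missing are hypothetical names introduced for illustration.

```stata
* Sketch only: variable names follow the dataex example in the first post.
egen byte ever_treated = max(treatment), by(id)          // 1 if the country is ever treated
egen byte one_per_id   = tag(id)                         // flags one row per country
egen n_missing         = total(missing(revenues)), by(id) // missing outcome years per country

count if one_per_id & ever_treated       // number of ever-treated units
count if one_per_id & n_missing > 0      // units with any missing outcome
tabulate n_missing ever_treated if one_per_id
```

The final table shows how many years are missing per country, split by treatment group, which helps separate countries with one or two gaps from those with long runs of missing data.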
Sara Cucaro

Join Date: Feb 2022

Posts: 9
#9

26 Jun 2022, 07:48

Yes, the objective is to establish a causal relationship. 24 of the 48 countries are treated. 402 of the 1,968 outcome values are missing, which I realize is a high number; they are spread across 41 of the countries. Unfortunately, 11 countries have long runs of missing data, which might inevitably bias my analysis. I wasn't able to find a more complete data source.

I was thinking I could also simply perform the DiD analysis manually; the results I get are similar for now. However, that way I couldn't use estat trendplot to verify the parallel trends assumption.
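
For reference, a manual version could look like the sketch below. It assumes the variable names from the dataex example and fits a two-way fixed-effects regression with country and year effects, which is the kind of adjustment the xtdidregress note describes ("panel effects and time effects"), so the coefficient on treatment should be comparable. The second part draws a hand-made plot of group means as a rough visual check; treated_group is a hypothetical name.

```stata
* Sketch only: variable names follow the dataex example in the first post.
xtset id year
* Two-way fixed-effects DiD: country effects via fe, year effects via i.year
xtreg revenues i.treatment i.year, fe vce(cluster id)

* Hand-made trend plot: yearly mean outcome by group
preserve
egen byte treated_group = max(treatment), by(id)
collapse (mean) revenues, by(treated_group year)
twoway (line revenues year if treated_group == 0)  ///
       (line revenues year if treated_group == 1), ///
       xline(1996) legend(order(1 "Control" 2 "Treated")) ///
       ytitle("Mean revenues")
restore
```

With an unbalanced panel the yearly means are computed over whichever countries are observed in that year, so jumps in the lines may reflect changes in sample composition rather than changes in trends.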
Andrew Musau

Join Date: Oct 2014

Posts: 10180
#10

26 Jun 2022, 09:59

If not too many values are missing, you can consider interpolation or extrapolation and make this explicit in your description. See

Code:

help ipolate
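
A minimal sketch of how that could look with the variables from the dataex example; revenues_ip and interpolated are hypothetical names, and whether interpolation is defensible depends on how long the gaps are:

```stata
* Sketch only: variable names follow the dataex example in the first post.
* Linear interpolation of revenues on year within each country (interior gaps only)
bysort id: ipolate revenues year, generate(revenues_ip)
* Flag filled-in values so they can be reported or excluded in robustness checks
generate byte interpolated = missing(revenues) & !missing(revenues_ip)
tabulate interpolated
* Adding the epolate option would also extrapolate beyond the observed range
```

Without epolate, leading and trailing gaps stay missing, so a country with no revenues before some year would still have missing values there.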

Khalid Atinoaga Compaore

Join Date: Apr 2022
Posts: 16

#11

26 Jun 2022, 11:17

Good evening @GMT,

I am working with a dataset from a survey covering 1971-2011 with over 1,700 observations, using Stata 16.0. It is balanced when I set it as a panel. I used preserve and keep to retain selected variables for my regression. But when I run xtreg, including on the variables that describe reports as "str", I get the errors below, followed by the describe output:

Code:

 xtreg countrycode countryname country_id region
no observations
r(2000);

. xtreg gdpdef ind_cb reer lcusdr
insufficient observations
r(2001);

. xtreg gdpdef ind_cb reer lcusdr sav_gdp gnipc gdp_lcu
insufficient observations
r(2001);

. xtreg invest source_of_info Liabilities
no observations
r(2000);

Code:

describe

Contains data from lic_domestic_public_debt_dataset.dta
  obs:         1,722                          
 vars:            28                          5 Mar 2013 09:23
-------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------------
year            int     %8.0g                 
countrycode     str3    %9s                   CountryCode
countryname     str37   %37s                  Country name
country_id      float   %9.0g                 group(countryname)
region          str26   %26s                  Region
currency        str42   %42s                  Currency Unit
gdpdef          float   %9.0g                 GDP Deflator_ WDI_NY.GDP.DEFL.KD.ZG
ind_cb          float   %9.0g                 Indipendence of the Central Bank
reer            float   %9.0g                 Real effective exchange rate
lcusdr          float   %9.0g                 Exchange rate LCU/SDR
sav_gdp         float   %9.0g                 Saving over Gdp
gnipc           float   %9.0g                 Gni per capita
gdp_lcu         float   %9.0g                 Gdp local currency
exchangerate    float   %9.0g                 Official exchange rate (LCU per US$,
                                                period average)_WDI
defcgcons       float   %9.0g                 Deficit_ Central Government
                                                consolidated_LCU
defit_ifs       float   %9.0g                 Deficit_LCU
deficit_weo     float   %9.0g                 Deficit_General Government

However,

Code:

 xtdes

    year:  1971, 1972, ..., 2011                             n =         41
country_id:  1, 2, ..., 43                                   T =         42
           Delta(country_id) = 1 unit
           Span(country_id)  = 43 periods
           (year*country_id uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                        42      42      42        42        42      42      42

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------------------------------------------
       41    100.00  100.00 |  1111111111111111111.11111111111111111111111


 ---------------------------+---------------------------------------------
       41    100.00         |  XXXXXXXXXXXXXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXX

I also tried to destring all the string variables; the result is shown below:

Code:

 destring,replace
year already numeric; no replace
countrycode: contains nonnumeric characters; no replace
countryname: contains nonnumeric characters; no replace
country_id already numeric; no replace
region: contains nonnumeric characters; no replace
currency: contains nonnumeric characters; no replace
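
As the messages above indicate, destring only converts strings that hold numbers. For text categories such as region, encode is the usual tool. A short sketch using the variable names from the describe output; region_num is a hypothetical name, and the link between missing values and "insufficient observations" is a common cause rather than a confirmed diagnosis for this dataset:

```stata
* Sketch only: variable names follow the describe output above.
* Map the text categories to labeled integers
encode region, generate(region_num)
tabulate region_num
* Count missing values in the model variables
misstable summarize gdpdef ind_cb reer lcusdr
```
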

I need some help to:
1. Reduce the data to the specific variables of interest.
2. Run a model that describes the features of the dataset.
3. Run a panel IV estimation.
4. Run fixed- and random-effects models.
5. Run a dynamic panel model.
Before that, please help me clean the dataset with the xt commands needed to run these models.

Khalid Atinoaga.....


Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#12

26 Jun 2022, 11:31

Khalid Atinoaga Compaore Hey Khalid, please start a new thread, and format your query according to the FAQ.

Sara Cucaro I agree that ipolate might be a good way to interpolate missing values, though of course other people like Carlo Lazzaro may have different opinions than me or Andrew. Really it depends on how much data is missing in a given unit.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17700
#13

26 Jun 2022, 12:06

Sara:
as an aside to Andrew's and Jared's excellent advice, most of the strategy depends on the mechanism underlying the missingness of your data (MCAR, MAR or MNAR).
That said, I would:
1) discuss the issue with your supervisor, just to avoid problems when the deadline is approaching;
2) see what others in your research field usually do when dealing with missing data.
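
A purely descriptive first look at the missingness pattern could be sketched as follows, assuming the variable names from the dataex example; miss_rev and trt_group are hypothetical names. Tables like these can only suggest that missingness is not completely at random; they cannot distinguish MAR from MNAR.

```stata
* Sketch only: variable names follow the dataex example in the first post.
generate byte miss_rev = missing(revenues)
egen byte trt_group = max(treatment), by(id)
* Share of missing outcomes by treatment group
tabulate trt_group miss_rev, row
* Share of missing outcomes by year
tabulate year miss_rev, row
```
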

Kind regards,
Carlo
(Stata 19.0)
Sara Cucaro

Join Date: Feb 2022

Posts: 9
#14

27 Jun 2022, 02:06

Thank you all for the very helpful advice, I will definitely do some research and discuss these issues further!
Khalid Atinoaga Compaore

Join Date: Apr 2022

Posts: 16
#15

30 Jun 2022, 22:54

@Jared Greathouse: Thanks for the response, and sorry for the late acknowledgement. I was offline.