Difference-in-Difference with Panel Data

Will Page

Join Date: Apr 2017
Posts: 1

20 Apr 2017, 08:21

Hello, everyone!

I am fairly new to Stata and am trying to work out how to run a DID analysis with panel data. My data set covers 12 countries from 1980 to 2015. For each country, I have a list of observed variables over that period.

In 2004, a policy change was implemented in 3 of the 12 countries. I would like to use these 3 countries as the treatment group and the remaining 9 as the control group.

I have included three treatment variables that take the following values:

Code:

 Treatment Variable Indicators

Treat      1 if unit of observation is Treated Unit
           0 if unit of observation is Control Unit

Post       1 if period is post-treatment
           0 if period is pre-treatment

TreatPost  (Treat * Post)
           1 if unit is treated and in post-treatment period
           0 otherwise
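
A minimal sketch of how indicators like these could be built. The variable names country and year and the three country codes are hypothetical placeholders, and treating 2004 itself as a post-treatment year is an assumption.

```stata
* Sketch only: country (string) and year (numeric) are assumed variable names,
* and "AAA", "BBB", "CCC" stand in for the three treated countries.
generate byte Treat = inlist(country, "AAA", "BBB", "CCC")
* Assumes the policy year 2004 counts as post-treatment.
generate byte Post = (year >= 2004) if !missing(year)
generate byte TreatPost = Treat * Post
* Check that all four Treat-by-Post cells contain observations.
tabulate Treat Post
```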

In order to determine the significance of the policy change, I would like to use the DID approach. I constructed a panel data regression with the following command:

Code:

xtreg W_Trade_M Treat Post TreatPost Population Unemployment Avg_Month_Wage, re

(Population, Unemployment, Avg_Month_Wage are observed variables within the data set.)

Code:

-------------------------------------------------------------------------------
    W_Trade_M |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
        Treat |   2.17e+10   3.05e+10     0.71   0.476    -3.80e+10    8.15e+10
         Post |   1.13e+10   5.41e+09     2.08   0.037     6.65e+08    2.19e+10
    TreatPost |   9.14e+10   7.40e+09    12.35   0.000     7.69e+10    1.06e+11
   Population |   1863.556   605.2821     3.08   0.002     677.2246    3049.887
 Unemployment |  -3.82e+09   8.54e+08    -4.48   0.000    -5.49e+09   -2.15e+09
Avg_Month_W~e |   2.21e+07    2943090     7.52   0.000     1.64e+07    2.79e+07
        _cons |   6.64e+09   2.17e+10     0.31   0.759    -3.59e+10    4.92e+10
--------------+----------------------------------------------------------------
      sigma_u |  4.186e+10
      sigma_e |  2.315e+10
          rho |  .76576531   (fraction of variance due to u_i)
-------------------------------------------------------------------------------

Stata produces the results, and the coefficients on my explanatory variables seem plausible. However, the coefficient on TreatPost, which I take to be the DID estimate, seems unrealistically large. Am I interpreting the coefficient incorrectly, or is there something wrong with my setup?

Any help would be greatly appreciated.

Many thanks,

Last edited by Will Page; 20 Apr 2017, 09:20.

Clyde Schechter

Join Date: Apr 2014

Posts: 30069
#2

20 Apr 2017, 17:21

Your interpretation looks correct: according to this model, the control group experienced an increase in W_Trade_M of about 1.13e10 units, whereas in the treatment group it went up by about 10.27e10, a difference of 9.14e10. And although you don't show us the code that implemented your model, the description you give of the variables is correct.
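
As a hedged illustration of where those two figures come from: the control group's change is the Post coefficient, and the treatment group's change is the sum of the Post and TreatPost coefficients (1.13e10 + 9.14e10 = 10.27e10). Assuming the -xtreg- command from the first post was the last estimation run, -lincom- can report both with standard errors:

```stata
* Run immediately after the xtreg ... , re command from the first post.
lincom Post                 // pre-to-post change in the control group
lincom Post + TreatPost     // pre-to-post change in the treatment group
```

The difference between the two is the TreatPost coefficient itself.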

I think that it is not a good idea to trust your intuitions on what is "too large" when everything is in such astronomical numbers. It can also be a problem estimating regressions when you have variables whose scales differ by so many orders of magnitude (0-1 vs numbers of order 10¹⁰.) I imagine that your outcome variable is denominated in some relatively small units, perhaps currency units like dollars, or euros, or maybe yen or yuan. If I were you I would change the units on that variable to millions or even billions of currency units, so that the numbers in the regression data will all be of similar magnitudes. At the least, the results will be easier to wrap your mind around, and perhaps some numerical errors in the estimation will be avoided.
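
A minimal sketch of that rescaling, assuming W_Trade_M is recorded in single currency units. The new variable name W_Trade_bn is made up for illustration.

```stata
* W_Trade_bn is a hypothetical name: the outcome expressed in billions.
generate double W_Trade_bn = W_Trade_M / 1e9
label variable W_Trade_bn "W_Trade_M in billions of original units"
* Same model as before, with only the outcome rescaled.
xtreg W_Trade_bn Treat Post TreatPost Population Unemployment Avg_Month_Wage, re
```

Because the model is linear, every coefficient should simply be the earlier value divided by 1e9 (about 91.4 for TreatPost), up to numerical precision. Population could be rescaled in the same way if its coefficient is still awkward to read.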

One more thing: if you are using a modern version of Stata, you should use factor-variable notation (-help fvvarlist-) for this model. Scrap your TreatPost variable and code it this way:

Code:

xtreg W_Trade_M i.Treat##i.Post Population Unemployment Avg_Month_Wage, re

The main advantage is that after that you will be able to calculate predicted means and marginal effects using the -margins- command.
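
For instance, a sketch of the -margins- calls that become available once the model is fitted with factor-variable notation (the variable names are the ones used in this thread):

```stata
xtreg W_Trade_M i.Treat##i.Post Population Unemployment Avg_Month_Wage, re
margins Treat#Post            // predicted mean outcome in each of the four cells
margins Post, dydx(Treat)     // treated-minus-control difference, pre and post
```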

I'm also curious why you're using a random effects model. There are only 12 countries, so you are not getting a very thorough sampling of the country-effect space in your data. Why not fixed-effects here? (If you go to a fixed effects model, then you will lose the Treat variable due to collinearity with the fixed effects, but that doesn't matter because it's just a nuisance parameter in that model anyway. You still want to interpret the treatment effect as coming from the Treat#Post interaction term.)
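
A sketch of the fixed-effects version, assuming the panel is declared with a numeric country identifier and a year variable. The names country_id and year are assumptions, not taken from the original post.

```stata
* country_id and year are assumed names. If the country identifier is a
* string, -encode- can create a numeric version first.
xtset country_id year
xtreg W_Trade_M i.Treat##i.Post Population Unemployment Avg_Month_Wage, fe
```

Stata should report that 1.Treat is omitted because of collinearity. The coefficient on 1.Treat#1.Post is still estimated and remains the DID estimate.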

Linh Nguyen

Join Date: Nov 2017

Posts: 85
#3

22 Jun 2018, 07:09

Clyde Schechter,

Could I ask you a question? Treat, Post, as well as Treat#Post are time-invariant variables. Will they be excluded from the model if Will uses fixed effects?

--------------------
(Stata 15.1 MP)

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17704

22 Jun 2018, 07:56

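Linh:
yes, time-invariant variables are omitted under fixed effects, as you can see from the following toy example. (Note that in Will's model only Treat is time-invariant; Post and Treat#Post vary over time within country, so they are retained, as Clyde explains in #2.)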

Code:

. use "http://www.stata-press.com/data/r15/nlswork.dta"
. xtreg ln_wage i.race##i.birth_yr, fe
note: 2.race omitted because of collinearity
note: 3.race omitted because of collinearity
note: 42.birth_yr omitted because of collinearity
note: 43.birth_yr omitted because of collinearity
note: 44.birth_yr omitted because of collinearity
note: 45.birth_yr omitted because of collinearity
note: 46.birth_yr omitted because of collinearity
note: 47.birth_yr omitted because of collinearity
note: 48.birth_yr omitted because of collinearity
note: 49.birth_yr omitted because of collinearity
note: 50.birth_yr omitted because of collinearity
note: 51.birth_yr omitted because of collinearity
note: 52.birth_yr omitted because of collinearity
note: 53.birth_yr omitted because of collinearity
note: 54.birth_yr omitted because of collinearity
note: 1b.race#54.birth_yr identifies no observations in the sample
note: 2.race#42.birth_yr omitted because of collinearity
note: 2.race#43.birth_yr omitted because of collinearity
note: 2.race#44.birth_yr omitted because of collinearity
note: 2.race#45.birth_yr omitted because of collinearity
note: 2.race#46.birth_yr omitted because of collinearity
note: 2.race#47.birth_yr omitted because of collinearity
note: 2.race#48.birth_yr omitted because of collinearity
note: 2.race#49.birth_yr omitted because of collinearity
note: 2.race#50.birth_yr omitted because of collinearity
note: 2.race#51.birth_yr omitted because of collinearity
note: 2.race#52.birth_yr omitted because of collinearity
note: 2.race#53.birth_yr omitted because of collinearity
note: 2.race#54.birth_yr omitted because of collinearity
note: 3.race#41b.birth_yr identifies no observations in the sample
note: 3.race#42.birth_yr identifies no observations in the sample
note: 3.race#43.birth_yr omitted because of collinearity
note: 3.race#44.birth_yr omitted because of collinearity
note: 3.race#45.birth_yr omitted because of collinearity
note: 3.race#46.birth_yr omitted because of collinearity
note: 3.race#47.birth_yr omitted because of collinearity
note: 3.race#48.birth_yr omitted because of collinearity
note: 3.race#49.birth_yr omitted because of collinearity
note: 3.race#50.birth_yr omitted because of collinearity
note: 3.race#51.birth_yr omitted because of collinearity
note: 3.race#52.birth_yr omitted because of collinearity
note: 3.race#53.birth_yr omitted because of collinearity
note: 3.race#54.birth_yr identifies no observations in the sample

Fixed-effects (within) regression               Number of obs     =     28,534
Group variable: idcode                          Number of groups  =      4,711

R-sq:                                           Obs per group:
     within  = 0.0000                                         min =          1
     between = 0.0050                                         avg =        6.1
     overall =      .                                         max =         15

                                                F(0,23823)        =       0.00
corr(u_i, Xb)  =      .                         Prob > F          =          .

-------------------------------------------------------------------------------
      ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         race |
       black  |          0  (omitted)
       other  |          0  (omitted)
              |
     birth_yr |
          42  |          0  (omitted)
          43  |          0  (omitted)
          44  |          0  (omitted)
          45  |          0  (omitted)
          46  |          0  (omitted)
          47  |          0  (omitted)
          48  |          0  (omitted)
          49  |          0  (omitted)
          50  |          0  (omitted)
          51  |          0  (omitted)
          52  |          0  (omitted)
          53  |          0  (omitted)
          54  |          0  (omitted)
              |
race#birth_yr |
    white#54  |          0  (empty)
    black#42  |          0  (omitted)
    black#43  |          0  (omitted)
    black#44  |          0  (omitted)
    black#45  |          0  (omitted)
    black#46  |          0  (omitted)
    black#47  |          0  (omitted)
    black#48  |          0  (omitted)
    black#49  |          0  (omitted)
    black#50  |          0  (omitted)
    black#51  |          0  (omitted)
    black#52  |          0  (omitted)
    black#53  |          0  (omitted)
    black#54  |          0  (omitted)
    other#41  |          0  (empty)
    other#42  |          0  (empty)
    other#43  |          0  (omitted)
    other#44  |          0  (omitted)
    other#45  |          0  (omitted)
    other#46  |          0  (omitted)
    other#47  |          0  (omitted)
    other#48  |          0  (omitted)
    other#49  |          0  (omitted)
    other#50  |          0  (omitted)
    other#51  |          0  (omitted)
    other#52  |          0  (omitted)
    other#53  |          0  (omitted)
    other#54  |          0  (empty)
              |
        _cons |   1.674907   .0018961   883.35   0.000     1.671191    1.678624
--------------+----------------------------------------------------------------
      sigma_u |  .42456905
      sigma_e |  .32028665
          rho |  .63731204   (fraction of variance due to u_i)
-------------------------------------------------------------------------------
F test that all u_i=0: F(4710, 23823) = 8.44                 Prob > F = 0.0000
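
By contrast, a period indicator that changes within panels survives the within transformation. The sketch below uses the same nlswork data with an entirely artificial "treatment" (race == 2) and an arbitrary cutoff year (78). Both are made up purely to show the mechanics and have no substantive meaning.

```stata
webuse nlswork, clear
generate byte treat = (race == 2)   // artificial group; time-invariant within idcode
generate byte post  = (year >= 78)  // arbitrary cutoff; varies within idcode
xtreg ln_wage i.treat##i.post, fe
```

Here 1.treat should be omitted because of collinearity with the fixed effects, while 1.post and 1.treat#1.post are estimated.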

Kind regards,
Carlo
(Stata 19.0)

Asha Nair

Join Date: Nov 2018
Posts: 5

07 Nov 2018, 22:44

Can somebody help me interpret the Stata results below? d1 is my time variable, d2 is the treatment, and d1d2 is the interaction.

xtreg $ylist $xlist, re

Random-effects GLS regression                   Number of obs      =      3726
Group variable: id                              Number of groups   =       414

R-sq:  within  = 0.0685                         Obs per group: min =         9
       between = 0.5135                                        avg =       9.0
       overall = 0.1334                                        max =         9

Random effects u_i ~ Gaussian                   Wald chi2(10)      =    571.64
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        csri |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         adi |  -.0010732    .000624    -1.72   0.085    -.0022962    .0001497
         rdi |   .0043439   .0061877     0.70   0.483    -.0077838    .0164717
         fs2 |  -.0009373   .0004181    -2.24   0.025    -.0017567   -.0001179
          sr |   9.23e-10   3.68e-10     2.51   0.012     2.02e-10    1.64e-09
          d1 |  -.0045756   .0019018    -2.41   0.016    -.0083031   -.0008481
        lroa |   .0000163   8.79e-07    18.51   0.000     .0000146     .000018
         aan |   .0001739   .0002009     0.87   0.387    -.0002198    .0005675
      sqswti |   .4564973   .0655897     6.96   0.000     .3279438    .5850508
          d2 |  -.0099561    .001486    -6.70   0.000    -.0128686   -.0070436
        d1d2 |   .0066732   .0019774     3.37   0.001     .0027977    .0105487
       _cons |   .0124624   .0021619     5.76   0.000     .0082252    .0166996
-------------+----------------------------------------------------------------
     sigma_u |          0
     sigma_e |  .01577453
         rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Clyde Schechter

Join Date: Apr 2014

Posts: 30069
#6

07 Nov 2018, 22:49

Rather than interpret this output, which requires some calculations that are easy to get wrong, go back and re-do the regression using factor variable notation instead of using your calculated interaction term d1d2. See -help fvvarlist-. Then run

Code:

margins d1#d2 // EXPECTED VALUE OF CSRI IN EACH GROUP PRE- AND POST-
margins d1, dydx(d2) // MARGINAL EFFECT OF TREATMENT IN EACH TIME PERIOD
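
Those -margins- calls have to follow a regression fitted with factor-variable notation. A sketch of that re-estimation, assuming the outcome and covariates are the ones that appear in the output posted above (the actual contents of $ylist and $xlist are not shown in the thread):

```stata
* Covariate names are read off the posted output. The hand-made d1d2 term
* is dropped and replaced by the i.d1##i.d2 factor-variable interaction.
xtreg csri adi rdi fs2 sr lroa aan sqswti i.d1##i.d2, re
```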

The clearest explanation of the margins command is the excellent Richard Williams' https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. It contains several worked examples, including some interaction models similar to yours.

In the event you need assistance with that, when showing the output, please put it between code delimiters so it will be more readable and better aligned. See Forum FAQ #12 for instructions on using code delimiters if you are not familiar with them.

Asha Nair

Join Date: Nov 2018
Posts: 5

08 Nov 2018, 00:13

Hi, thanks for your input. I am pasting the output of margins below. My outcome variable is CSR intensity, which is CSR spend divided by the previous year's sales. I am studying a policy introduced in 2013 that mandates CSR spending. Companies that were already spending on CSR before the policy are my control group (the unaffected group), and those that started spending on CSR only after the policy are my treatment group (the affected group). d1 is the time variable (0 before 2013, 1 after 2013), and d2 is the treatment variable (0 for control, 1 for treatment). I have data from 2010 to 2018. Since I am new to Stata, and to the DID technique in particular, I am finding the output difficult to interpret. Your help would be valuable.

Code:

 margins d1#d2

Predictive margins                                Number of obs   =       3726
Model VCE    : Conventional

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       d1#d2 |
        0 0  |   .0097701   .1694388     0.06   0.954    -.3223238    .3418641
        0 1  |   .0005745   .1694388     0.00   0.997    -.3315194    .3326685
        1 0  |    .005136   .1694388     0.03   0.976    -.3269579    .3372299
        1 1  |    .002192   .1678176     0.01   0.990    -.3267245    .3311085
------------------------------------------------------------------------------

margins d1 , dydx(d2)

Average marginal effects                          Number of obs   =       3726
Model VCE    : Conventional

Expression   : Linear prediction, predict()
dy/dx w.r.t. : 1.d2

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.d2         |
          d1 |
          0  |  -.0091956   1.50e-11 -6.1e+08   0.000    -.0091956   -.0091956
          1  |   -.002944    .002066    -1.42   0.154    -.0069933    .0011053
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

Thanks and Regards,
Asha

Clyde Schechter

Join Date: Apr 2014

Posts: 30069
#8

08 Nov 2018, 09:45

The first table gives the expected values of CSR intensity in the four conditions: control-before, control-after, treatment-before, and treatment-after. For example, in your results, the expected CSR intensity in the control group before 2013 was 0.0097701, with a 95% CI from -.3223238 to .3418641. For the control group after 2013 the expected CSR intensity was 0.005136, with a 95% CI from -.3269579 to .3372299. The second and fourth rows of that table show the corresponding values for the treatment group.

The second table shows the differences between the treatment and control groups in the before and after periods. So, before 2013, the difference between treatment and control groups was -.0091956, 95% CI -.0091956 to -.0091956. And after 2013 the difference between treatment and control groups was -.002944, 95% CI -.0069933 to .0011053. The negative signs here mean that CSR intensity was lower in the treatment group than in the control group in both periods (though in the after period we are not so sure of that because the confidence interval extends up to positive numbers.)

The DID estimator of the treatment effect is not directly shown in the -margins- output. Instead, you can read that from the regression output: it is the coefficient of 1.d1#1.d2 in that table.
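
As a hedged sketch, the same quantity can be displayed with its standard error using -lincom-, assuming the factor-variable regression is still the active estimation. It can also be cross-checked by hand as the difference between the two dy/dx rows in the second table.

```stata
* Assumes the last estimation command was the xtreg with i.d1##i.d2.
lincom 1.d1#1.d2
* Cross-check by hand: after-period difference minus before-period difference.
display -.002944 - (-.0091956)
```

The hand calculation works out to .0062516.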

Asha Nair

Join Date: Nov 2018

Posts: 5
#9

09 Nov 2018, 03:33

Dear Clyde,

Thanks, the picture is getting clear from your inputs. I got .0062516 as the coefficient of 1.d1#1.d2. What does this signify?

Asha

Clyde Schechter

Join Date: Apr 2014

Posts: 30069
#10

09 Nov 2018, 08:28

It means that the change in CSR intensity from before to after 2013 was 0.0062516 greater in the treatment group than in the control group over the same period. That is the DID estimator of the effect of the intervention on CSR intensity.

One way to understand it is that we look at how CSR intensity changed in the intervention group after 2013. Part of that change is due to the policy intervention. But some of it would have happened anyway. Our best way to estimate how much would have happened anyway is to see what did happen in the control group. So we subtract the observed change in the control group from the observed change in the group affected by the policy and we attribute that difference to the intervention. That's the theory behind DID estimation. The coefficient of the 1.d1#1.d2 interaction term actually makes that calculation.
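
The subtraction described above can be reproduced from the four predicted means in the -margins d1#d2- table posted earlier. This is only a sketch of the arithmetic; the scalar names are made up for illustration.

```stata
* Predicted means copied from the margins d1#d2 table (d1 = period, d2 = group).
scalar ctrl_pre  = .0097701
scalar trt_pre   = .0005745
scalar ctrl_post = .005136
scalar trt_post  = .002192
display "change, treated: " trt_post - trt_pre
display "change, control: " ctrl_post - ctrl_pre
display "DID estimate:    " (trt_post - trt_pre) - (ctrl_post - ctrl_pre)
```

The treated group's change is .0016175, the control group's change is -.0046341, and their difference is .0062516, which matches the interaction coefficient.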

Asha Nair

Join Date: Nov 2018

Posts: 5
#11

10 Nov 2018, 19:54

Thanks a lot for your assistance. My concepts on DID are clear now. Can you share any published article that uses DID for data analysis?

Thanks and Regards,
Asha

Asha Nair

Join Date: Nov 2018

Posts: 5
#12

24 Nov 2018, 03:27

Can somebody help me plot the trend graph for the control and treatment groups for DID estimation in Stata? Thank you in advance.
Asha

Adil Saleem

Join Date: Oct 2020

Posts: 6
#13

01 Oct 2020, 15:43

Hello everyone, I have panel data on more than 150 countries over 20 years, with some observed variables that I want to study using DID estimation. Of the 150 countries, 60 have implemented a new policy over time, while the rest would be considered the control group. The treatment occurred in different years in different countries: some countries started implementing the new policy in early 2000, and by 2008 almost all of the treatment group had changed the policy. I am not sure what "time" dummy I should create for this. I am new to research work, and I am not sure whether we should pick one year as the year in which the treatment was applied to the treatment group. Can I use 2008 as the treatment year, even though the actual year of intervention differed across countries?

Clyde Schechter

Join Date: Apr 2014

Posts: 30069
#14

01 Oct 2020, 17:49

If I understand your explanation correctly, there is no single year in which the policy was started in all 60 countries: it was a variable process with some starting as early as 2000 and others as late as 2008. So this setup is not compatible with classical DID estimation. Rather you have to use generalized DID here. The key variable you need to set up is a dichotomous variable that is 1 when the year is equal to or later than the year in which the country began implementing the new policy, and 0 otherwise. That means that, among other things, it is also 0 in all observations of the control group. For the intervention group, it is 0 before the particular year in which the particular country implemented the policy, and 1 from that year onward.

You use that variable as a predictor in your fixed-effects regression. The model must also include country and individual year fixed effects. Also include whatever other time-varying covariates might be appropriate based on the substance of the problem. Note that in this model there is no variable that indicates intervention vs control group, and there is no variable that indicates pre- vs post- intervention.
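
A minimal sketch of that setup. All names here (country_id, year, adopt_year, y, x1, x2) are hypothetical, adopt_year is assumed to be missing for countries that never adopted the policy, and the cluster-robust standard errors are an added assumption rather than part of the description above.

```stata
* Hypothetical names throughout; adopt_year is missing for control countries.
xtset country_id year
* 1 from the adoption year onward in adopting countries, 0 otherwise.
generate byte policy = !missing(adopt_year) & year >= adopt_year
* Country fixed effects via -fe-, year fixed effects via i.year.
xtreg y i.policy i.year x1 x2, fe vce(cluster country_id)
```

In this sketch the coefficient on 1.policy is the generalized DID estimate of the policy effect.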

Adil Saleem

Join Date: Oct 2020

Posts: 6
#15

02 Oct 2020, 01:20

Clyde Schechter, thank you for the reply. I understand your explanation for the data I have, but the last line confuses me: "Note that in this model there is no variable that indicates intervention vs control group, and there is no variable that indicates pre- vs post- intervention".

What does that line mean? Does it mean that the model cannot capture the effect of the intervention on the treatment group across the pre and post periods? Does it mean that the real essence of DID cannot be achieved with this model using the dichotomous variable?