Difference-in-Difference with Panel Data

Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#16

02 Oct 2020, 14:11

I'm sorry if my wording confused you. Let's go back a step.

In the classical DID analysis, there is a variable that encodes (0/1) intervention vs control group, and another variable that encodes (0/1) pre- vs post-intervention time period. Both of these variables and their interaction term occur in the regression for a classical DID. In the generalized DID, these variables are not in the regression. Instead there is only a single variable that is 1 for observations that are both in the intervention group and occur in the post-intervention (for that unit) period, and 0 everywhere else. This variable is, if you like, analogous to the interaction term in the classical DID--but there are no corresponding "main" effects to include.

Another key difference between generalized and classical DID is that in the latter, it may not be necessary to include fixed effects for unit and time (although one might do so anyway for other reasons), but in the generalized DID they are absolutely required.

The generalized DID does capture the effect of intervention on the treatment group provided the same assumptions required for classical DID to identify the treatment effect are met.
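
As a rough illustration of the setup described above, here is a minimal sketch. The names outcome, unit_id, ever_treated and first_treat_year are assumptions made up for this example, not variables from anyone's actual data.

Code:

* Sketch only: outcome, unit_id, ever_treated, first_treat_year are assumed names
xtset unit_id year

* 1 only for treated units in their own post-intervention years, 0 everywhere else
gen byte did = ever_treated == 1 & year >= first_treat_year

* unit fixed effects come from -fe-, time fixed effects from i.year;
* no separate "main" effects for group or period are included
xtreg outcome did i.year, fe vce(cluster unit_id)

The coefficient on did would then be read as the generalized DID estimate, subject to the same identifying assumptions as the classical version.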
3 likes
Comment
Adil Saleem

Join Date: Oct 2020

Posts: 6
#17

03 Oct 2020, 01:05

Clyde Schechter Thank you for the clarification.
Comment
Adil Saleem

Join Date: Oct 2020

Posts: 6
#18

06 Oct 2020, 05:00

Dear Clyde,
I hope you are doing great. I ran the regression after creating an interaction variable as you suggested above, following the condition (a single variable that is 1 for observations that are both in the intervention group and occur in the post-intervention (for that unit) period, and 0 everywhere else). I reduced the treatment group to 30 countries and the control group to 110 countries in order to have a balanced panel data set. Would this affect the results?

I am attaching the results from the DID regression, but I am not very satisfied with the R-squared value, which is very low.

. xtreg gdpgrowthannual did, fe

Fixed-effects (within) regression               Number of obs     =       4464
Group variable: id                              Number of groups  =        144

R-sq:                                           Obs per group:
     within  = 0.0014                                         min =         31
     between = 0.0031                                         avg =       31.0
     overall = 0.0013                                         max =         31

                                                F(1,4319)         =       6.02
corr(u_i, Xb)  = -0.0470                        Prob > F          =     0.0142

------------------------------------------------------------------------------
gdpgrowtha~l |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         did |   1.041065   .4242073     2.45   0.014     .2094009    1.872729
       _cons |   3.127226   .0899819    34.75   0.000     2.950815    3.303637
-------------+----------------------------------------------------------------
     sigma_u |  1.6540493
     sigma_e |  5.5393767
         rho |  .08186213   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(143, 4319) = 2.76                   Prob > F = 0.0000

My second question: how can I verify the assumption required for generalized DID, namely that the trends of both groups are the same before the treatment?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#19

06 Oct 2020, 10:16

I reduced the treatment group to 30 countries and the control group to 110 countries in order to have a balanced panel data set. Would this affect the results?

Yes, it can affect the results. There is no reason to do this--there are no advantages to having a balanced data set in this kind of analysis. And by removing some of the data, you are now studying a sample that may well be biased.

I am attaching the results from the DID regression, but I am not very satisfied with the R-squared value, which is very low.

Low R² is not necessarily a problem. If the data are very noisy, then obtaining a high R² is simply not possible. But that doesn't alter the validity of estimating the treatment effect. The two issues are unrelated.

My second question: how can I verify the assumption required for generalized DID, namely that the trends of both groups are the same before the treatment?

My preferred way to do this is to calculate the mean value of the outcome in both the treatment group (i.e. those that eventually get the intervention) and the control group in every year before the intervention, and then graph them.
So something like

Code:

collapse (mean) outcome_variable if did == 0, by(year treatment_vs_control)
reshape wide outcome_variable, i(year) j(treatment_vs_control)
graph twoway connect outcome_variable* year

Then you can literally see to what extent the trends look parallel.
2 likes
Comment

lal mohan kumar

Join Date: May 2019
Posts: 265

#20

11 Jan 2021, 04:01

Dear all
I plan to check how a certain law (SARFAESI) has affected the debt position of firms in a country. I have not used a DID specification before, and my limited understanding is based on this forum, especially https://www.statalist.org/forums/for...in-differences.

The law variable is coded 1 (the law is in place) for the years 2002, 2003, and 2004, and 0 for the years 1997 through 2001 (before the law was passed). Hence the time part is defined.
Since the passage of the law affected all firms, I don't have natural treatment and control groups, so I followed the literature and classified firms into treated and control groups based on the firm's tangible assets (tangibility). Hence all the firms that fall in the bottom tercile form my treatment group and the top decile is the control group. For this classification, I used the pre-law-enforcement years 1998, 1999, and 2000.

My panel ID is firm(denoted by ccode) & I set my panel as

Code:

xtset ccode year

For the time part, I coded

Code:

gen sarfaesi=.
replace sarfaesi=0 if year>1996 & year<2002 //denoting before law period
replace sarfaesi=1 if year>2001 & year<2005 //denoting after/during law period

For the group part, first I created a period from 1998-2000 as this denotes pre-law enforcement years

Code:

gen treat_years=.
replace treat_years=1 if year>1997 & year<2001  // treatment period
replace treat_years=0 if year<1998 | year> 2000 // we will NOT consider this period but simply coded

Then, I classified firms into treated(high tangible) & control(low tangible) groups for the treatment period from 1998-2000

Code:

egen tertiles=xtile(tang1_w), n(3) by(treat_years)

gen tang_group=.
replace tang_group=1 if treat_years==1 & tertiles==3 // high tangible groups or treatment group
replace tang_group=0 if treat_years==1 & tertiles==1 // low tangible groups or control group

My outcome variable is secured borrowings (secborr_ta_w), and I ran the following regression

Code:

. xtreg secborr_ta_w i.sarfaesi##i.tang_group i.year ,fe vce(robust)
note: 0.sarfaesi omitted because of collinearity
note: 1.tang_group omitted because of collinearity
note: 0.sarfaesi#1.tang_group omitted because of collinearity
note: 1997.year omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =      2,587
Group variable: ccode                           Number of groups  =      2,587

R-sq:                                           Obs per group:
     within  =      .                                         min =          1
     between =      .                                         avg =        1.0
     overall =      .                                         max =          1

                                                F(0,2586)         =          .
corr(u_i, Xb)  =      .                         Prob > F          =          .

                                     (Std. Err. adjusted for 2,587 clusters in ccode)
-------------------------------------------------------------------------------------
                    |               Robust
       secborr_ta_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
         0.sarfaesi |          0  (omitted)
       1.tang_group |          0  (omitted)
                    |
sarfaesi#tang_group |
               0 1  |          0  (omitted)
                    |
               year |
              1997  |          0  (omitted)
                    |
              _cons |   .3474875          .        .       .            .           .
--------------------+----------------------------------------------------------------
            sigma_u |  .29553325
            sigma_e |          .
                rho |          .   (fraction of variance due to u_i)
-------------------------------------------------------------------------------------

If I run the random-effects specification instead,

Code:

. xtreg secborr_ta_w i.sarfaesi##i.tang_group i.year, re vce(robust)
note: 0.sarfaesi omitted because of collinearity
note: 0.sarfaesi#1.tang_group omitted because of collinearity
note: 1997.year omitted because of collinearity
insufficient observations
r(2001);

In this same post, Carlo Lazzaro (#4) has demonstrated a similar case. But my questions are:
1) Given the above setting, how can I estimate the above regression without the time##group terms (i.sarfaesi##i.tang_group) being dropped? I have seen many papers using a similar specification with firm fixed effects.
2) Is there anything fundamentally wrong in my code, logic, or treatment period that resulted in dropped observations? Sorry for flagging Clyde Schechter; I have relied on some of your writing on this topic, and if I misunderstood it, I would like to correct that.

Comment

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17730
#21

11 Jan 2021, 07:54

Lal:
focusing on your last post, if you have 2587 groups and 2587 observations, you do not have a panel but a cross-sectional dataset.
That's why -xtreg- results are letting you down.
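
One possible way to see where the observations go is sketched below. It uses the variable names from the post above, while insample and n_used are new names introduced only for this illustration.

Code:

* an observation enters the regression only if every model variable is non-missing
gen byte insample = !missing(secborr_ta_w, sarfaesi, tang_group)

* which years survive?
tabulate year insample

* usable observations per firm (this table counts observations, not firms)
bysort ccode: egen n_used = total(insample)
tabulate n_used

If usable observations turn up only in a narrow window of years, that would point to how the group variable was coded rather than to -xtreg- itself.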

Kind regards,
Carlo
(Stata 19.0)
Comment
lal mohan kumar

Join Date: May 2019

Posts: 265
#22

11 Jan 2021, 08:47

Thanks, Carlo Lazzaro. May I ask about some doubts I have been brooding over?
1) In my case, the before-event years are, say, 1999, 2000, and 2001, and the after-event years are 2002, 2003, and 2004 (assume we have 3 years in both). The classification of groups into treatment and control is based on a criterion (assets) measured during the pre-event period; in my case, that was 1998, 1999, and 2000. Hence the two windows overlap in time. Is this what leads to the multicollinearity? In an ideal case, how should such a group classification be done?
2) Also, should I use reg or xtreg? The papers I have seen use firm fixed effects and time dummies with the DID specification.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17730
#23

11 Jan 2021, 10:09

Lal:
take a look at https://www.princeton.edu/~otorres/DID101R.pdf

Kind regards,
Carlo
(Stata 19.0)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#24

11 Jan 2021, 12:42

Since the passage of the law affected all firms, I don't have natural treatment and control groups, so I followed the literature and classified firms into treated and control groups based on the firm's tangible assets (tangibility). Hence all the firms that fall in the bottom tercile form my treatment group and the top decile is the control group.

Design first, analysis second. You are not going to be able to do a DID with this data. Unless the nature of the law whose effect you are trying to estimate is that it has no practical effect in firms in the top decile of assets, this designation of treatment and control groups is arbitrary and meaningless. Your "DID analysis" will instead simply be a pre-post comparison of outcomes, with the effects estimated separately in these two asset-based groups. Except, it won't even be that, because the way you have (mis)coded this the asset-based group variable isn't even defined in the pre-law period. So all you have is a division of the firms into asset groups during the treatment period. Consequently, even the weak pre-post estimate of effect has been subverted. You have nothing in this model that tells you anything whatsoever about the effect of the law. Nothing. You are fortunate that Stata's regression output made it obvious that something is wrong.
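
Leaving the design objection aside, the coding problem on its own could be avoided by making the group indicator constant within firm. The following is only a sketch under that assumption: it reuses ccode, year and tang1_w from the post above, and pre_tang, onefirm, tert1, tertile and tang_group2 are names invented here. It does not make the treatment/control designation any more meaningful.

Code:

* firm-level mean of tangibility over the classification years 1998-2000
bysort ccode: egen pre_tang = mean(cond(inrange(year, 1998, 2000), tang1_w, .))

* tertiles computed on one observation per firm, then spread to all of that firm's years
egen byte onefirm = tag(ccode)
xtile tert1 = pre_tang if onefirm, nquantiles(3)
bysort ccode: egen tertile = max(tert1)

* the indicator is now defined in every year, before and after the law
gen byte tang_group2 = 1 if tertile == 3
replace tang_group2 = 0 if tertile == 1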
2 likes
Comment

lal mohan kumar

Join Date: May 2019
Posts: 265

#25

09 Mar 2021, 01:47

Clyde Schechter, thanks for the help. I decided to learn a little more about difference-in-differences before posting further questions. In my proposed model, there is a time dummy in which the years 2014, 2015, and 2016 denote the period before the regulation and 2017, 2018, and 2019 denote the period of regulation. In my setup there are no natural treated and control groups, so, based on the literature as well as anecdotal evidence, I took ownership as the basis for classification. My design is to divide the sample into two groups (above the median and below the median) based on each firm's average pretreatment (2014 to 2016) measure of ownership, where the higher group (above the median) is my treated group and the lower group (below the median) is my control group. Now let me show what I have done.

Code:

. xtset id year
       panel variable:  id (unbalanced)
        time variable:  year, 2014 to 2019, but with gaps
                delta:  1 unit

. distinct id year

       |        Observations
       |      total   distinct
-------+----------------------
    id |      11788       2619
  year |      11788          6

****Creating Pre-reg and Post-reg period*****************************
. gen post=.                // for creating time dummy
(11,788 missing values generated)

. replace post=0 if year>2013 & year <2017 & year!=. //pre-regulation period
(5,432 real changes made)

. replace post=1 if year>2016 & year <2020 & year!=. // post-regulation period
(6,356 real changes made)



*****Creating Treatment and control group based on ownership structure*********************
*giving summary of the variable ownership (owner) for the period 2014-2019. The variable is in %
 univar owner if year >2013 & year <2020
                                        -------------- Quantiles --------------
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
-------------------------------------------------------------------------------
   owner   11562    52.86    18.72     0.00    42.03    56.28    68.49   100.00


 egen owner_year=xtile(owner ) if year>2013 & year<2017, n(2) by(year) // to classify owner into 2 categories
>  based on pre-reg period
(6,458 missing values generated)


. egen mean_owner=mean(owner_year),by(id) // to get mean by id
(1,153 missing values generated)

.

. univar mean_owner
                                        -------------- Quantiles --------------
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
-------------------------------------------------------------------------------
mean_owner   10635     1.50     0.48     1.00     1.00     1.67     2.00     2.00
-------------------------------------------------------------------------------

*since the median is 1.67, we classify treatment group  as those firms which has owner>1.67
 & control group as firms with owner<1.67

 gen owner_group=.
(11,788 missing values generated)

.
. replace owner_group=1 if mean_owner>1.67 &  mean_owner!= .              //treated group
(5,002 real changes made)

. replace owner_group=0 if mean_owner<1.67 & mean_owner!= .        // control group
(5,633 real changes made)

*Cross checking whether treated group has higher owner or not

 univar owner,by(owner_group)

-> owner_group=0
                                        -------------- Quantiles --------------
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
-------------------------------------------------------------------------------
   owner    5542    39.31    15.56     0.00    29.65    42.87    50.85    94.09
-------------------------------------------------------------------------------

-> owner_group=1
                                        -------------- Quantiles --------------
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
-------------------------------------------------------------------------------
   owner    4965    66.98     7.69     2.42    61.90    67.81    73.46    99.95
-------------------------------------------------------------------------------

*Treated group (owner_group 1) has a higher percentage of ownership than control group (owner_group 0)
*Trust everything is correct till here hence proceeding with regression

**********************Regression*********************************************
xtreg cash_ta_w i.owner_group##i.post size_w nfa_ta_w lever_w trade_credit_ta_w sales_grow_w roa_w pbfinal cfo_
> ta_w rdcc2_ta_w div_ta_w nw_ta_w i.year, fe vce(robust)
note: 1.owner_group omitted because of collinearity
note: 2019.year omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =      8,485
Group variable: id                              Number of groups  =      1,814

R-sq:                                           Obs per group:
     within  = 0.0810                                         min =          1
     between = 0.1039                                         avg =        4.7
     overall = 0.1015                                         max =          6

                                                F(17,1813)        =       9.34
corr(u_i, Xb)  = -0.0132                        Prob > F          =     0.0000

                                      (Std. Err. adjusted for 1,814 clusters in id)
-----------------------------------------------------------------------------------
                  |               Robust
        cash_ta_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
    1.owner_group |          0  (omitted)
           1.post |  -.0036867   .0027742    -1.33   0.184    -.0091276    .0017542
                  |
owner_group#post |
             1 1  |  -.0029889   .0028072    -1.06   0.287    -.0084946    .0025168
                  |
           size_w |    .003986   .0048166     0.83   0.408    -.0054606    .0134326
         nfa_ta_w |  -.1260332   .0141154    -8.93   0.000    -.1537174   -.0983491
          lever_w |   .0513303   .0156838     3.27   0.001       .02057    .0820906
trade_credit_ta_w |  -.0764345   .0118204    -6.47   0.000    -.0996175   -.0532515
     sales_grow_w |   .0017605   .0007836     2.25   0.025     .0002236    .0032974
            roa_w |  -.0148786   .0177108    -0.84   0.401    -.0496144    .0198571
          pbfinal |   .0016718   .0007772     2.15   0.032     .0001476     .003196
         cfo_ta_w |   .0642863   .0095625     6.72   0.000     .0455316    .0830411
       rdcc2_ta_w |  -.1687321   .1958021    -0.86   0.389    -.5527536    .2152893
         div_ta_w |   .0019618   .0022745     0.86   0.389    -.0024992    .0064227
          nw_ta_w |   .0985509   .0137832     7.15   0.000     .0715183    .1255835
                  |
             year |
            2015  |   .0002953   .0014899     0.20   0.843    -.0026268    .0032174
            2016  |  -.0008791   .0016552    -0.53   0.595    -.0041253    .0023672
            2017  |   .0041283   .0018694     2.21   0.027      .000462    .0077946
            2018  |   .0024727   .0016666     1.48   0.138     -.000796    .0057415
            2019  |          0  (omitted)
                  |
            _cons |   .0050101   .0395974     0.13   0.899    -.0726512    .0826715
------------------+----------------------------------------------------------------
          sigma_u |  .08184581
          sigma_e |  .04456335
              rho |  .77133251   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------

Is my research design correct? Also, is my code correct? Can I say that the regulation has no effect on the dependent variable in the treated group?
I have tried to include everything here; any help would be very valuable to me at this juncture.

Comment

lal mohan kumar

Join Date: May 2019

Posts: 265
#26

09 Mar 2021, 09:40

Carlo Lazzaro, in your post (https://www.statalist.org/forums/for...79#post1450179) you demonstrated that fixed effects won't work. However, in the above example, the fixed-effects model does run. Is there anything wrong in the above? Is my DID specification wrong with respect to the classification, the commands I gave, etc.?
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17730
#27

09 Mar 2021, 09:47

Lal:
I cannot say whether there's something wrong in your approach.
What I previously stated was that (as obviously expected) -fe- won't work when all predictors are time-invariant:

Code:

. use "http://www.stata-press.com/data/r15/nlswork.dta"
. xtreg ln_wage i.race##i.birth_yr, fe

Last edited by Carlo Lazzaro; 09 Mar 2021, 09:51.

Kind regards,
Carlo
(Stata 19.0)
1 like
Comment
lal mohan kumar

Join Date: May 2019

Posts: 265
#28

09 Mar 2021, 09:50

Carlo Lazzaro Thanks for the prompt reply.

I agree that "-fe- won't work when all predictors are time-invariant", which is not the case in my setup.
Comment
lal mohan kumar

Join Date: May 2019

Posts: 265
#29

09 Mar 2021, 11:50

Clyde Schechter, sorry for tagging you. As I have relied heavily on your (as well as Carlo's) posts, I don't want to miss any of your comments. Following your post #19, I ran the following code to check the parallel trends assumption:

Code:

gen did=post*owner_group
collapse (mean) cash_ta_w if did == 0, by(year owner_group)
reshape wide cash_ta_w, i(year) j(owner_group)
graph twoway connect cash_ta_w* year

and I got the attached graph (Graph.gph).
Does the graph support the parallel trends assumption?
Attached Files

Graph.gph (5.9 KB, 1 view)

Last edited by lal mohan kumar; 09 Mar 2021, 11:54.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#30

09 Mar 2021, 12:01

Concerning the model in #25, the code appears to be a correct implementation of what is described in its comments. I do have concerns about the design, however. Defining the treatment group by an average of values of ownership share being greater than the median for all firms in your sample is questionable under the best of circumstances (Google "Harrell dichotomania"). It is even more concerning here: are you certain that year-on-year ownership is not affected by the intervention you are trying to study?

Concerning your parallel trends issue, it looks roughly valid. With only three years of pre-intervention data, it's a little hard to really draw a firm conclusion. But it looks fair. I would call it plausible, maybe even persuasive, but not convincing. (But almost nothing would be convincing based on just three years.)
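
A regression-based look at the same question is sometimes used alongside the graph. What follows is only a sketch, using the variable names from #25 and assuming 2014-2016 is the pre-intervention window; it is not a substitute for the graph.

Code:

* pre-intervention years only: do the year effects differ between the two groups?
xtreg cash_ta_w i.owner_group##i.year if year <= 2016, fe vce(cluster id)

* joint test of the group-by-year interaction terms
testparm i.owner_group#i.year

With only three pre-intervention years this test has little power, so failing to reject would be weak support for parallel trends at best.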
Comment
