Interpreting Difference in Difference Regression Results

Nick Llywd

Join Date: Oct 2020
Posts: 19


26 Oct 2020, 04:48

First of all, apologies: I am a novice statistician.
My data set is a national panel survey covering 2013-2019.
I am trying to determine whether a pension reform in 2016 resulted in lower working hours.
The reform targets workers with a 20-30 hour work week at firms with 500+ workers.

ln_hrs is the outcome variable, the log of weekly hours.

the did variables are the treatment group x treatment year:

did = actual treatment group x treatment year

In addition I added the did effect for 2 closely related groups that were not affected
did2= firm size below 500 x treated year
did3= over 30hr work week x treated year

age groups: 1 (young), 2 (prime), 3 (old)
sex: 1 = male
married=0 (unmarried), 1(married)

The thing is, I'm confused about how I should interpret my regression results.

I ran this first on prime-age males, then on females and the other age groups:

Code:

. xtreg ln_hrs did did2 did3 married age age_sq if sex==1 & agegroup==2,
note: married omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =  1,413,097
Group variable: pid2                            Number of groups  =    506,653

R-sq:                                           Obs per group:
within  = 0.0484                                         min =          1
between = 0.0126                                         avg =        2.8
overall = 0.0144                                         max =          8

F(5,906439)       =    9227.91
corr(u_i, Xb)  = -0.3694                        Prob > F          =     0.0000


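------------------------------------------------------------------------------
      ln_hrs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         did |  -.3918229   .0031649  -123.80   0.000    -.3980259   -.3856199
        did2 |  -.2259312   .0024079   -93.83   0.000    -.2306506   -.2212118
        did3 |   .1183826   .0010739   110.24   0.000     .1162778    .1204873
     married |          0  (omitted)
         age |  -.0053248   .0019617    -2.71   0.007    -.0091697   -.0014799
      age_sq |  -.0000975   .0000219    -4.44   0.000    -.0001405   -.0000545
       _cons |    4.18519   .0436677    95.84   0.000     4.099603    4.270778
-------------+----------------------------------------------------------------
     sigma_u |  .30998708
     sigma_e |  .20286125
         rho |  .70015095   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(506652, 906439) = 4.51              Prob > F = 0.0000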

Code:

. estimates table male_prime    male_old male_young female_old female_young, stats(N    r2
        
        
Variable  male_prime    male_old    male_young  female_old   female_y~g      
        
did   -.3918229    .20869769    .14312667    .02250853    .16162787      
did2  -.22593119    -.0212681    .18365077    .13229586     .18334501      
did3   .11838256    .23876996    .29431827    .28468336    .32835838      
married   (omitted)    (omitted)    (omitted)    (omitted)    (omitted)      
age  -.00532479    .08906627   -.00525672    .00457667    .48950869      
age_sq  -.00009747    .00039557    .00070005   -.00021192   -.01037898      
_cons   4.1851905    7.5631666    3.1662467    3.5781754   -2.4026562      
        
N     1413097    198487       116533      1136866      113062      
r2   .04843646    .05134639    .04625304    .04583826    .05474549      
r2_a  -.48344304    .53456717   -.71746081   -.51141495   -.69389812

My first question is how I am supposed to interpret each of the did coefficients?
The null hypothesis is that there is no difference between the experimental and control groups and I don't know if I can reject it here.


Al Perez

Join Date: Oct 2020

Posts: 10
#2

29 Oct 2020, 16:31

The problem with your regression, as presented, is that it doesn't include time and treatment indicator variables. Thus the first difference is never calculated. The other did variables don't really do what you want them to do. I think we should first think about what you're doing conceptually. I see three differences you may want to exploit:
Pre treatment period v. post treatment period

Firms with workers >= 500 v. firms with workers < 500

Workers who work 20-30 hours during a week v. workers who work more or fewer hours a week

I think I first want to ensure I have the correct understanding of treatment eligibility. Can you tell me a bit more about the pension reform? Do workers who worked 20-30 hours a week in 2015 lose the benefit of pension reform if they worked more than 30 hours a week in 2017, and do workers who decreased their hours from over 30 in 2015 to 20-30 hours in 2017 gain the benefit? Do workers in firms who had over 500 workers in 2015 lose the benefit of pension reform if number of workers in the firm drops below 500? Answers to these questions will help ensure we're setting up a proper analysis. Let's for now assume eligibility for the pension reform benefit does indeed change if hours worked or number of firm workers change. Your research question suggests eligibility does indeed change if these variables change.

It looks like you have individual-level data, so I would first generate the following variables to help with easier analysis:
post = 1 if year >= 2016, assuming 2016 is the first year of true treatment, 0 otherwise

treat = 1 if workers in firm >= 500, 0 otherwise

pension = 1 if post == 1 & treat == 1, 0 otherwise

group_1 = 1 if average weekly hours under 20, 0 otherwise

group_2 = 1 if average weekly hours between 20-30, 0 otherwise

group_3 = 1 if average weekly hours over 30, 0 otherwise

Now, a problem I see is that hours worked is both the outcome variable and a determination for eligibility. But I assume the survey data is not longitudinal, as most survey data are not, so this shouldn't really be a problem. Unfortunately, you're not able to estimate changes on an individual level, meaning you're not really able to exploit the third difference I mentioned above. What you're really estimating is the change in the distribution of hours worked by firm.

Next, you want to determine which groups you actually want to compare, i.e. what equations you want to estimate. It seems three questions you probably want to answer are:
Does pension reform induce workers who worked less than 20 hours to work between 20-30 hours?

Does pension reform induce workers who worked between 20-30 hours to work more or fewer hours?

Does pension reform induce workers who worked more than 30 hours to work between 20-30 hours?

These three questions ask about likely scenarios induced by the pension reform, at least in my opinion. These questions can be answered indirectly by asking the following:
Does pension reform change the distribution of workers who work between 20-30 hours?

Does pension reform change the distribution of workers who work less than 20 hours?

Does pension reform change the distribution of workers who work more than 30 hours?

I would run the following equations:

Code:

reg ln_hrs pension i.year i.firm married age age_sq, cluster(firm)
reg group_1 pension i.year i.firm married age age_sq, cluster(firm)
reg group_2 pension i.year i.firm married age age_sq, cluster(firm)
reg group_3 pension i.year i.firm married age age_sq, cluster(firm)

The coefficient for pension in the first equation will give you the average treatment effect (ATE) on hours worked at firms with over 500 workers after 2016, i.e. the ATE of the policy. Note that this is the average effect across all workers and across all post-reform years. I would caution against giving this estimate too much attention, since the distribution of worker hours before 2016 will affect the estimate. The other equations, in my opinion, are more relevant. For example, the third equation will give you the change in the probability that a worker works between 20-30 hours. We would expect this effect to be positive, since workers in the other groups will likely change their hours to be classified in this group.

You can then look at different subgroups by using the if condition, as you did in your original regression.
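
The variable definitions and regressions above can be sketched on simulated data as follows. This is only an illustration under stated assumptions: the names year, firm, firm_size, and hours are hypothetical stand-ins for whatever the survey actually provides, the simulated numbers carry no meaning, and the married and age controls are left out for brevity.

```stata
* Hypothetical simulated data: all names and numbers are made up for illustration
clear
set seed 12345
set obs 7000
gen firm = ceil(_n/70)                      // 100 firms, 70 observations each
gen year = 2013 + mod(_n - 1, 7)            // years 2013-2019
* Firm size is held fixed within firm (assumption: every second firm is large)
gen firm_size = cond(mod(firm, 2) == 0, 800, 100)
gen hours = 10 + floor(40*runiform())       // weekly hours, integers 10-49
gen ln_hrs = ln(hours)

* Indicators as described above
gen post    = year >= 2016
gen treat   = firm_size >= 500
gen pension = post == 1 & treat == 1
gen group_1 = hours < 20
gen group_2 = inrange(hours, 20, 30)
gen group_3 = hours > 30

* The year and firm dummies absorb the post and treat main effects,
* so pension is the difference-in-differences term
reg ln_hrs  pension i.year i.firm, vce(cluster firm)
* Linear probability model: outcome is membership in the 20-30 hour group
reg group_2 pension i.year i.firm, vce(cluster firm)
```

Because the outcome of the first regression is in logs, a coefficient b on pension corresponds to a change in hours of roughly 100*(exp(b) - 1) percent. In the group regressions the coefficient is a change in the share of workers in that hours group, measured in probability units.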

Nick Llywd

Join Date: Oct 2020
Posts: 19
#3

01 Nov 2020, 19:04

Thanks for your kind response Al

Originally posted by Al Perez

I think I first want to ensure I have the correct understanding of treatment eligibility. Can you tell me a bit more about the pension reform?

The reform creates a labor cost on firms with 500+ workers. Before the reform, firms that had 20-30hr workers did not have to contribute to pension, after the reform they do.
For those firms having workers who work above 30hrs, they would have to contribute to the pension regardless of the reform. Firms with under 20hr work weeks would not have to contribute regardless of the reform.

Originally posted by Al Perez

Now, a problem I see is that hours worked is both the outcome variable and a determination for eligibility. But I assume the survey data is not longitudinal, as most survey data are not, so this shouldn't really be a problem. Unfortunately, you're not able to estimate changes on an individual level, meaning you're not really able to exploit the third difference I mentioned above. What you're really estimating is the change in the distribution of hours worked by firm.

In the regressions below, I dropped observations where workers are not working 20-30 hours. The sample still includes workers at the smaller firms, which do not receive the reform effects.
This captures the treatment and control groups, but perhaps using a dummy and keeping the entire sample is the correct way to test multiple control groups?

As for the variables, I noticed my mistake about omitting the treated variable and inserted it as below:

Code:

// Treatment time (Reform occurs in Oct.2016)
gen time = YYYYMM > 201609

// Treated group: 500ppl+, Wkhrs 20-30, months after reform
gen treated = 0
replace treated = 1 if inrange(size, 7, 8) & inrange(WorkingTime_Week, 20, 30) & time == 1

// Interaction effects
gen did = treated*time
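
One thing worth checking with the definitions above: treated already requires time == 1, so treated*time reproduces treated exactly, which may be why treated is reported as omitted in the estimates table. A minimal sketch on made-up data, reusing the same variable names and assuming (as the code above suggests) that size codes 7 and 8 denote firms with 500+ workers; treated_grp and did_alt are hypothetical names introduced only for this illustration:

```stata
* Made-up data that reuses the variable names from the code above
clear
set seed 2020
set obs 1000
gen YYYYMM = cond(_n <= 500, 201601, 201701)        // half pre-reform, half post-reform
gen size = ceil(8*runiform())                       // size codes 1-8 (hypothetical)
gen WorkingTime_Week = 15 + floor(25*runiform())    // weekly hours, integers 15-39
gen time = YYYYMM > 201609

* Definition from the post: group membership AND post-reform period
gen treated = inrange(size, 7, 8) & inrange(WorkingTime_Week, 20, 30) & time == 1
gen did = treated*time
assert did == treated          // did and treated are the same variable

* Alternative sketch: a group indicator that does not depend on time
gen treated_grp = inrange(size, 7, 8) & inrange(WorkingTime_Week, 20, 30)
gen did_alt = treated_grp*time
tab treated_grp did_alt
```

With a group indicator that is defined independently of the period, the interaction differs from the group indicator for pre-reform observations, so the two are no longer perfectly collinear.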

Below is the estimates table for xtreg fixed effects by age/gender group. Note that the date rows 637 to 686 are month effects, not year effects, because the surveys are monthly.

Code:

. //Estimates table
. estimates table male_young_fe male_prime_fe male_old_fe female_young_fe female_prime_fe female_old_fe, stats(N r2
>  r2_a) star(.05 .01 .001)


Variable  male_young_fe   male_prime_fe    male_old_fe    female_youn~e   female_prim~e   female_old_fe  

did  -.06335122        .0086689      -.00799071       .06138662*      .00084008        .0174197    
time   .22508194*     -.00542874      -.04402266      -.01567902       .03123864*     -.04721227    
treated   (omitted)       (omitted)       (omitted)       (omitted)       (omitted)       (omitted)    
married   (omitted)       (omitted)       (omitted)       (omitted)       (omitted)       (omitted)    
age  -.03127932        .0391765**     .08770109      -.03387792       .01774397***   -.00955728    
age_sq   .00002882       -.0004116**    -.00067633       .00078641      -.00023503***    .00014602    

date
637    .01483028       .01949263       .00399551      -.00099859       .00238829      -.00510992    
638    .03403875       .02771634      -.01151852      -.02011524       .00625102      -.00833993    
639   -.02580148       .03007762      -.02549073      -.03097651      -.01146925      -.02309226    
640   -.01431036        .0246344      -.02359891       -.0041245       .01384798*       .0111537    
641   -.02572707       .02472665      -.03102807      -.02158605       .01634536*      .01429675    
642    .00954388       .02686355      -.04575085      -.02239645       .02272778***    .00577666    
643   -.01891511       .02335591      -.04093955      -.01298043       .01239421      -.01540079    
644    .03472182       .02685713      -.00586405       .02374297       .01478453*      -.0022926    
645    .11040312*      .03369939      -.02682956       .02934718       .01539201*     -.00962696    
646    .06443077       .03134361      -.03855582       .02067497        .0144461*     -.01199479    
647    .04957055      -.00230938      -.03353541      -.02115902      -.01849672**    -.05159455***  
648     .0334699       .00984061      -.01664924        -.014547       .00332664      -.02364399    
649    .02876649       .02880876      -.01013813       .00094939       .01673821**    -.01237988    
650    .05032143       .01761055      -.00742823      -.00388799       .01863912**    -.01618815    
651    .03615941      -.02392381      -.03364384      -.07533966      -.01128259      -.02731352    
652    .06455091       .02074589      -.01933612      -.03304385       .01537385*     -.00666463    
653    .08107249       .03256231       -.0214189      -.03234303       .02378094***   -.00409894    
654    .08176343       .03175518      -.04180578      -.02246648       .02109646**    -.01811314    
655    .08597821       .03399713      -.04521471      -.01300255       .02500304***    -.0139565    
656     .0878044       .03562141      -.03353633      -.01867719       .02203472**    -.01601315    
657    .13895938*      .02471879      -.03275726      -.00687914       .01776976*     -.01632118    
658     .0627716       .01426089      -.05238772       -.0273544      -.00289779      -.04218746*    
659     .0720458      -.01229376      -.02587505      -.04661133       -.0081918      -.03821582*    
660    .07567278      -.00247532      -.02885674      -.02220233       .01530281*     -.04039775*    
661    .08353853       .00330771      -.03527273      -.02366675       .01568109*      -.0318076    
662    .07913422       .00745052      -.02141916      -.01237999       .02393561**    -.01905583    
663    .03264014       -.0031191      -.04739153      -.02580379       .00547008      -.03570722    
664    .10077923       .00960485      -.03280885       .02072874       .02581338**    -.00934814    
665    .09135764       .01477348      -.04995088       .00737557       .03469085***   -.00970304    
666    .10109859       .02068301      -.05522597       .04144747       .02596652**    -.02039472    
667    .11747367       .01133101      -.05860103      -.01082672       .02596772**    -.02976636    
668    .14102251*      .01153634      -.05987687      -.00711932       .03139359***   -.02939528    
669     .1174958        .0316789      -.05691928      -.00686122       .02457669**     -.0294494    
670    .14336493*      .01100707      -.05842497       .02989413       .03055206***   -.02543143    
671    .08133769      -.01286802      -.05056241      -.00100604       .00227917      -.03429375    
672    .04480609      -.01992689      -.04320097      -.00423017       .02339529*     -.02895825    
673    .09959497      -.00715016      -.02696779       .02974202       .03003979**    -.03083052    
674    .10328407       -.0187076      -.01133239      -.00770488         .031975**     -.0342153    
675    .11322101      -.00430784      -.05301695      -.02928651       .00619044      -.04486488    
676    .14966392      -.00119473      -.04202775      -.02660858       .04003175***   -.00783868    
677    .14549956       .01033178      -.03824544      -.01927022       .03000595**    -.01424443    
678    .17106471*       .0224899      -.03253182      -.02739416       .03164351**    -.02192722    
679    .22612895**      .0193148      -.04788885       .00801814       .02923926**    -.04351454    
680     .1980548*      .01735297      -.08805466*      .00724638       .03015235**    -.04965386    
681   -.01252625       .02004082      -.03593336       .02503789      -.00144887       .00640349    
682   -.06281346      -.00212646       -.0333872       .03531413       .00455308       .00929459    
683   -.06547609      -.01283606      -.05103184*      .01046673       -.0240731***   -.00790723    
684   -.03396279       .00510708      -.01123893      -.03272397         .007411       .00930132    
685   -.07441255*      .01005145       .00659603       -.0405325       .00346545      -.00120751    
686    (omitted)       (omitted)       (omitted)       (omitted)       (omitted)       (omitted)    

_cons   3.8244254**     2.3634916***    .43319741       3.5999509**     2.8816312***    3.2514334    

N        3934           30857           11857            5781          141823           22393    
r2   .06413887       .01436512       .01592592       .03837894       .00911273       .00909293    
r2_a    -2.61566      -3.1960196      -2.0108857      -2.4957042      -1.2594275      -1.3467362    

legend: * p<.05; ** p<.01; *** p<.001

Originally posted by Al Perez

I would run the following equations:

reg ln_hrs pension i.year i.firm married age age_sq, cluster(firm)
reg group_1 pension i.year i.firm married age age_sq, cluster(firm)
reg group_2 pension i.year i.firm married age age_sq, cluster(firm)
reg group_3 pension i.year i.firm married age age_sq, cluster(firm)

The coefficient for pension in the first equation will give you the average treatment effect (ATE) on hours worked at firms with over 500 workers after 2016, i.e. the ATE of the policy. Note that this is the average effect across all workers and across all post-reform years. I would caution against giving this estimate too much attention, since the distribution of worker hours before 2016 will affect the estimate. The other equations, in my opinion, are more relevant. For example, the third equation will give you the change in the probability that a worker works between 20-30 hours. We would expect this effect to be positive, since workers in the other groups will likely change their hours to be classified in this group.

You can then look at different subgroups by using the if condition, as you did in your original regression.

I am not sure I understand what you have done. It seems you are proposing that, instead of changing the sample each time to test a new control group, I create groups for working hours and use group membership as the outcome variable.
So in this case, I would see the effect of the reform on inducing individuals to move to a different working-hours group, rather than an incremental change in working hours itself?

Thank you

Last edited by Nick Llywd; 01 Nov 2020, 19:08.


Nick Llywd

Join Date: Oct 2020

Posts: 19
#4

01 Nov 2020, 20:57

One more thing I forgot to mention,

But I assume the survey data is not longitudinal, as most survey data are not, so this shouldn't really be a problem. Unfortunately, you're not able to estimate changes on an individual level, meaning you're not really able to exploit the third difference I mentioned above. What you're really estimating is the change in the distribution of hours worked by firm.

The data are longitudinal, with up to 8 observations per individual, it is an unbalanced panel data set.

Last edited by Nick Llywd; 01 Nov 2020, 21:01.
