  • Interpreting Difference in Difference Regression Results

    First of all, apologies: I am a novice statistician.
    My data set is a national panel survey covering 2013-2019.
    I am trying to determine whether a pension reform in 2016 resulted in lower working hours.
    The reform targets workers with a 20-30 hr work week at firms with 500+ workers.

    ln_hrs is the outcome variable, the log of weekly hours.

    The did variable is the treatment group x treatment year interaction:

    did = actual treatment group x treatment year

    In addition, I added the same interaction for two closely related groups that were not affected:
    did2 = firm size below 500 x treated year
    did3 = over 30 hr work week x treated year

    age groups: 1 (young), 2 (prime), 3 (old)
    sex: 1 (male)
    married: 0 (unmarried), 1 (married)

    The thing is, I'm confused about how I should interpret my regression results.

    I ran this first on prime-age males, then on females and the other age groups:
    Code:
    . xtreg ln_hrs did did2 did3 married age age_sq if sex==1 & agegroup==2, fe
    note: married omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs     =  1,413,097
    Group variable: pid2                            Number of groups  =    506,653
    
    R-sq:                                           Obs per group:
    within  = 0.0484                                         min =          1
    between = 0.0126                                         avg =        2.8
    overall = 0.0144                                         max =          8
    
    F(5,906439)       =    9227.91
    corr(u_i, Xb)  = -0.3694                        Prob > F          =     0.0000
    
    
          ln_hrs |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
    -------------+-----------------------------------------------------------------
             did |  -.3918229   .0031649  -123.80   0.000    -.3980259   -.3856199
            did2 |  -.2259312   .0024079   -93.83   0.000    -.2306506   -.2212118
            did3 |   .1183826   .0010739   110.24   0.000     .1162778    .1204873
         married |          0  (omitted)
             age |  -.0053248   .0019617    -2.71   0.007    -.0091697   -.0014799
          age_sq |  -.0000975   .0000219    -4.44   0.000    -.0001405   -.0000545
           _cons |    4.18519   .0436677    95.84   0.000     4.099603    4.270778
    -------------+-----------------------------------------------------------------
         sigma_u |  .30998708
         sigma_e |  .20286125
             rho |  .70015095   (fraction of variance due to u_i)
    
    F test that all u_i=0: F(506652, 906439) = 4.51              Prob > F = 0.0000

    Code:
    . estimates table male_prime male_old male_young female_old female_young, stats(N r2 r2_a)

        Variable |  male_prime     male_old    male_young    female_old    female_y~g
    -------------+--------------------------------------------------------------------
             did |   -.3918229    .20869769     .14312667     .02250853     .16162787
            did2 |  -.22593119    -.0212681     .18365077     .13229586     .18334501
            did3 |   .11838256    .23876996     .29431827     .28468336     .32835838
         married |   (omitted)    (omitted)     (omitted)     (omitted)     (omitted)
             age |  -.00532479    .08906627    -.00525672     .00457667     .48950869
          age_sq |  -.00009747    .00039557     .00070005    -.00021192    -.01037898
           _cons |   4.1851905    7.5631666     3.1662467     3.5781754    -2.4026562
    -------------+--------------------------------------------------------------------
               N |     1413097       198487        116533       1136866        113062
              r2 |   .04843646    .05134639     .04625304     .04583826     .05474549
            r2_a |  -.48344304    .53456717    -.71746081    -.51141495    -.69389812
    My first question is how I am supposed to interpret each of the did coefficients?
    The null hypothesis is that there is no difference between the experimental and control groups and I don't know if I can reject it here.
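
    One general point that may help when reading the output: because the outcome ln_hrs is in logs, a coefficient b on a 0/1 variable corresponds to a 100*(exp(b) - 1) percent change in weekly hours, holding the other regressors fixed. As a worked illustration only, taking the did coefficient for prime-age males at face value and setting aside whether this specification actually identifies the reform effect:

    ```latex
    100 \times \left(e^{\hat{\beta}_{did}} - 1\right)
      = 100 \times \left(e^{-0.3918229} - 1\right)
      \approx -32.4\%
    ```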

  • #2
    The problem with your regression, as presented, is that it doesn't include time and treatment indicator variables. Thus the first difference is never calculated. The other did variables don't really do what you want them to do. I think we should first think about what you're doing conceptually. I see three differences you may want to exploit:
    1. Pre treatment period v. post treatment period
    2. Firms with workers >= 500 v. firms with workers < 500
    3. Workers who work 20-30 hours during a week v. workers who work more or fewer hours a week
    I think I first want to ensure I have the correct understanding of treatment eligibility. Can you tell me a bit more about the pension reform? Do workers who worked 20-30 hours a week in 2015 lose the benefit of pension reform if they worked more than 30 hours a week in 2017, and do workers who decreased their hours from over 30 in 2015 to 20-30 hours in 2017 gain the benefit? Do workers in firms who had over 500 workers in 2015 lose the benefit of pension reform if number of workers in the firm drops below 500? Answers to these questions will help ensure we're setting up a proper analysis. Let's for now assume eligibility for the pension reform benefit does indeed change if hours worked or number of firm workers change. Your research question suggests eligibility does indeed change if these variables change.

    It looks like you have individual-level data, so I would first generate the following variables to help with easier analysis:
    • post = 1 if year >= 2016, assuming 2016 is the first year of true treatment, 0 otherwise
    • treat = 1 if workers in firm >= 500, 0 otherwise
    • pension = 1 if post == 1 & treat == 1, 0 otherwise
    • group_1 = 1 if average weekly hours under 20, 0 otherwise
    • group_2 = 1 if average weekly hours between 20-30, 0 otherwise
    • group_3 = 1 if average weekly hours over 30, 0 otherwise
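
    A minimal sketch of how these variables could be generated. The names year, firm_size, and hours are assumptions for illustration and are not taken from the actual data set:

    ```stata
    * Sketch only: year, firm_size, and hours are hypothetical variable names
    gen byte post    = (year >= 2016) if !missing(year)
    gen byte treat   = (firm_size >= 500) if !missing(firm_size)
    gen byte pension = post * treat

    * Hours-group indicators, left missing when hours is missing
    gen byte group_1 = (hours < 20) if !missing(hours)
    gen byte group_2 = inrange(hours, 20, 30) if !missing(hours)
    gen byte group_3 = (hours > 30) if !missing(hours)
    ```

    The if !missing() qualifiers matter because Stata treats missing values as larger than any number, so year >= 2016 would otherwise evaluate to 1 for a missing year.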
    Now, a problem I see is that hours worked is both the outcome variable and a determination for eligibility. But I assume the survey data is not longitudinal, as most survey data are not, so this shouldn't really be a problem. Unfortunately, you're not able to estimate changes on an individual level, meaning you're not really able to exploit the third difference I mentioned above. What you're really estimating is the change in the distribution of hours worked by firm.

    Next, you want to determine which groups you actually want to compare, i.e. what equations you want to estimate. It seems three questions you probably want to answer are:
    1. Does pension reform induce workers who worked less than 20 hours to work between 20-30 hours?
    2. Does pension reform induce workers who worked between 20-30 hours to work more or fewer hours?
    3. Does pension reform induce workers who worked more than 30 hours to work between 20-30 hours?
    These three questions ask about likely scenarios induced by the pension reform, at least in my opinion. These questions can be answered indirectly by asking the following:
    1. Does pension reform change the distribution of workers who work between 20-30 hours?
    2. Does pension reform change the distribution of workers who work less than 20 hours?
    3. Does pension reform change the distribution of workers who work more than 30 hours?
    I would run the following equations:
    Code:
    reg ln_hrs pension i.year i.firm married age age_sq, cluster(firm)
    reg group_1 pension i.year i.firm married age age_sq, cluster(firm)
    reg group_2 pension i.year i.firm married age age_sq, cluster(firm)
    reg group_3 pension i.year i.firm married age age_sq, cluster(firm)
    The coefficient for pension in the first equation will give you the average treatment effect (ATE) on hours worked at firms with over 500 workers after 2016, i.e. the ATE of the policy. Note that this is the average effect for all workers and the average effect across all years. I would caution against giving this estimate too much attention, since the distribution of worker hours before 2016 will affect the estimate. The other equations, in my opinion, are more relevant. For example, the third equation will give you the change in the probability that a worker works between 20-30 hours. We would expect this effect to be positive, since workers in the other groups will likely change their hours to be classified in this group.
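
    Written out, the first equation corresponds roughly to the model below, where i indexes workers, f firms, and t years. The symbols are notation introduced here for illustration. The sketch assumes a firm's size class does not change over time, so that the treat main effect is absorbed by the firm fixed effects, while post is absorbed by the year effects; if size class can change, a separate treat term would be needed:

    ```latex
    \ln(\text{hrs}_{ift}) = \alpha_f + \lambda_t
      + \beta\,\text{pension}_{ft}
      + \gamma_1\,\text{married}_{it} + \gamma_2\,\text{age}_{it} + \gamma_3\,\text{age}_{it}^2
      + \varepsilon_{ift},
    \qquad \text{pension}_{ft} = \text{treat}_f \times \text{post}_t
    ```

    Here beta is the difference-in-differences coefficient, alpha_f the firm effects from i.firm, and lambda_t the year effects from i.year.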

    You can then look at different subgroups by using the if condition, as you did in your original regression.
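
    For instance, a sketch restricting the third equation to prime-age males, reusing the sex and agegroup codes from the original post (firm is assumed to be a firm identifier in the data):

    ```stata
    * Sketch only: firm is an assumed firm identifier; sex and agegroup follow post #1
    reg group_2 pension i.year i.firm married age age_sq if sex == 1 & agegroup == 2, vce(cluster firm)
    ```

    vce(cluster firm) is the current spelling of the cluster(firm) option used above.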



    • #3
      Thanks for your kind response Al

      Originally posted by Al Perez
      I think I first want to ensure I have the correct understanding of treatment eligibility. Can you tell me a bit more about the pension reform?
      The reform creates a labor cost on firms with 500+ workers. Before the reform, firms that had 20-30hr workers did not have to contribute to pension, after the reform they do.
      For those firms having workers who work above 30hrs, they would have to contribute to the pension regardless of the reform. Firms with under 20hr work weeks would not have to contribute regardless of the reform.

      Originally posted by Al Perez
      Now, a problem I see is that hours worked is both the outcome variable and a determination for eligibility. But I assume the survey data is not longitudinal, as most survey data are not, so this shouldn't really be a problem. Unfortunately, you're not able to estimate changes on an individual level, meaning you're not really able to exploit the third difference I mentioned above. What you're really estimating is the change in the distribution of hours worked by firm.
      In the regressions below, I dropped observations where workers are not working 20-30 hrs, so the only control group in this sample is workers at smaller firms, which are not affected by the reform.
      This captures the treatment and control groups, but perhaps using a dummy and keeping the entire sample is the correct way to test multiple control groups?

      As for the variables, I noticed my mistake about omitting the treated variable and inserted it as below:

      Code:
      // Treatment time (Reform occurs in Oct.2016)
      gen time = YYYYMM > 201609
      
      // Treated group: 500ppl+, Wkhrs 20-30, months after reform
      gen treated = 0
      replace treated = 1 if inrange(size, 7, 8) & inrange(WorkingTime_Week, 20, 30) & time == 1
      
      // Interaction effects
      gen did = treated*time
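
      A side note on the block above, offered as a sketch rather than a fix: because treated is set to 1 only when time == 1, did = treated*time is identical to treated, which may be why one of the two is dropped in the output. One possible alternative keeps group membership and period separate. The names time2, treated2, and did_alt are hypothetical and chosen only to avoid overwriting the existing variables:

      ```stata
      * Confirms the point above: did and treated coincide in every observation
      assert did == treated

      * Group membership defined without reference to the reform date
      gen byte treated2 = inrange(size, 7, 8) & inrange(WorkingTime_Week, 20, 30)
      * Post-reform indicator (reform in Oct. 2016), missing if the date is missing
      gen byte time2 = (YYYYMM > 201609) if !missing(YYYYMM)
      * Interaction of group and period
      gen byte did_alt = treated2 * time2
      ```

      Since WorkingTime_Week is also the basis of the outcome, treated2 can change within a person over time, which is the eligibility issue raised in #2.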
      Below is the estimates table for xtreg fixed effects by age/gender group. Note that the date rows 637 to 686 are month dummies, not year dummies, since the surveys are monthly.

      Code:
      . //Estimates table
      . estimates table male_young_fe male_prime_fe male_old_fe female_young_fe female_prime_fe female_old_fe, stats(N r2
      >  r2_a) star(.05 .01 .001)
      
      
      Variable  male_young_fe   male_prime_fe    male_old_fe    female_youn~e   female_prim~e   female_old_fe  
      
      did  -.06335122        .0086689      -.00799071       .06138662*      .00084008        .0174197    
      time   .22508194*     -.00542874      -.04402266      -.01567902       .03123864*     -.04721227    
      treated   (omitted)       (omitted)       (omitted)       (omitted)       (omitted)       (omitted)    
      married   (omitted)       (omitted)       (omitted)       (omitted)       (omitted)       (omitted)    
      age  -.03127932        .0391765**     .08770109      -.03387792       .01774397***   -.00955728    
      age_sq   .00002882       -.0004116**    -.00067633       .00078641      -.00023503***    .00014602    
      
      date
      637    .01483028       .01949263       .00399551      -.00099859       .00238829      -.00510992    
      638    .03403875       .02771634      -.01151852      -.02011524       .00625102      -.00833993    
      639   -.02580148       .03007762      -.02549073      -.03097651      -.01146925      -.02309226    
      640   -.01431036        .0246344      -.02359891       -.0041245       .01384798*       .0111537    
      641   -.02572707       .02472665      -.03102807      -.02158605       .01634536*      .01429675    
      642    .00954388       .02686355      -.04575085      -.02239645       .02272778***    .00577666    
      643   -.01891511       .02335591      -.04093955      -.01298043       .01239421      -.01540079    
      644    .03472182       .02685713      -.00586405       .02374297       .01478453*      -.0022926    
      645    .11040312*      .03369939      -.02682956       .02934718       .01539201*     -.00962696    
      646    .06443077       .03134361      -.03855582       .02067497        .0144461*     -.01199479    
      647    .04957055      -.00230938      -.03353541      -.02115902      -.01849672**    -.05159455***  
      648     .0334699       .00984061      -.01664924        -.014547       .00332664      -.02364399    
      649    .02876649       .02880876      -.01013813       .00094939       .01673821**    -.01237988    
      650    .05032143       .01761055      -.00742823      -.00388799       .01863912**    -.01618815    
      651    .03615941      -.02392381      -.03364384      -.07533966      -.01128259      -.02731352    
      652    .06455091       .02074589      -.01933612      -.03304385       .01537385*     -.00666463    
      653    .08107249       .03256231       -.0214189      -.03234303       .02378094***   -.00409894    
      654    .08176343       .03175518      -.04180578      -.02246648       .02109646**    -.01811314    
      655    .08597821       .03399713      -.04521471      -.01300255       .02500304***    -.0139565    
      656     .0878044       .03562141      -.03353633      -.01867719       .02203472**    -.01601315    
      657    .13895938*      .02471879      -.03275726      -.00687914       .01776976*     -.01632118    
      658     .0627716       .01426089      -.05238772       -.0273544      -.00289779      -.04218746*    
      659     .0720458      -.01229376      -.02587505      -.04661133       -.0081918      -.03821582*    
      660    .07567278      -.00247532      -.02885674      -.02220233       .01530281*     -.04039775*    
      661    .08353853       .00330771      -.03527273      -.02366675       .01568109*      -.0318076    
      662    .07913422       .00745052      -.02141916      -.01237999       .02393561**    -.01905583    
      663    .03264014       -.0031191      -.04739153      -.02580379       .00547008      -.03570722    
      664    .10077923       .00960485      -.03280885       .02072874       .02581338**    -.00934814    
      665    .09135764       .01477348      -.04995088       .00737557       .03469085***   -.00970304    
      666    .10109859       .02068301      -.05522597       .04144747       .02596652**    -.02039472    
      667    .11747367       .01133101      -.05860103      -.01082672       .02596772**    -.02976636    
      668    .14102251*      .01153634      -.05987687      -.00711932       .03139359***   -.02939528    
      669     .1174958        .0316789      -.05691928      -.00686122       .02457669**     -.0294494    
      670    .14336493*      .01100707      -.05842497       .02989413       .03055206***   -.02543143    
      671    .08133769      -.01286802      -.05056241      -.00100604       .00227917      -.03429375    
      672    .04480609      -.01992689      -.04320097      -.00423017       .02339529*     -.02895825    
      673    .09959497      -.00715016      -.02696779       .02974202       .03003979**    -.03083052    
      674    .10328407       -.0187076      -.01133239      -.00770488         .031975**     -.0342153    
      675    .11322101      -.00430784      -.05301695      -.02928651       .00619044      -.04486488    
      676    .14966392      -.00119473      -.04202775      -.02660858       .04003175***   -.00783868    
      677    .14549956       .01033178      -.03824544      -.01927022       .03000595**    -.01424443    
      678    .17106471*       .0224899      -.03253182      -.02739416       .03164351**    -.02192722    
      679    .22612895**      .0193148      -.04788885       .00801814       .02923926**    -.04351454    
      680     .1980548*      .01735297      -.08805466*      .00724638       .03015235**    -.04965386    
      681   -.01252625       .02004082      -.03593336       .02503789      -.00144887       .00640349    
      682   -.06281346      -.00212646       -.0333872       .03531413       .00455308       .00929459    
      683   -.06547609      -.01283606      -.05103184*      .01046673       -.0240731***   -.00790723    
      684   -.03396279       .00510708      -.01123893      -.03272397         .007411       .00930132    
      685   -.07441255*      .01005145       .00659603       -.0405325       .00346545      -.00120751    
      686    (omitted)       (omitted)       (omitted)       (omitted)       (omitted)       (omitted)    
      
      _cons   3.8244254**     2.3634916***    .43319741       3.5999509**     2.8816312***    3.2514334    
      
      N        3934           30857           11857            5781          141823           22393    
      r2   .06413887       .01436512       .01592592       .03837894       .00911273       .00909293    
      r2_a    -2.61566      -3.1960196      -2.0108857      -2.4957042      -1.2594275      -1.3467362    
      
      legend: * p<.05; ** p<.01; *** p<.001
      Originally posted by Al Perez
      I would run the following equations:

      reg ln_hrs pension i.year i.firm married age age_sq, cluster(firm)
      reg group_1 pension i.year i.firm married age age_sq, cluster(firm)
      reg group_2 pension i.year i.firm married age age_sq, cluster(firm)
      reg group_3 pension i.year i.firm married age age_sq, cluster(firm)

      The coefficient for pension in the first equation will give you the average treatment effect (ATE) on hours worked at firms with over 500 workers after 2016, i.e. the ATE of the policy. Note that this is the average effect for all workers and the average effect across all years. I would caution against giving this estimate too much attention, since the distribution of worker hours before 2016 will affect the estimate. The other equations, in my opinion, are more relevant. For example, the third equation will give you the change in the probability that a worker works between 20-30 hours. We would expect this effect to be positive, since workers in the other groups will likely change their hours to be classified in this group.

      You can then look at different subgroups by using the if condition, as you did in your original regression.
      I am not sure I understand what you have done. It seems you are proposing that, instead of changing the sample each time to test a new control group, I create groups for working hours and use group membership as the outcome variable.
      So in this case, I would see the effect of the reform on inducing individuals to move to a different working-hours group, rather than an incremental change in working hours itself?

      Thank you
      Last edited by Nick Llywd; 01 Nov 2020, 19:08.



      • #4
        One more thing I forgot to mention,
        But I assume the survey data is not longitudinal, as most survey data are not, so this shouldn't really be a problem. Unfortunately, you're not able to estimate changes on an individual level, meaning you're not really able to exploit the third difference I mentioned above. What you're really estimating is the change in the distribution of hours worked by firm.
        The data are longitudinal: it is an unbalanced panel data set with up to 8 observations per individual.
        Last edited by Nick Llywd; 01 Nov 2020, 21:01.
