Should I include time dummies in my random effects regression?

John Adler

Join Date: Apr 2017
Posts: 173

31 Jul 2018, 12:02

I think I have a strong theoretical argument for not including time dummies in my random effects regression. I would be interested in the opinion of users here and in suggestions as to how to strengthen this argument, particularly if there are any quantitative methods that I could apply in Stata.

I have panel data on local area unemployment and health outcomes for the same mothers, observed at three waves, each five years apart: before, during and after a recession.

The results of my initial regression are as follows:

Code:


. * LPM:
. 
. xtreg binbmi_obese_y psum_unemployed_total_cont_y i.own_education_y i.maritalstatus_y i.medical_card_y i.employment_y i.ord_age_y if has_y0_questionnaire==1 &  has_y5_
> questionnaire==1 | has_y0_questionnaire==1 & has_y10_questionnaire==1 | has_y0_questionnaire==1 & has_y5_questionnaire==1 & has_y10_questionnaire==1, cluster (current_
> county_y1) re robust

Random-effects GLS regression                   Number of obs     =      1,133
Group variable: id                              Number of groups  =        556

R-sq:                                           Obs per group:
     within  = 0.0750                                         min =          1
     between = 0.0147                                         avg =        2.0
     overall = 0.0302                                         max =          3

                                                Wald chi2(22)     =  218331.51
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                                                    (Std. Err. adjusted for 28 clusters in current_county_y1)
-----------------------------------------------------------------------------------------------------------------------------
                                                            |               Robust
                                             binbmi_obese_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------------------------------------+----------------------------------------------------------------
                               psum_unemployed_total_cont_y |    .005794   .0014117     4.10   0.000     .0030272    .0085608
                                                            |
                                            own_education_y |
                                              No schooling  |          0  (empty)
                                  Primary school education  |  -.1315012   .0942464    -1.40   0.163    -.3162206    .0532183
                                     Some secondary school  |   .0736315   .1101785     0.67   0.504    -.1423144    .2895775
                              Complete secondary education  |   .0279008   .0882128     0.32   0.752    -.1449931    .2007947
    Some third level education at college, university, RTC  |   .0842196    .098333     0.86   0.392    -.1085096    .2769488
Complete third level education at college, university, RTC  |  -.0220746   .0883634    -0.25   0.803    -.1952636    .1511145
                                                            |
                                            maritalstatus_y |
                                                Cohabiting  |  -.0837401   .0382859    -2.19   0.029    -.1587792   -.0087011
                                                 Separated  |   .0225485   .0605217     0.37   0.709    -.0960717    .1411688
                                                  Divorced  |    .084211   .1269417     0.66   0.507    -.1645901    .3330121
                                                   Widowed  |  -.0079601   .1239793    -0.06   0.949     -.250955    .2350348
                                      Single/Never married  |  -.0970986   .0382337    -2.54   0.011    -.1720353    -.022162
                                                            |
                                             medical_card_y |
                                                       Yes  |   .0147679   .0384133     0.38   0.701    -.0605207    .0900565
                                                            |
                                               employment_y |
                                                Unemployed  |   .0231915   .0593355     0.39   0.696     -.093104    .1394869
  Unable to work owing to permanent sickness or disability  |   .2963391   .1077156     2.75   0.006     .0852204    .5074577
                                         At school/student  |  -.0237847   .0565382    -0.42   0.674    -.1345975    .0870282
                           Seeking work for the first time  |  -.1044752   .0674654    -1.55   0.121    -.2367049    .0277545
                                                  Employed  |  -.0413736   .0116618    -3.55   0.000    -.0642303   -.0185169
                                             Self Employed  |  -.0094837   .0218855    -0.43   0.665    -.0523785    .0334111
                                                            |
                                                  ord_age_y |
                                                     20-23  |   .1274583   .0904683     1.41   0.159    -.0498563     .304773
                                                     24-27  |   .1046117   .0683596     1.53   0.126    -.0293708    .2385941
                                                     28-32  |   .1036983   .0691316     1.50   0.134    -.0317971    .2391937
                                                      33 +  |    .084037   .0811597     1.04   0.300    -.0750332    .2431072
                                                            |
                                                      _cons |          0  (omitted)
------------------------------------------------------------+----------------------------------------------------------------
                                                    sigma_u |  .26123467
                                                    sigma_e |  .21894127
                                                        rho |   .5874009   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------------------------------------------------

The data are clustered at the respondent's local area level (i.e. county, which is similar to an American state). There are 30 clusters, and the effect of local area unemployment (psum_unemployed_total_cont_y) on health is measured at this same local area level to account for endogeneity in the relationship between unemployment and health (i.e. is the same person who is likely to be unemployed also likely to be unhealthy for some unobserved reason?).
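As an aside, the estimation command above can be written more compactly. The sketch below is not the original command; it assumes the dataset is in memory and already -xtset- with id as the panel variable, and insample is a hypothetical helper variable. Because & binds more tightly than | in Stata, the three-part if condition reduces to "has a year-0 questionnaire and at least one follow-up", and cluster() already implies robust standard errors, so robust can be dropped.

```stata
* Sketch only: run from a do-file (the /// line continuations need one).
* insample is a hypothetical helper variable, not part of the original data.
generate byte insample = has_y0_questionnaire == 1 & ///
    (has_y5_questionnaire == 1 | has_y10_questionnaire == 1)

* Same specification as the first model, using the vce() syntax.
xtreg binbmi_obese_y psum_unemployed_total_cont_y        ///
    i.own_education_y i.maritalstatus_y i.medical_card_y ///
    i.employment_y i.ord_age_y if insample,              ///
    re vce(cluster current_county_y1)
```

This should select the same observations as the original if condition, since a missing questionnaire flag fails the == 1 test in both versions.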

A colleague suggested I make use of year dummies, as this is often done with panel data. Below is what the model looks like when I add the year variable:

Code:

. xtreg binbmi_obese_y psum_unemployed_total_cont_y i.own_education_y i.maritalstatus_y i.medical_card_y i.employment_y i.ord_age_y i.year if has_y0_questionnaire==1 &  
> has_y5_questionnaire==1 | has_y0_questionnaire==1 & has_y10_questionnaire==1 | has_y0_questionnaire==1 & has_y5_questionnaire==1 & has_y10_questionnaire==1, cluster (c
> urrent_county_y1) re robust

Random-effects GLS regression                   Number of obs     =      1,133
Group variable: id                              Number of groups  =        556

R-sq:                                           Obs per group:
     within  = 0.0862                                         min =          1
     between = 0.0249                                         avg =        2.0
     overall = 0.0427                                         max =          3

                                                Wald chi2(23)     =          .
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =          .

                                                                    (Std. Err. adjusted for 28 clusters in current_county_y1)
-----------------------------------------------------------------------------------------------------------------------------
                                                            |               Robust
                                             binbmi_obese_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------------------------------------+----------------------------------------------------------------
                               psum_unemployed_total_cont_y |  -.0023701   .0052052    -0.46   0.649    -.0125721    .0078319
                                                            |
                                            own_education_y |
                                              No schooling  |          0  (empty)
                                  Primary school education  |          0  (omitted)
                                     Some secondary school  |   .1505257   .0497344     3.03   0.002     .0530481    .2480033
                              Complete secondary education  |   .1029592   .0175976     5.85   0.000     .0684685    .1374499
    Some third level education at college, university, RTC  |   .1587382   .0249566     6.36   0.000     .1098241    .2076523
Complete third level education at college, university, RTC  |   .0640187   .0212939     3.01   0.003     .0222834     .105754
                                                            |
                                            maritalstatus_y |
                                                Cohabiting  |  -.0846086   .0379878    -2.23   0.026    -.1590634   -.0101538
                                                 Separated  |  -.0233706   .0737117    -0.32   0.751    -.1678429    .1211016
                                                  Divorced  |   .0769838   .1250147     0.62   0.538    -.1680406    .3220082
                                                   Widowed  |   .0261904   .1288942     0.20   0.839    -.2264376    .2788183
                                      Single/Never married  |  -.0912056   .0396385    -2.30   0.021    -.1688957   -.0135155
                                                            |
                                             medical_card_y |
                                                       Yes  |   .0034374   .0381739     0.09   0.928    -.0713821    .0782569
                                                            |
                                               employment_y |
                                                Unemployed  |   .0245004   .0608822     0.40   0.687    -.0948265    .1438273
  Unable to work owing to permanent sickness or disability  |    .287834   .1075161     2.68   0.007     .0771064    .4985616
                                         At school/student  |  -.0190166   .0589808    -0.32   0.747    -.1346169    .0965838
                           Seeking work for the first time  |  -.1182621   .0651866    -1.81   0.070    -.2460254    .0095012
                                                  Employed  |  -.0297897   .0100574    -2.96   0.003    -.0495018   -.0100776
                                             Self Employed  |  -.0072221   .0215375    -0.34   0.737    -.0494349    .0349906
                                                            |
                                                  ord_age_y |
                                                     20-23  |   .1020646   .0893248     1.14   0.253    -.0730088     .277138
                                                     24-27  |   .0637783   .0701205     0.91   0.363    -.0736554     .201212
                                                     28-32  |   .0529197   .0681972     0.78   0.438    -.0807443    .1865838
                                                      33 +  |  -.0025392   .0789943    -0.03   0.974    -.1573651    .1522868
                                                            |
                                                       year |
                                                         5  |   .0637809   .0157891     4.04   0.000     .0328348     .094727
                                                        10  |   .1349336   .0595785     2.26   0.024     .0181618    .2517053
                                                            |
                                                      _cons |   .0148301   .0725344     0.20   0.838    -.1273346    .1569949
------------------------------------------------------------+----------------------------------------------------------------
                                                    sigma_u |  .25911118
                                                    sigma_e |  .21775861
                                                        rho |  .58606947   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------------------------------------------------

. testparm i.year

 ( 1)  5.year = 0
 ( 2)  10.year = 0

           chi2(  2) =   16.54
         Prob > chi2 =    0.0003

Code:

. xtreg binbmi_obese_y psum_unemployed_total_cont_y i.own_education_y i.maritalstatus_y i.medical_card_y i.employment_y i.ord_age_y year if has_y0_questionnaire==1 &  ha
> s_y5_questionnaire==1 | has_y0_questionnaire==1 & has_y10_questionnaire==1 | has_y0_questionnaire==1 & has_y5_questionnaire==1 & has_y10_questionnaire==1, cluster (cur
> rent_county_y1) re robust

Random-effects GLS regression                   Number of obs     =      1,133
Group variable: id                              Number of groups  =        556

R-sq:                                           Obs per group:
     within  = 0.0864                                         min =          1
     between = 0.0247                                         avg =        2.0
     overall = 0.0425                                         max =          3

                                                Wald chi2(23)     =  470684.48
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                                                    (Std. Err. adjusted for 28 clusters in current_county_y1)
-----------------------------------------------------------------------------------------------------------------------------
                                                            |               Robust
                                             binbmi_obese_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------------------------------------+----------------------------------------------------------------
                               psum_unemployed_total_cont_y |   -.001825   .0021845    -0.84   0.403    -.0061066    .0024565
                                                            |
                                            own_education_y |
                                              No schooling  |          0  (empty)
                                  Primary school education  |   .0110054   .0830468     0.13   0.895    -.1517634    .1737742
                                     Some secondary school  |   .1617676   .0901997     1.79   0.073    -.0150206    .3385557
                              Complete secondary education  |   .1141553    .076832     1.49   0.137    -.0364328    .2647433
    Some third level education at college, university, RTC  |   .1698459   .0919531     1.85   0.065     -.010379    .3500707
Complete third level education at college, university, RTC  |   .0748105   .0766501     0.98   0.329    -.0754209    .2250419
                                                            |
                                            maritalstatus_y |
                                                Cohabiting  |  -.0847227   .0376777    -2.25   0.025    -.1585697   -.0108757
                                                 Separated  |  -.0233631    .073862    -0.32   0.752    -.1681298    .1214037
                                                  Divorced  |   .0767167   .1251534     0.61   0.540    -.1685795    .3220129
                                                   Widowed  |   .0251002   .1299681     0.19   0.847    -.2296326     .279833
                                      Single/Never married  |  -.0914132   .0394799    -2.32   0.021    -.1687925    -.014034
                                                            |
                                             medical_card_y |
                                                       Yes  |   .0034453   .0383149     0.09   0.928    -.0716506    .0785412
                                                            |
                                               employment_y |
                                                Unemployed  |   .0248206   .0607427     0.41   0.683    -.0942329    .1438742
  Unable to work owing to permanent sickness or disability  |   .2890932   .1035665     2.79   0.005     .0861067    .4920798
                                         At school/student  |  -.0190324    .059003    -0.32   0.747    -.1346761    .0966113
                           Seeking work for the first time  |  -.1181528   .0649302    -1.82   0.069    -.2454136     .009108
                                                  Employed  |  -.0295514   .0107356    -2.75   0.006    -.0505927     -.00851
                                             Self Employed  |  -.0071762   .0215688    -0.33   0.739    -.0494503    .0350978
                                                            |
                                                  ord_age_y |
                                                     20-23  |   .1015066   .0880151     1.15   0.249    -.0709999    .2740131
                                                     24-27  |   .0630871   .0679689     0.93   0.353    -.0701296    .1963037
                                                     28-32  |   .0521749   .0656988     0.79   0.427    -.0765924    .1809421
                                                      33 +  |  -.0034875   .0750595    -0.05   0.963    -.1506013    .1436264
                                                            |
                                                       year |   .0129658   .0033547     3.87   0.000     .0063908    .0195409
                                                      _cons |          0  (omitted)
------------------------------------------------------------+----------------------------------------------------------------
                                                    sigma_u |  .25910804
                                                    sigma_e |  .21763809
                                                        rho |  .58633216   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------------------------------------------------

. testparm year

 ( 1)  year = 0

           chi2(  1) =   14.94
         Prob > chi2 =    0.0001

.

In each case I used testparm to test the significance of the years.

However, I wasn't too happy with this approach. To begin with, I was surprised to see Prob > chi2 = . when including year as i.year. I'm not quite sure what this means and would welcome an interpretation.

More importantly I think there is a good theoretical basis for not including year in this analysis, as follows:

Here I investigate the impact of unemployment on health during the Great Recession. Impacts are an effect of county level unemployment but also reflect the overall impact of the recession experienced at a country level. When I hold year fixed, I feel I am leaving out the national level trend of unemployment and, because of this, ignoring the importance of the recession as an employment effect.

Basically, I feel that what I am seeing is a combination of the national variation in the unemployment rate and the local area variation. I feel that if I were to add year dummies, I might not have much variation left, as the local variation alone can’t do much to identify the employment effect. Put another way, if I am framing this paper as a recessionary analysis (i.e. the effect of the Great Recession on health as mediated by unemployment), then there wouldn’t be much left to identify the effect of the recession on health from local area unemployment alone, with the effects of national unemployment held fixed.
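The identification point in the two paragraphs above can be written out. This is only a sketch, and the notation is an assumption rather than something taken from the analysis: let U_ct be unemployment in county c at wave t, \bar{U}_t its average across counties at wave t, delta_t the wave (year) effects, x_ict the controls and u_i the mother-level random effect.

```latex
\begin{align*}
U_{ct}  &= \bar{U}_t + \left(U_{ct} - \bar{U}_t\right) \\
y_{ict} &= \alpha + \beta U_{ct} + \delta_t + x_{ict}'\gamma + u_i + \varepsilon_{ict} \\
        &= \alpha + \beta\left(U_{ct} - \bar{U}_t\right)
           + \underbrace{\left(\delta_t + \beta\bar{U}_t\right)}_{\text{absorbed by the year dummies}}
           + x_{ict}'\gamma + u_i + \varepsilon_{ict}
\end{align*}
```

With year dummies in the model, beta is identified only from how far each county's unemployment sits from the wave average. Dropping them (setting delta_t = 0) lets the national movement contribute to beta as well, but beta then also picks up any other wave-level change in obesity that happens to coincide with the recession.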

I thought the following might be a good explanation of this to place in text:

In random effects models, time can be included in the fixed part as discrete time dummies in order to take into account effects that may influence all cases in a given year to the same extent. Here, the intention is to remove a potential cause of spuriousness that results from common trends in observed variables. Including time dummies in this model, however, may overfit it. Put simply, the eliminated trend is the national level employment trend, i.e. the effect of the Great Recession. In other words, this analysis considers the effects of unemployment on health during the Great Recession. Impacts are an effect of county level unemployment but also reflect the overall impact of the recession experienced at a country level. When I hold year fixed, I feel I am leaving out the national level trend of unemployment and, because of this, ignoring the importance of the recession as an employment effect.

Characteristics that vary over time and are expected to affect the whole sample are included as controls in the random effects model. These controls include age, marital status, state-support recipient status, own employment, etc. Discrete time dummies are not included, i.e. the effect of time itself is not modeled, because the recession is made up of local and national level employment effects. By controlling for year, national level employment effects are held fixed and thus only local level effects on health can be examined. In other words, holding year constant means that I ignore year effects in my analysis. The problem with this approach is that there was a recession, and the year effects would include this recession.

In the case of small samples, such as this one, there is the related problem that time dummies use up some degrees of freedom, which has a direct effect on the precision of the parameter estimates. Thus, estimates may be unbiased but completely unreliable. A model which is too complex, or overspecified, may reduce the precision of coefficient estimates and predicted values. The implications of both bias and precision for the analysis were thus considered when making this analysis decision. In an exploratory analysis where time dummies were included, the later years, i.e. right when the financial crisis struck, were significant, which would support this. Results are not robust to the inclusion of years; however, as the year terms become significant, I think this supports the argument for a national trend effect.

In text explanation ends.

My question is, would this be a reasonable argument? Or can I expect to face concerns over not including year in my analysis? Is there a firm quantitative argument that I can make against the inclusion of year? My concern is in dealing with reviewer queries when sending this article for publication.

I did notice that in the first model -sigma_u- exceeds -sigma_e-, such that a higher portion of the variation in -depvar- is explained by the individual effect rather than by idiosyncratic error, but I'm not sure if this is something worth mentioning.

I don't know if an argument could be made that the included years don't add anything informative to my results and hence should not be plugged in among the predictors, i.e. I don't know if my data show any evidence that year has a statistically significant effect on my depvar. Even if there is a statistically significant effect, I would assume that this only supports the argument that I make above, i.e. that there was a recession at this time and that this was affecting health, and that by holding this fixed I can no longer look at the effect of this recession on health?

Although my number of observations may appear large, my actual sample is only 614 mothers, so I don’t know if my argument on the dangers of over-fitting the model above will hold much water.
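One way to put the three specifications side by side is to store each fit and tabulate only the unemployment coefficient. This is a minimal sketch, assuming the dataset is in memory and -xtset- with id as the panel variable; the local macros and the stored-estimate names (m_none, m_dummies, m_trend) are illustrative and not part of the original analysis.

```stata
* Sketch only: run from a do-file (the /// line continuations need one).
local controls i.own_education_y i.maritalstatus_y i.medical_card_y ///
    i.employment_y i.ord_age_y
local cond has_y0_questionnaire == 1 & ///
    (has_y5_questionnaire == 1 | has_y10_questionnaire == 1)

* Model 1: no time terms
quietly xtreg binbmi_obese_y psum_unemployed_total_cont_y `controls' ///
    if `cond', re vce(cluster current_county_y1)
estimates store m_none

* Model 2: year dummies
quietly xtreg binbmi_obese_y psum_unemployed_total_cont_y `controls' ///
    i.year if `cond', re vce(cluster current_county_y1)
estimates store m_dummies

* Model 3: linear year
quietly xtreg binbmi_obese_y psum_unemployed_total_cont_y `controls' ///
    year if `cond', re vce(cluster current_county_y1)
estimates store m_trend

* Compare the unemployment coefficient and its standard error across models
estimates table m_none m_dummies m_trend,                 ///
    keep(psum_unemployed_total_cont_y) b(%9.4f) se(%9.4f) ///
    stats(N N_g)
```

In the output posted above, the coefficient moves from .0058 without time terms to -.0024 with year dummies and -.0018 with a linear year, which is the comparison this table would reproduce.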

Tags: fixed effects, panel data, random effects, regression, syntax

Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

01 Aug 2018, 11:40

You didn't get a quick response. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Also, this may be the longest posting I have seen on the forum. It is not likely many will bother to read so much just to help you. Try to condense your question down to the minimum needed to explain the problem.

If there is regional variation in unemployment (as there must be), then it should show up as influencing health even holding national unemployment rate constant. I'd also suspect there might be other trends in health care that time dummies control for.
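A rough way to look at this is to split county unemployment into a wave-level average and a local deviation from it, and enter both instead of year dummies. The sketch below is only an illustration under stated assumptions: the dataset is in memory and -xtset- with id as the panel variable, and nat_unemp, loc_dev and the local macros are hypothetical names. Note that nat_unemp is the sample mean per wave, not the official national rate.

```stata
* Sketch only: run from a do-file (the /// line continuations need one).
local controls i.own_education_y i.maritalstatus_y i.medical_card_y ///
    i.employment_y i.ord_age_y
local cond has_y0_questionnaire == 1 & ///
    (has_y5_questionnaire == 1 | has_y10_questionnaire == 1)

* Wave-level average of county unemployment in the estimation sample
bysort year: egen nat_unemp = mean(psum_unemployed_total_cont_y) if `cond'

* Local deviation from that wave-level average
generate loc_dev = psum_unemployed_total_cont_y - nat_unemp

* No year dummies here: nat_unemp takes one value per wave, so it would be
* collinear with i.year.
xtreg binbmi_obese_y nat_unemp loc_dev `controls' if `cond', ///
    re vce(cluster current_county_y1)

* Do the national and local components carry the same coefficient?
test nat_unemp = loc_dev
```

With only three waves, the coefficient on nat_unemp rests on three time points, so any other change between waves (for example, a general upward trend in obesity) would load onto it. The coefficient on loc_dev is the part that survives holding the wave-level average constant.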
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17748
#3

01 Aug 2018, 13:56

John:
I do agree with Phil: your post is really difficult to follow.
That said:
- if -testparm- is statistically significant, it's probably reasonable to use -i.year- as a predictor;
- the missing chi2 is probably due to singleton panels;
- as usual, the model specification should give a fair and true view of the data generating process.

Kind regards,
Carlo
(Stata 19.0)
John Adler

Join Date: Apr 2017

Posts: 173
#4

01 Aug 2018, 15:37

Dear Carlo and Phil,

Thank you for your responses and apologies for being obtuse.

My main question here is, could an argument be made that by including years I am over-fitting my model? I am especially concerned as my sample is so small (614 mothers).

Relatedly, I consider my analysis to be about the effects of the Irish recession on health as mediated by unemployment. The recession is made up of local and national level unemployment. I feel that by including years in the model I am holding years (and thus national unemployment rates) constant. I would then no longer be considering the effect of the recession on health, but just the effect of local area unemployment, which is only part of the story that I would like to tell.

I was wondering if this was a good theoretical argument for not including dummies for years.

Kindest regards,

John
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17748
#5

02 Aug 2018, 12:04

John:
far be it from me to suggest obtuseness: your post was simply too long, that's all.
Sticking to your issue:
- by including -year- you're assuming that time can contribute to explaining the variation in the dependent variable. I would skim through the literature in your research field and see what others did in the past about this issue;
- I would also check the model specification (by the way: models of this kind often include quadratic terms) because your between R-sq is pretty low and most of your predictors are not significant (maybe you have too many categorical variables);
- finally, I would also check whether there is a risk of reverse causation: it makes sense that bad health can explain being unemployed, but can we rule out that being unemployed contributes to bad health (say, worse-off people usually buy cheap junk food)?

Kind regards,
Carlo
(Stata 19.0)