  • Should I include time dummies in my random effects regression?

    I think I have a strong theoretical argument for not including time dummies in my random-effects regression. I would be interested in the opinions of users here and in suggestions for strengthening this argument, particularly any quantitative methods I could apply in Stata.

    I have panel data on local-area unemployment and health outcomes for the same mothers, observed at three waves five years apart: before, during, and after a recession.

    The results of my initial regression are as follows:

    Code:
    
    . * LPM:
    . 
    . xtreg binbmi_obese_y psum_unemployed_total_cont_y i.own_education_y i.maritalstatus_y i.medical_card_y i.employment_y i.ord_age_y if has_y0_questionnaire==1 &  has_y5_
    > questionnaire==1 | has_y0_questionnaire==1 & has_y10_questionnaire==1 | has_y0_questionnaire==1 & has_y5_questionnaire==1 & has_y10_questionnaire==1, cluster (current_
    > county_y1) re robust
    
    Random-effects GLS regression                   Number of obs     =      1,133
    Group variable: id                              Number of groups  =        556
    
    R-sq:                                           Obs per group:
         within  = 0.0750                                         min =          1
         between = 0.0147                                         avg =        2.0
         overall = 0.0302                                         max =          3
    
                                                    Wald chi2(22)     =  218331.51
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
                                                                        (Std. Err. adjusted for 28 clusters in current_county_y1)
    -----------------------------------------------------------------------------------------------------------------------------
                                                                |               Robust
                                                 binbmi_obese_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------------------------------------------------+----------------------------------------------------------------
                                   psum_unemployed_total_cont_y |    .005794   .0014117     4.10   0.000     .0030272    .0085608
                                                                |
                                                own_education_y |
                                                  No schooling  |          0  (empty)
                                      Primary school education  |  -.1315012   .0942464    -1.40   0.163    -.3162206    .0532183
                                         Some secondary school  |   .0736315   .1101785     0.67   0.504    -.1423144    .2895775
                                  Complete secondary education  |   .0279008   .0882128     0.32   0.752    -.1449931    .2007947
        Some third level education at college, university, RTC  |   .0842196    .098333     0.86   0.392    -.1085096    .2769488
    Complete third level education at college, university, RTC  |  -.0220746   .0883634    -0.25   0.803    -.1952636    .1511145
                                                                |
                                                maritalstatus_y |
                                                    Cohabiting  |  -.0837401   .0382859    -2.19   0.029    -.1587792   -.0087011
                                                     Separated  |   .0225485   .0605217     0.37   0.709    -.0960717    .1411688
                                                      Divorced  |    .084211   .1269417     0.66   0.507    -.1645901    .3330121
                                                       Widowed  |  -.0079601   .1239793    -0.06   0.949     -.250955    .2350348
                                          Single/Never married  |  -.0970986   .0382337    -2.54   0.011    -.1720353    -.022162
                                                                |
                                                 medical_card_y |
                                                           Yes  |   .0147679   .0384133     0.38   0.701    -.0605207    .0900565
                                                                |
                                                   employment_y |
                                                    Unemployed  |   .0231915   .0593355     0.39   0.696     -.093104    .1394869
      Unable to work owing to permanent sickness or disability  |   .2963391   .1077156     2.75   0.006     .0852204    .5074577
                                             At school/student  |  -.0237847   .0565382    -0.42   0.674    -.1345975    .0870282
                               Seeking work for the first time  |  -.1044752   .0674654    -1.55   0.121    -.2367049    .0277545
                                                      Employed  |  -.0413736   .0116618    -3.55   0.000    -.0642303   -.0185169
                                                 Self Employed  |  -.0094837   .0218855    -0.43   0.665    -.0523785    .0334111
                                                                |
                                                      ord_age_y |
                                                         20-23  |   .1274583   .0904683     1.41   0.159    -.0498563     .304773
                                                         24-27  |   .1046117   .0683596     1.53   0.126    -.0293708    .2385941
                                                         28-32  |   .1036983   .0691316     1.50   0.134    -.0317971    .2391937
                                                          33 +  |    .084037   .0811597     1.04   0.300    -.0750332    .2431072
                                                                |
                                                          _cons |          0  (omitted)
    ------------------------------------------------------------+----------------------------------------------------------------
                                                        sigma_u |  .26123467
                                                        sigma_e |  .21894127
                                                            rho |   .5874009   (fraction of variance due to u_i)
    -----------------------------------------------------------------------------------------------------------------------------
    The data are clustered at the respondent's local-area level (county, roughly comparable to a US state). There are 30 clusters, although the output reports 28 in the estimation sample. The effect of local-area unemployment (psum_unemployed_total_cont_y) on health is measured at this same local-area level to address endogeneity in the relationship between unemployment and health (i.e., is the person who is likely to be unemployed also likely to be unhealthy for some unobserved reason?).

    A colleague suggested I use year dummies, as is common with panel data. Below are the results when I add year, first as dummies (i.year) and then as a linear term (year):

    Code:
    . xtreg binbmi_obese_y psum_unemployed_total_cont_y i.own_education_y i.maritalstatus_y i.medical_card_y i.employment_y i.ord_age_y i.year if has_y0_questionnaire==1 &  
    > has_y5_questionnaire==1 | has_y0_questionnaire==1 & has_y10_questionnaire==1 | has_y0_questionnaire==1 & has_y5_questionnaire==1 & has_y10_questionnaire==1, cluster (c
    > urrent_county_y1) re robust
    
    Random-effects GLS regression                   Number of obs     =      1,133
    Group variable: id                              Number of groups  =        556
    
    R-sq:                                           Obs per group:
         within  = 0.0862                                         min =          1
         between = 0.0249                                         avg =        2.0
         overall = 0.0427                                         max =          3
    
                                                    Wald chi2(23)     =          .
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =          .
    
                                                                        (Std. Err. adjusted for 28 clusters in current_county_y1)
    -----------------------------------------------------------------------------------------------------------------------------
                                                                |               Robust
                                                 binbmi_obese_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------------------------------------------------+----------------------------------------------------------------
                                   psum_unemployed_total_cont_y |  -.0023701   .0052052    -0.46   0.649    -.0125721    .0078319
                                                                |
                                                own_education_y |
                                                  No schooling  |          0  (empty)
                                      Primary school education  |          0  (omitted)
                                         Some secondary school  |   .1505257   .0497344     3.03   0.002     .0530481    .2480033
                                  Complete secondary education  |   .1029592   .0175976     5.85   0.000     .0684685    .1374499
        Some third level education at college, university, RTC  |   .1587382   .0249566     6.36   0.000     .1098241    .2076523
    Complete third level education at college, university, RTC  |   .0640187   .0212939     3.01   0.003     .0222834     .105754
                                                                |
                                                maritalstatus_y |
                                                    Cohabiting  |  -.0846086   .0379878    -2.23   0.026    -.1590634   -.0101538
                                                     Separated  |  -.0233706   .0737117    -0.32   0.751    -.1678429    .1211016
                                                      Divorced  |   .0769838   .1250147     0.62   0.538    -.1680406    .3220082
                                                       Widowed  |   .0261904   .1288942     0.20   0.839    -.2264376    .2788183
                                          Single/Never married  |  -.0912056   .0396385    -2.30   0.021    -.1688957   -.0135155
                                                                |
                                                 medical_card_y |
                                                           Yes  |   .0034374   .0381739     0.09   0.928    -.0713821    .0782569
                                                                |
                                                   employment_y |
                                                    Unemployed  |   .0245004   .0608822     0.40   0.687    -.0948265    .1438273
      Unable to work owing to permanent sickness or disability  |    .287834   .1075161     2.68   0.007     .0771064    .4985616
                                             At school/student  |  -.0190166   .0589808    -0.32   0.747    -.1346169    .0965838
                               Seeking work for the first time  |  -.1182621   .0651866    -1.81   0.070    -.2460254    .0095012
                                                      Employed  |  -.0297897   .0100574    -2.96   0.003    -.0495018   -.0100776
                                                 Self Employed  |  -.0072221   .0215375    -0.34   0.737    -.0494349    .0349906
                                                                |
                                                      ord_age_y |
                                                         20-23  |   .1020646   .0893248     1.14   0.253    -.0730088     .277138
                                                         24-27  |   .0637783   .0701205     0.91   0.363    -.0736554     .201212
                                                         28-32  |   .0529197   .0681972     0.78   0.438    -.0807443    .1865838
                                                          33 +  |  -.0025392   .0789943    -0.03   0.974    -.1573651    .1522868
                                                                |
                                                           year |
                                                             5  |   .0637809   .0157891     4.04   0.000     .0328348     .094727
                                                            10  |   .1349336   .0595785     2.26   0.024     .0181618    .2517053
                                                                |
                                                          _cons |   .0148301   .0725344     0.20   0.838    -.1273346    .1569949
    ------------------------------------------------------------+----------------------------------------------------------------
                                                        sigma_u |  .25911118
                                                        sigma_e |  .21775861
                                                            rho |  .58606947   (fraction of variance due to u_i)
    -----------------------------------------------------------------------------------------------------------------------------
    
    . testparm i.year
    
     ( 1)  5.year = 0
     ( 2)  10.year = 0
    
               chi2(  2) =   16.54
             Prob > chi2 =    0.0003

    Code:
    . xtreg binbmi_obese_y psum_unemployed_total_cont_y i.own_education_y i.maritalstatus_y i.medical_card_y i.employment_y i.ord_age_y year if has_y0_questionnaire==1 &  ha
    > s_y5_questionnaire==1 | has_y0_questionnaire==1 & has_y10_questionnaire==1 | has_y0_questionnaire==1 & has_y5_questionnaire==1 & has_y10_questionnaire==1, cluster (cur
    > rent_county_y1) re robust
    
    Random-effects GLS regression                   Number of obs     =      1,133
    Group variable: id                              Number of groups  =        556
    
    R-sq:                                           Obs per group:
         within  = 0.0864                                         min =          1
         between = 0.0247                                         avg =        2.0
         overall = 0.0425                                         max =          3
    
                                                    Wald chi2(23)     =  470684.48
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
                                                                        (Std. Err. adjusted for 28 clusters in current_county_y1)
    -----------------------------------------------------------------------------------------------------------------------------
                                                                |               Robust
                                                 binbmi_obese_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------------------------------------------------+----------------------------------------------------------------
                                   psum_unemployed_total_cont_y |   -.001825   .0021845    -0.84   0.403    -.0061066    .0024565
                                                                |
                                                own_education_y |
                                                  No schooling  |          0  (empty)
                                      Primary school education  |   .0110054   .0830468     0.13   0.895    -.1517634    .1737742
                                         Some secondary school  |   .1617676   .0901997     1.79   0.073    -.0150206    .3385557
                                  Complete secondary education  |   .1141553    .076832     1.49   0.137    -.0364328    .2647433
        Some third level education at college, university, RTC  |   .1698459   .0919531     1.85   0.065     -.010379    .3500707
    Complete third level education at college, university, RTC  |   .0748105   .0766501     0.98   0.329    -.0754209    .2250419
                                                                |
                                                maritalstatus_y |
                                                    Cohabiting  |  -.0847227   .0376777    -2.25   0.025    -.1585697   -.0108757
                                                     Separated  |  -.0233631    .073862    -0.32   0.752    -.1681298    .1214037
                                                      Divorced  |   .0767167   .1251534     0.61   0.540    -.1685795    .3220129
                                                       Widowed  |   .0251002   .1299681     0.19   0.847    -.2296326     .279833
                                          Single/Never married  |  -.0914132   .0394799    -2.32   0.021    -.1687925    -.014034
                                                                |
                                                 medical_card_y |
                                                           Yes  |   .0034453   .0383149     0.09   0.928    -.0716506    .0785412
                                                                |
                                                   employment_y |
                                                    Unemployed  |   .0248206   .0607427     0.41   0.683    -.0942329    .1438742
      Unable to work owing to permanent sickness or disability  |   .2890932   .1035665     2.79   0.005     .0861067    .4920798
                                             At school/student  |  -.0190324    .059003    -0.32   0.747    -.1346761    .0966113
                               Seeking work for the first time  |  -.1181528   .0649302    -1.82   0.069    -.2454136     .009108
                                                      Employed  |  -.0295514   .0107356    -2.75   0.006    -.0505927     -.00851
                                                 Self Employed  |  -.0071762   .0215688    -0.33   0.739    -.0494503    .0350978
                                                                |
                                                      ord_age_y |
                                                         20-23  |   .1015066   .0880151     1.15   0.249    -.0709999    .2740131
                                                         24-27  |   .0630871   .0679689     0.93   0.353    -.0701296    .1963037
                                                         28-32  |   .0521749   .0656988     0.79   0.427    -.0765924    .1809421
                                                          33 +  |  -.0034875   .0750595    -0.05   0.963    -.1506013    .1436264
                                                                |
                                                           year |   .0129658   .0033547     3.87   0.000     .0063908    .0195409
                                                          _cons |          0  (omitted)
    ------------------------------------------------------------+----------------------------------------------------------------
                                                        sigma_u |  .25910804
                                                        sigma_e |  .21763809
                                                            rho |  .58633216   (fraction of variance due to u_i)
    -----------------------------------------------------------------------------------------------------------------------------
    
    . testparm year
    
     ( 1)  year = 0
    
               chi2(  1) =   14.94
             Prob > chi2 =    0.0001
    
    .

    In each case, I used testparm to test the joint significance of the year terms.
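As a quick check on the two testparm results, the p-values can be recomputed from the reported chi-squared statistics. This is a Python sketch rather than Stata, and it uses only the closed-form chi-squared upper tails for 1 and 2 degrees of freedom, which is all these two tests need (the degrees of freedom equal the number of constraints tested).

```python
import math


def chi2_sf(stat, df):
    """Upper-tail p-value of a chi-squared statistic (closed forms for df = 1 and 2 only)."""
    if df == 1:
        # chi2(1) is a squared standard normal: P(Z^2 > s) = erfc(sqrt(s / 2))
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        # chi2(2) is exponential with mean 2
        return math.exp(-stat / 2)
    raise ValueError("this sketch only handles df = 1 or 2")


p_dummies = chi2_sf(16.54, 2)  # testparm i.year  -> chi2(2) = 16.54
p_linear = chi2_sf(14.94, 1)   # testparm year    -> chi2(1) = 14.94
print(f"i.year: p = {p_dummies:.4f}")
print(f"year:   p = {p_linear:.4f}")
```

Both values round to the Prob > chi2 figures in the logs, so either coding of time is jointly significant in these data.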


    However, I wasn't happy with this approach. To begin with, I was surprised to see Prob > chi2 = . when including year as i.year. I'm not sure what this means and would welcome an interpretation.
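One mechanism that produces a missing model chi2 with clustered standard errors can be sketched numerically. A cluster-robust VCE is built from one score vector per cluster, and with OLS-type residuals these scores sum to zero, so its rank is at most the number of clusters minus one (27 here, above the 23 constraints in the model test, so that bound alone does not bind). The rank drops further when a regressor is nonzero in only one cluster: that regressor's score is then forced to zero. When the VCE is singular, the overall Wald test of all coefficients cannot be computed and Stata reports it as missing, while the individual z tests are still shown. The sketch below is plain NumPy, not Stata; the single-county category is hypothetical, and whether this mechanism (rather than the singleton panels suggested later in the thread) explains the missing chi2 here is not something the output alone settles.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clusters, per_cluster = 28, 40
cluster = np.repeat(np.arange(n_clusters), per_cluster)
n = cluster.size

x = rng.normal(size=n)
# Hypothetical category that occurs in only one county (cluster 0)
rare = ((cluster == 0) & (rng.random(n) < 0.5)).astype(float)
X = np.column_stack([np.ones(n), x, rare])
y = 0.5 * x + 0.2 * rare + rng.normal(size=n)

# OLS fit and residuals
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

# Cluster-robust sandwich: (X'X)^-1 * (sum of outer products of cluster scores) * (X'X)^-1
k = X.shape[1]
meat = np.zeros((k, k))
for g in range(n_clusters):
    in_g = cluster == g
    score_g = X[in_g].T @ resid[in_g]  # one score vector per cluster
    meat += np.outer(score_g, score_g)
bread_inv = np.linalg.inv(X.T @ X)
vce = bread_inv @ meat @ bread_inv

# Numerical rank using a relative tolerance on the singular values
sv = np.linalg.svd(vce, compute_uv=False)
rank = int(np.sum(sv > 1e-10 * sv[0]))
print(f"rank of cluster-robust VCE: {rank} of {k} coefficients")
```

The single-cluster dummy leaves the VCE one rank short, so a joint test that includes its coefficient has no valid chi2, even though each coefficient still gets a (possibly unreliable) standard error.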


    More importantly, I think there is a good theoretical basis for not including year in this analysis, as follows:


    Here I investigate the impact of unemployment on health during the Great Recession. The impacts stem from county-level unemployment but also from the overall impact of the recession experienced at the national level. When I hold year fixed, I feel I am removing the national-level trend in unemployment and, in doing so, ignoring the importance of the recession as an employment effect.


    Essentially, I think what I am observing is a combination of national and local-area variation in the unemployment rate. If I were to add year dummies, I suspect little variation would remain, since there may not be much to identify the employment effect from the local variation alone. Put another way, if I frame this paper as a recession analysis (i.e., the effect of the Great Recession on health as mediated by unemployment), there would not be much to identify the effect of the recession on health by looking only at local-area unemployment while holding national unemployment fixed.
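The claim that little identifying variation is left once year is held fixed can be quantified: year dummies remove the between-wave part of the unemployment variable and leave only the within-wave (county-versus-county) part. The sketch below computes that within-wave share in the simple pooled-OLS case, using hypothetical county rates rather than the real data; in Stata, a rough analogue would be to regress psum_unemployed_total_cont_y on i.year and look at 1 minus R-squared. Holding the residual variance fixed, the standard error of the unemployment coefficient grows by about 1/sqrt(share) when the dummies are added.

```python
import numpy as np


def within_year_share(rates_by_year):
    """Share of the total variation in county unemployment that remains after
    removing wave means, i.e. the variation year dummies leave for identification."""
    arrays = [np.asarray(r, dtype=float) for r in rates_by_year]
    all_rates = np.concatenate(arrays)
    total_ss = np.sum((all_rates - all_rates.mean()) ** 2)
    within_ss = sum(np.sum((a - a.mean()) ** 2) for a in arrays)
    return within_ss / total_ss


# Hypothetical county unemployment rates (%) for two counties in each wave:
# before, during, and after the recession
rates = [[4.0, 6.0], [10.0, 14.0], [7.0, 9.0]]
share = within_year_share(rates)
print(f"within-year share of variation: {share:.3f}")
print(f"approx. SE inflation from adding year dummies: {1 / np.sqrt(share):.2f}")
```

A small within-year share supports the point that the local-only effect will be imprecisely estimated; it does not by itself show that the national component should be attributed to unemployment rather than to other features of those years.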


    I thought the following might be a good explanation to include in the text:


    In random-effects models, time can be included in the fixed part as discrete time dummies to account for effects that influence all cases in a given year by the same amount. The intention is to remove a potential source of spuriousness arising from common trends in the observed variables. Including time dummies in this model, however, may overfit it. Put simply, the trend eliminated is the national-level employment trend, i.e., the effect of the Great Recession. In other words, this analysis considers the effects of unemployment on health during the Great Recession. These effects stem from county-level unemployment but also from the overall impact of the recession experienced at the national level. When I hold year fixed, I feel I am leaving out the national-level trend in unemployment and, as a result, ignoring the importance of the recession as an employment effect.

    To account for changes over time expected to affect the sample, time-varying individual characteristics are included as controls in the random-effects model. These include age, marital status, receipt of state support, and own employment status, among others. Discrete time dummies are not included, i.e., the effect of time itself is not modelled, because the recession comprises both local- and national-level employment effects. Controlling for year holds national-level employment effects fixed, so that only local-level effects on health can be examined. In other words, holding year constant means ignoring year effects in the analysis; the problem with this approach is that there was a recession, and the year effects would absorb it.

    In small samples such as this one, there is the related problem that the dummies use up degrees of freedom, which directly affects the precision of the parameter estimates; estimates may be unbiased but imprecise. A model that is too complex, or overspecified, may reduce the precision of coefficient estimates and predicted values. The implications for both bias and precision were therefore considered when making this decision. In an exploratory analysis that included time dummies, the later years (i.e., those in which the financial crisis struck) were significant, which supports this reasoning. Results are not robust to the inclusion of year; however, because year becomes significant, I think this supports the argument for a national trend effect.
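The degrees-of-freedom part of this argument can be checked against the outputs with simple arithmetic. The sketch below assumes roughly 23 estimated parameters before the two year dummies are added (a hypothetical count; any value in that range gives the same conclusion) and takes the robust standard errors on psum_unemployed_total_cont_y from the first two models above. It ignores the cluster-robust details and only asks how much two extra parameters could inflate standard errors through lost residual degrees of freedom.

```python
import math

n_obs = 1133
k_without = 23            # hypothetical parameter count without year terms
k_with = k_without + 2    # plus dummies for waves 5 and 10

# SE inflation attributable to lost residual degrees of freedom alone
df_inflation = math.sqrt((n_obs - k_without) / (n_obs - k_with))

# Robust SEs on psum_unemployed_total_cont_y reported in the output
se_without, se_with = 0.0014117, 0.0052052
observed_ratio = se_with / se_without

print(f"df-only inflation factor: {df_inflation:.4f}")
print(f"observed SE ratio:        {observed_ratio:.2f}")
```

Two extra parameters among 1,133 observations account for well under a 1% change in standard errors, whereas the standard error on local unemployment grows roughly 3.7-fold. That gap suggests the precision loss comes mainly from collinearity between county unemployment and the wave indicators rather than from degrees of freedom.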


    In-text explanation ends.

    My question is: would this be a reasonable argument? Or can I expect reviewers to raise concerns about not including year? Is there a firm quantitative argument I can make against including year? My concern is how to deal with reviewer queries when submitting this article for publication.

    I did notice that in the first model -sigma_u- exceeds -sigma_e-, so a larger share of the variation in -depvar- is explained by the individual effect than by the idiosyncratic error (rho = 0.587), but I'm not sure whether this is worth mentioning.
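The rho reported by xtreg, re is the share of the composite error variance due to the individual effect, rho = sigma_u^2 / (sigma_u^2 + sigma_e^2), so sigma_u > sigma_e is the same statement as rho > 0.5. This Python sketch recomputes rho from the sigma values printed in the first two outputs, as a check on that relationship.

```python
def rho(sigma_u, sigma_e):
    """Fraction of the composite error variance due to the individual effect u_i."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)


# sigma_u and sigma_e as reported in the output
rho_base = rho(0.26123467, 0.21894127)  # model without year terms
rho_year = rho(0.25911118, 0.21775861)  # model with i.year
print(f"rho without year terms: {rho_base:.4f}")
print(f"rho with i.year:        {rho_year:.4f}")
```

Rho barely moves when i.year is added, so it says more about persistent individual differences in obesity than about the year-dummy question.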

    I don't know whether an argument could be made that the year terms add nothing informative to my results and hence should not be included among the predictors, i.e., I don't know whether my data show any evidence that year has a statistically significant effect on my depvar. Even if there is a statistically significant effect, I would assume this only supports the argument above: that there was a recession at this time, that it was affecting health, and that by holding it fixed I can no longer look at the effect of the recession on health.

    Although my number of observations may appear large, my actual sample is only 614 mothers, so I don't know whether my argument about the dangers of overfitting the model will hold much water.

  • #2
    You didn't get a quick response. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Also, this may be the longest posting I have seen on the forum. It is not likely many will bother to read so much just to help you. Try to condense your question down to the minimum needed to explain the problem.

    If there is regional variation in unemployment (as there must be), then it should show up as influencing health even when the national unemployment rate is held constant. I'd also suspect there might be other trends in health care that time dummies control for.
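The point about other trends can be illustrated with a small simulation. Everything in it is hypothetical: national unemployment rates for three waves, a secular year effect on the outcome that has nothing to do with unemployment, county deviations that average zero within each wave, and a continuous outcome. Pooled OLS stands in for the random-effects model because the question is which variation identifies the coefficient, not which estimator is used. Without year dummies the unemployment slope absorbs the year effect; with dummies it is identified from within-wave county deviations and recovers the true local effect.

```python
import numpy as np


def simulate(seed=1, n_counties=28, n_per_cell=20, beta=0.01):
    """Simulate county unemployment and a continuous outcome (all values hypothetical)."""
    rng = np.random.default_rng(seed)
    national = np.array([5.0, 13.0, 8.0])          # before, during, after the recession
    trend = np.array([0.0, 0.3, 0.6])              # year effect NOT caused by unemployment
    base_dev = np.linspace(-3.0, 3.0, n_counties)  # local deviations, mean zero in each wave
    years, unemp = [], []
    for t in range(3):
        dev = rng.permutation(base_dev)            # reshuffle county rankings each wave
        for c in range(n_counties):
            years.extend([t] * n_per_cell)
            unemp.extend([national[t] + dev[c]] * n_per_cell)
    years = np.array(years)
    unemp = np.array(unemp)
    y = beta * unemp + trend[years] + rng.normal(0.0, 0.02, size=unemp.size)
    return years, unemp, y


def unemployment_slope(years, unemp, y, year_dummies):
    """OLS coefficient on unemployment, optionally with dummies for waves 1 and 2."""
    cols = [np.ones_like(unemp), unemp]
    if year_dummies:
        cols += [(years == t).astype(float) for t in (1, 2)]  # wave 0 is the base
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


years, unemp, y = simulate()
b_pooled = unemployment_slope(years, unemp, y, year_dummies=False)
b_dummies = unemployment_slope(years, unemp, y, year_dummies=True)
print(f"without year dummies: {b_pooled:.4f}  (true local effect is 0.01)")
print(f"with year dummies:    {b_dummies:.4f}")
```

The reverse case matters too: if there were no separate year effect and national and local unemployment affected health equally, both specifications would be unbiased and the dummies would only cost precision. With three waves, a constant plus two year dummies span any wave-level variable, so the data cannot separate the national unemployment effect from other things that changed between waves; that choice has to be argued rather than estimated.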




    • #3
      John:
      I do agree with Phil: your post is really difficult to follow.
      That said:
      - if -testparm- is statistically significant, it's probably reasonable to use -i.year- as a predictor;
      - the missing chi2 is probably due to singleton panels;
      - as usual, the model specification should give a fair and true view of the data generating process.
      Kind regards,
      Carlo
      (Stata 18.0 SE)



      • #4
        Dear Carlo and Phil,

        Thank you for your responses, and apologies for being obtuse.

        My main question is: could an argument be made that by including years I am overfitting my model? I am especially concerned because my sample is so small (614 mothers).

        Relatedly, I consider my analysis to be about the effects of the Irish recession on health as mediated by unemployment. The recession comprises local- and national-level unemployment. I feel that by including years in the model I am holding years (and thus national unemployment rates) constant, so I am no longer considering the effect of the recession on health, only the effect of local-area unemployment, which is only part of the story I would like to tell.

        I was wondering whether this is a good theoretical argument for not including year dummies.

        Kindest regards,

        John



        • #5
          John:
          far be it from me to think you obtuse: you simply wrote too long a post, that's all.
          Sticking to your issue:
          - by including -year- you're assuming that time can help explain the variation in the dependent variable. I would skim the literature in your research field and see what others have done about this issue;
          - I would also check the model specification (by the way, this kind of model often includes quadratic terms), because your between R-sq is pretty low and most of your predictors are not significant (maybe you have too many categorical variables);
          - finally, I would also check whether there is a risk of reverse causation: it makes sense that bad health can lead to unemployment, but can we rule out that unemployment also leads to bad health (say, worse-off people often buy cheap junk food)?
          Kind regards,
          Carlo
          (Stata 18.0 SE)
