Insignificant year dummies

Oliver Gatland

Join Date: Mar 2019
Posts: 10


12 Apr 2019, 05:48

Hi all,

I am running a regression for my dissertation and was planning on including year dummies. The output I get is as follows:

Code:

 reg newenrol CPI newmarriage depend newedspend newmortality newfemteach urban lnGDP i.year

      Source |       SS           df       MS      Number of obs   =     1,017
-------------+----------------------------------   F(25, 991)      =    284.34
       Model |  575642.157        25  23025.6863   Prob > F        =    0.0000
    Residual |  80251.9092       991  80.9807358   R-squared       =    0.8776
-------------+----------------------------------   Adj R-squared   =    0.8746
       Total |  655894.067     1,016  645.565026   Root MSE        =    8.9989

------------------------------------------------------------------------------
    newenrol |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         CPI |   .0111325   .0296963     0.37   0.708    -.0471423    .0694074
 newmarriage |   .0644005   .0702137     0.92   0.359     -.073384     .202185
      depend |  -.2866934   .0331114    -8.66   0.000      -.35167   -.2217168
  newedspend |   1.322696   .2283656     5.79   0.000     .8745602    1.770832
newmortality |   -.567749   .0362517   -15.66   0.000    -.6388879     -.49661
 newfemteach |   .2620551   .0224073    11.70   0.000     .2180839    .3060264
       urban |   .0612197   .0238815     2.56   0.011     .0143556    .1080839
       lnGDP |   1.565205   .6738355     2.32   0.020     .2428971    2.887514
             |
        year |
       2001  |   .2016182   1.725029     0.12   0.907    -3.183511    3.586748
       2002  |  -.2950407   1.717641    -0.17   0.864    -3.665671     3.07559
       2003  |   .1552302   1.709801     0.09   0.928    -3.200016    3.510476
       2004  |   .0421186   1.710136     0.02   0.980    -3.313785    3.398022
       2005  |    .010656    1.70356     0.01   0.995    -3.332343    3.353655
       2006  |  -.1910212   1.703714    -0.11   0.911    -3.534322     3.15228
       2007  |  -.3678111   1.704711    -0.22   0.829    -3.713068    2.977446
       2008  |  -.7939854   1.705902    -0.47   0.642    -4.141581     2.55361
       2009  |  -1.582709   1.711864    -0.92   0.355    -4.942003    1.776585
       2010  |  -1.762984   1.711649    -1.03   0.303    -5.121857    1.595889
       2011  |  -2.007432   1.713243    -1.17   0.242    -5.369433    1.354569
       2012  |  -1.745967   1.712873    -1.02   0.308    -5.107242    1.615308
       2013  |  -.9419901   1.715142    -0.55   0.583    -4.307717    2.423737
       2014  |   -.540971   1.716269    -0.32   0.753     -3.90891    2.826968
       2015  |  -.1426261   1.717701    -0.08   0.934    -3.513374    3.228122
       2016  |   .3639826   1.718525     0.21   0.832    -3.008383    3.736349
       2017  |   .2700282   1.719346     0.16   0.875    -3.103948    3.644005
             |
       _cons |   60.52972    5.88975    10.28   0.000     48.97191    72.08754

None of the year dummies is individually significant! When I run the regression without the year dummies, I get similar results for the other variables. The output is as follows:

Code:

reg newenrol CPI newmarriage depend newedspend newmortality newfemteach urban lnGDP

      Source |       SS           df       MS      Number of obs   =     1,017
-------------+----------------------------------   F(8, 1008)      =    896.51
       Model |  575070.577         8  71883.8221   Prob > F        =    0.0000
    Residual |  80823.4898     1,008  80.1820335   R-squared       =    0.8768
-------------+----------------------------------   Adj R-squared   =    0.8758
       Total |  655894.067     1,016  645.565026   Root MSE        =    8.9544

------------------------------------------------------------------------------
    newenrol |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         CPI |   .0136373   .0294484     0.46   0.643    -.0441498    .0714244
 newmarriage |   .0686673   .0697638     0.98   0.325    -.0682315    .2055662
      depend |  -.2826268   .0328785    -8.60   0.000    -.3471448   -.2181087
  newedspend |   1.273113   .2259444     5.63   0.000     .8297378    1.716488
newmortality |  -.5659455   .0353495   -16.01   0.000    -.6353126   -.4965785
 newfemteach |   .2636893   .0222034    11.88   0.000     .2201191    .3072595
       urban |   .0624549   .0237473     2.63   0.009      .015855    .1090548
       lnGDP |   1.567741   .6653732     2.36   0.019     .2620653    2.873416
       _cons |   59.56112   5.612467    10.61   0.000     48.54767    70.57458

Should I leave the year dummies in, or just say that I tried the regression with them, found none were significant, and so removed them? Is there a standard practice for this kind of thing?

Thanks very much


Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#2

12 Apr 2019, 10:17

Staying close to your current frame of reference, I'll begin by pointing out that the statistical significance of the individual year indicators ("dummies") is irrelevant to the decision of whether to include them in the model. If you want to rely on a significance test, you should use the joint significance of all of them. Run -testparm i.year-.
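A minimal sketch of that joint test, using the variable names from the command in #1 and assuming the same data are in memory. The second part is an optional cross-check computed from the two ANOVA tables posted in #1. It relies on both regressions using the same 1,017 observations and the default (non-robust) standard errors, in which case the joint test coincides with the usual nested-model F test.

```stata
* Refit the model with year indicators, then test them jointly
regress newenrol CPI newmarriage depend newedspend newmortality newfemteach urban lnGDP i.year
testparm i.year

* Cross-check by hand from the posted residual sums of squares:
* F = [(RSS_restricted - RSS_full) / q] / [RSS_full / df_residual_full]
* with q = 17 year indicators and 991 residual df in the full model
scalar Fyear = ((80823.4898 - 80251.9092) / 17) / (80251.9092 / 991)
display "F(17, 991) = " Fyear
display "p-value    = " Ftail(17, 991, Fyear)
```

The hand calculation gives an F statistic of about 0.42. An F statistic below 1 cannot be significant at any conventional level, so the year indicators are not jointly significant either.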

Moving a bit away from your current frame of reference, I belong to the school that does not endorse the use of any kind of statistical significance test for deciding what to include in a model. I would rely instead on judgment based on: a) Does theory suggest that there are material yearly shocks to the outcome? If so, include year indicators. b) How do the magnitudes of the year indicators compare to other coefficients in the model? Are they just "rounding errors" or do some or all of them add a meaningful amount to the outcome variable? c) Is there a meaningful change in R² when you include them? (In the outputs you show it looks like there isn't: R² moves only from 0.8768 to 0.8776, and adjusted R² falls from 0.8758 to 0.8746.) d) Is there "room" for this many variables in the model or will it leave you with too few observations per variable? (In this case it looks like you have enough observations to allow for the additional indicators.)
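Points b) through d) can be checked side by side. The following is a rough sketch, again assuming the data and variable names from #1. The labels noyear and withyear are arbitrary names chosen for this illustration.

```stata
* Fit both specifications quietly and store the results
quietly regress newenrol CPI newmarriage depend newedspend newmortality newfemteach urban lnGDP
estimates store noyear
quietly regress newenrol CPI newmarriage depend newedspend newmortality newfemteach urban lnGDP i.year
estimates store withyear

* Side-by-side coefficients with N, R-squared and adjusted R-squared
estimates table noyear withyear, b(%9.4f) se(%9.4f) stats(N r2 r2_a)

* Observations per estimated parameter in the larger model (incl. constant)
display e(N) / (e(df_m) + 1)
```

The table puts the year coefficients next to the substantive ones, which makes it easier to judge whether they are large enough to matter for an outcome on the scale of newenrol.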

Moving far from your current frame of reference, I'll point out that The American Statistician, a journal of the American Statistical Association, has recently published an editorial recommending that the concept of statistical significance as a whole be abandoned. Read https://www.tandfonline.com/doi/full...5.2019.1583913 if you have time. For a shorter "pep talk" on the same topic, see https://www.nature.com/articles/d41586-019-00857-9.