  • Insignificant year dummies

    Hi all,

    I am running a regression for my dissertation and was planning on including year dummies. The output I get is as follows:

    Code:
     reg newenrol CPI newmarriage depend newedspend newmortality newfemteach urban lnGDP i.year
    
          Source |       SS           df       MS      Number of obs   =     1,017
    -------------+----------------------------------   F(25, 991)      =    284.34
           Model |  575642.157        25  23025.6863   Prob > F        =    0.0000
        Residual |  80251.9092       991  80.9807358   R-squared       =    0.8776
    -------------+----------------------------------   Adj R-squared   =    0.8746
           Total |  655894.067     1,016  645.565026   Root MSE        =    8.9989
    
    ------------------------------------------------------------------------------
        newenrol |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             CPI |   .0111325   .0296963     0.37   0.708    -.0471423    .0694074
     newmarriage |   .0644005   .0702137     0.92   0.359     -.073384     .202185
          depend |  -.2866934   .0331114    -8.66   0.000      -.35167   -.2217168
      newedspend |   1.322696   .2283656     5.79   0.000     .8745602    1.770832
    newmortality |   -.567749   .0362517   -15.66   0.000    -.6388879     -.49661
     newfemteach |   .2620551   .0224073    11.70   0.000     .2180839    .3060264
           urban |   .0612197   .0238815     2.56   0.011     .0143556    .1080839
           lnGDP |   1.565205   .6738355     2.32   0.020     .2428971    2.887514
                 |
            year |
           2001  |   .2016182   1.725029     0.12   0.907    -3.183511    3.586748
           2002  |  -.2950407   1.717641    -0.17   0.864    -3.665671     3.07559
           2003  |   .1552302   1.709801     0.09   0.928    -3.200016    3.510476
           2004  |   .0421186   1.710136     0.02   0.980    -3.313785    3.398022
           2005  |    .010656    1.70356     0.01   0.995    -3.332343    3.353655
           2006  |  -.1910212   1.703714    -0.11   0.911    -3.534322     3.15228
           2007  |  -.3678111   1.704711    -0.22   0.829    -3.713068    2.977446
           2008  |  -.7939854   1.705902    -0.47   0.642    -4.141581     2.55361
           2009  |  -1.582709   1.711864    -0.92   0.355    -4.942003    1.776585
           2010  |  -1.762984   1.711649    -1.03   0.303    -5.121857    1.595889
           2011  |  -2.007432   1.713243    -1.17   0.242    -5.369433    1.354569
           2012  |  -1.745967   1.712873    -1.02   0.308    -5.107242    1.615308
           2013  |  -.9419901   1.715142    -0.55   0.583    -4.307717    2.423737
           2014  |   -.540971   1.716269    -0.32   0.753     -3.90891    2.826968
           2015  |  -.1426261   1.717701    -0.08   0.934    -3.513374    3.228122
           2016  |   .3639826   1.718525     0.21   0.832    -3.008383    3.736349
           2017  |   .2700282   1.719346     0.16   0.875    -3.103948    3.644005
                 |
           _cons |   60.52972    5.88975    10.28   0.000     48.97191    72.08754

    All of the year dummies are insignificant! When I run the regression without the year dummies, I get similar results for the other variables. The output is as follows:

    Code:
    reg newenrol CPI newmarriage depend newedspend newmortality newfemteach urban lnGDP
    
          Source |       SS           df       MS      Number of obs   =     1,017
    -------------+----------------------------------   F(8, 1008)      =    896.51
           Model |  575070.577         8  71883.8221   Prob > F        =    0.0000
        Residual |  80823.4898     1,008  80.1820335   R-squared       =    0.8768
    -------------+----------------------------------   Adj R-squared   =    0.8758
           Total |  655894.067     1,016  645.565026   Root MSE        =    8.9544
    
    ------------------------------------------------------------------------------
        newenrol |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             CPI |   .0136373   .0294484     0.46   0.643    -.0441498    .0714244
     newmarriage |   .0686673   .0697638     0.98   0.325    -.0682315    .2055662
          depend |  -.2826268   .0328785    -8.60   0.000    -.3471448   -.2181087
      newedspend |   1.273113   .2259444     5.63   0.000     .8297378    1.716488
    newmortality |  -.5659455   .0353495   -16.01   0.000    -.6353126   -.4965785
     newfemteach |   .2636893   .0222034    11.88   0.000     .2201191    .3072595
           urban |   .0624549   .0237473     2.63   0.009      .015855    .1090548
           lnGDP |   1.567741   .6653732     2.36   0.019     .2620653    2.873416
           _cons |   59.56112   5.612467    10.61   0.000     48.54767    70.57458

    Should I leave the year dummies in, or just say that I tried the regression with them and removed them because none were significant? Is there a standard practice for this kind of thing?

    Thanks very much

  • #2
    Staying close to your current frame of reference, I'll begin by pointing out that the statistical significance of the individual year indicators ("dummies") is irrelevant to the decision of whether to include them in the model. If you want to rely on a significance test, you should use the joint significance of all of them. Run -testparm i.year-.
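
    A minimal sketch of that joint test, assuming the variable names from post #1 and the default (non-robust) standard errors shown there. The second half is only a hand check under that same assumption: because both posted regressions use the same 1,017 observations, the F statistic can be recovered from their residual sums of squares. The scalar name Fstat is just an illustrative choice.

    ```stata
    * Refit the model with the year indicators (variable names from post #1)
    regress newenrol CPI newmarriage depend newedspend newmortality newfemteach urban lnGDP i.year

    * Joint Wald test that all 17 year coefficients are zero
    testparm i.year

    * Hand check from the posted outputs:
    * ((RSS without years - RSS with years) / 17) / (RSS with years / 991)
    scalar Fstat = ((80823.4898 - 80251.9092)/17) / (80251.9092/991)
    display "F(17, 991) = " scalar(Fstat)
    display "p-value    = " Ftail(17, 991, scalar(Fstat))
    ```

    With the posted figures the hand check gives an F of roughly 0.42. An F statistic below 1 is well short of any conventional significance threshold, so the year indicators are not jointly significant by that criterion either.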

    Moving a bit away from your current frame of reference, I am from the school that does not endorse the use of any kind of statistical significance test for deciding what to include in a model. I would rely instead on judgment based on: a) Does theory suggest that there are material yearly shocks to the outcome? If so, include year indicators. b) How do the magnitudes of the year indicators compare to the other coefficients in the model? Are they just "rounding errors," or do some or all of them add a meaningful amount to the outcome variable? c) Is there a meaningful change in R-squared when you include them? (In the outputs you show, it looks like there isn't.) d) Is there "room" for this many variables in the model, or will it leave you with too few observations per variable? (In this case it looks like you have enough observations to allow for the additional indicators.)
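
    For points (c) and (d), the relevant figures can be read straight off the two outputs in post #1. The sketch below only does that arithmetic. The final command is one assumed way to get the scale of the outcome for judging point (b), using the variable name from post #1.

    ```stata
    * Figures copied from the two outputs in post #1
    display "Change in R-squared:      " 0.8776 - 0.8768
    display "Change in adj. R-squared: " 0.8746 - 0.8758

    * 1,017 observations over 26 estimated coefficients
    * (8 regressors + 17 year indicators + constant)
    display "Observations per coefficient: " 1017/26

    * Scale of the outcome, for judging the size of the year coefficients
    summarize newenrol
    ```

    On the posted numbers, R-squared rises by about 0.0008 while adjusted R-squared falls slightly (0.8758 to 0.8746), and there are about 39 observations per estimated coefficient.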

    Moving far from your current frame of reference, I'll point out that the American Statistical Association has recently adopted a position paper recommending that the concept of statistical significance as a whole be abandoned. Read https://www.tandfonline.com/doi/full...5.2019.1583913 if you have time. For a shorter "pep talk" on the same topic, see https://www.nature.com/articles/d41586-019-00857-9.
