Inflated standard errors

Lara ingram

Join Date: Aug 2019
Posts: 36

13 Dec 2019, 05:37

Hi Statalist,

I am running an IV model with the ivreg2 command, but three of my variables have inflated standard errors once I interact them with one another. This is the regression I am running:

Code:

ivreg2 p1 (currwork##year##husjob2 = c.hhmemtotal_w##year##husjob2 c.avgwork##year##husjob2) i.ehypo i.ehyper dis i.educlvl parentdv presentdv [pw=weight1]

I am interacting currwork (coded 0, 1), husjob2 (coded 0, 1, 2), and year (2005, 2014).

Code:

                                                 Number of obs =    11319
                                                      F( 19, 11299) =     6.12
                                                      Prob > F      =   0.0000
Total (centered) SS     =  34741.31888                Centered R2   =  -1.0469
Total (uncentered) SS   =  34795.53403                Uncentered R2 =  -1.0437
Residual SS             =   71110.3302                Root MSE      =    2.506

-------------------------------------------------------------------------------------------
                          |               Robust
                       p1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------------+----------------------------------------------------------------
                 currwork |
                     yes  |     1.2743   2.616776     0.49   0.626    -3.854487    6.403086
                          |
                     year |
                    2014  |  -2.245415   2.987564    -0.75   0.452    -8.100933    3.610104
                          |
            currwork#year |
                yes#2014  |   10.01232   14.13885     0.71   0.479    -17.69932    37.72395
                          |
                  husjob2 |
              Unemployed  |  -2.712398   9.411266    -0.29   0.773    -21.15814    15.73334
         Blue collar job  |   .3374251   .5287345     0.64   0.523    -.6988755    1.373726
                          |
         currwork#husjob2 |
          yes#Unemployed  |    11.3041   45.05691     0.25   0.802    -77.00583    99.61402
     yes#Blue collar job  |  -3.407791   2.756592    -1.24   0.216    -8.810613     1.99503
                          |
             year#husjob2 |
         2014#Unemployed  |   5.141164   8.840584     0.58   0.561    -12.18606    22.46839
    2014#Blue collar job  |   1.794166   3.531053     0.51   0.611    -5.126571    8.714903
                          |
    currwork#year#husjob2 |
     yes#2014#Unemployed  |  -27.07516   42.27643    -0.64   0.522    -109.9354    55.78513
yes#2014#Blue collar job  |  -7.513584   19.54885    -0.38   0.701    -45.82862    30.80145
                          |
                  1.ehypo |   .3597295   .2507223     1.43   0.151    -.1316773    .8511363
                 1.ehyper |  -.2413518   .1325384    -1.82   0.069    -.5011222    .0184186
                      dis |   .0613033   .1616233     0.38   0.704    -.2554726    .3780792
                          |
                  educlvl |
                 primary  |  -.2648927   .2085439    -1.27   0.204    -.6736311    .1438457
               secondary  |  -.8432629   .1663115    -5.07   0.000    -1.169227   -.5172982
                  higher  |  -2.341019   1.485547    -1.58   0.115    -5.252638    .5706004
                          |
                 parentdv |     .93121   .1608053     5.79   0.000     .6160374    1.246382
                presentdv |  -.3447354   .2423038    -1.42   0.155    -.8196422    .1301713
                    _cons |   .4656507   .5470787     0.85   0.395    -.6066039    1.537905
-------------------------------------------------------------------------------------------

I have checked cell sizes, and I do have small cells when I tabulate husjob2 against currwork by year.

Code:

. ta husjob2 currwork if year==2005

                |   currently working
        husjob2 |        no        yes |     Total
----------------+----------------------+----------
   White collar |     1,206        594 |     1,800 
     Unemployed |       182         46 |       228 
Blue collar job |     2,534        526 |     3,060 
----------------+----------------------+----------
          Total |     3,922      1,166 |     5,088 

. ta husjob2 currwork if year==2014

                |   currently working
        husjob2 |        no        yes |     Total
----------------+----------------------+----------
   White collar |     1,617        488 |     2,105 
     Unemployed |       146         24 |       170 
Blue collar job |     3,539        417 |     3,956 
----------------+----------------------+----------
          Total |     5,302        929 |     6,231 

Combining categories in husjob2 does lower the standard errors, but they remain high (>10). Can anyone advise on the best course of action in this case?
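
For reference, combining categories as described above could look like the sketch below. The thread does not say which categories were merged, so the grouping here (Unemployed folded into Blue collar job), the codes 0/1/2 for White collar/Unemployed/Blue collar job, and the new variable name husjob2_c are all assumptions for illustration only.

```stata
* Hypothetical recoding: codes 0/1/2 are assumed to mean
* White collar / Unemployed / Blue collar job; check with -label list- first.
recode husjob2 (0 = 0 "White collar") (1 2 = 1 "Unemployed or blue collar"), generate(husjob2_c)

* Re-check the cell sizes after collapsing, separately for each year
bysort year: tabulate husjob2_c currwork
```

Under this assumed grouping, and using the counts in the tables above, the smallest cell would rise from 24 to 441 (24 + 417 working women in 2014). Whether such a merge makes substantive sense is a separate question.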

Tags: data, regression, standard error

Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

16 Dec 2019, 11:01

You're asking for a lot in estimating all those interactions, particularly for subgroups with few observations. High standard errors are not something to fix in the statistics; they are what they are. You might check for outliers.
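
One simple way to act on the outlier suggestion is sketched below. It assumes p1, hhmemtotal_w and avgwork are continuous, ignores the sampling weights, and uses an arbitrary 3-standard-deviation cutoff; the flag variable p1_extreme is a made-up name.

```stata
* Percentiles plus the smallest and largest values of each variable
summarize p1 hhmemtotal_w avgwork, detail

* Flag outcome values more than 3 SDs from the mean (arbitrary cutoff)
summarize p1
generate byte p1_extreme = abs(p1 - r(mean)) > 3 * r(sd) if !missing(p1)

* See whether flagged observations sit in the small cells (working women)
tabulate p1_extreme husjob2 if currwork == 1
```

A handful of flagged observations inside a cell of 24 or 46 could move the corresponding interaction coefficient considerably, so those cells are the ones worth inspecting first.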
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#3

16 Dec 2019, 11:08

Lara:
as an aside to Phil's helpful comment, have you already ruled out quasi-extreme multicollinearity issues?

Kind regards,
Carlo
(Stata 19.0)
Lara ingram

Join Date: Aug 2019

Posts: 36
#4

19 Dec 2019, 11:20

Thank you both for your replies.

Carlo, I have checked for multicollinearity (all VIF values are below 5), so I do not think that is what is causing it.
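
The post does not say how the VIFs were obtained. One common route, sketched here as an assumption rather than a description of what was actually run, is an auxiliary unweighted OLS fit followed by estat vif:

```stata
* Auxiliary OLS fit, used only to obtain VIFs (these are not the IV estimates)
regress p1 i.currwork##i.year##i.husjob2 i.ehypo i.ehyper dis i.educlvl parentdv presentdv

* Variance inflation factor for each estimated coefficient
estat vif
```

Two caveats apply. Interaction terms often show large VIFs by construction, so it matters whether the VIFs came from a specification with or without the three-way interaction. Also, these VIFs describe the observed regressors, whereas the IV standard errors depend on the first-stage fitted values, which can be far more collinear.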
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#5

19 Dec 2019, 12:05

Lara:
overfitting?

Kind regards,
Carlo
(Stata 19.0)
Lara ingram

Join Date: Aug 2019

Posts: 36
#6

20 Dec 2019, 05:27

I think that might be the case. I am aware of the "overfit" Stata command, which calculates shrinkage statistics. However, when I try to generate these statistics I keep running into the following error:

Code:

overfit: p1 i.currwork i.hhmemtotal avgwork i.husjob2 i.ehypo i.ehyper dis i.educlvl parentdv presentdv

"Warning: 1200 crashes have occurred when estimating the model or the shrinkage statistics for one or more iterations.
See matrix r(crashes) for detail"

I am wondering if there are any other ways to detect overfitting?
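
One alternative that needs no user-written command is a split-sample check: fit the model on a random half of the data and compare how well it predicts in the other half. This is a rough sketch of that idea, not a replacement for the shrinkage statistics from overfit. It reuses the variable list from the command above, fits by unweighted OLS, and the seed and the names train and p1_hat are arbitrary.

```stata
* Randomly assign about half of the observations to a training sample
set seed 2019
generate byte train = runiform() < 0.5

* Fit on the training half only
regress p1 i.currwork i.hhmemtotal avgwork i.husjob2 i.ehypo i.ehyper dis i.educlvl parentdv presentdv if train

* Linear predictions for all observations, including the holdout half
predict double p1_hat, xb

* Squared correlation of observed and predicted values in each half
correlate p1 p1_hat if train
display "In-sample R2:     " r(rho)^2
correlate p1 p1_hat if !train
display "Out-of-sample R2: " r(rho)^2
```

A holdout value well below the in-sample value would be consistent with overfitting. A single random split is noisy, so repeating it with different seeds gives a better picture.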
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#7

20 Dec 2019, 08:35

Lara:
some months ago Bruce Weaver was kind enough to share the link to this useful reference: https://www.cs.vu.nl/~eliens/sg/loca...verfitting.pdf.

Kind regards,
Carlo
(Stata 19.0)
Lara ingram

Join Date: Aug 2019

Posts: 36
#8

21 Dec 2019, 04:00

Thank you Carlo - I will have a read.