  • Inflated standard errors

    Hi Statalist,

    I am running an IV model (using the ivreg2 command), but three of my variables show inflated standard errors once I interact them with one another. This is the regression I am running:

    Code:
    ivreg2 p1 (currwork##year##husjob2= c.hhmemtotal_w##year##husjob2 c.avgwork##year##husjob2) i.ehypo i.ehyper dis i.educlvl parentdv presentdv[pw=weight1]
    I am interacting currwork (coded 0, 1), husjob2 (coded 0, 1, 2), and year (2005, 2014).

    Code:
                                                     Number of obs =    11319
                                                          F( 19, 11299) =     6.12
                                                          Prob > F      =   0.0000
    Total (centered) SS     =  34741.31888                Centered R2   =  -1.0469
    Total (uncentered) SS   =  34795.53403                Uncentered R2 =  -1.0437
    Residual SS             =   71110.3302                Root MSE      =    2.506
    
    -------------------------------------------------------------------------------------------
                              |               Robust
                           p1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------------+----------------------------------------------------------------
                     currwork |
                         yes  |     1.2743   2.616776     0.49   0.626    -3.854487    6.403086
                              |
                         year |
                        2014  |  -2.245415   2.987564    -0.75   0.452    -8.100933    3.610104
                              |
                currwork#year |
                    yes#2014  |   10.01232   14.13885     0.71   0.479    -17.69932    37.72395
                              |
                      husjob2 |
                  Unemployed  |  -2.712398   9.411266    -0.29   0.773    -21.15814    15.73334
             Blue collar job  |   .3374251   .5287345     0.64   0.523    -.6988755    1.373726
                              |
             currwork#husjob2 |
              yes#Unemployed  |    11.3041   45.05691     0.25   0.802    -77.00583    99.61402
         yes#Blue collar job  |  -3.407791   2.756592    -1.24   0.216    -8.810613     1.99503
                              |
                 year#husjob2 |
             2014#Unemployed  |   5.141164   8.840584     0.58   0.561    -12.18606    22.46839
        2014#Blue collar job  |   1.794166   3.531053     0.51   0.611    -5.126571    8.714903
                              |
        currwork#year#husjob2 |
         yes#2014#Unemployed  |  -27.07516   42.27643    -0.64   0.522    -109.9354    55.78513
    yes#2014#Blue collar job  |  -7.513584   19.54885    -0.38   0.701    -45.82862    30.80145
                              |
                      1.ehypo |   .3597295   .2507223     1.43   0.151    -.1316773    .8511363
                     1.ehyper |  -.2413518   .1325384    -1.82   0.069    -.5011222    .0184186
                          dis |   .0613033   .1616233     0.38   0.704    -.2554726    .3780792
                              |
                      educlvl |
                     primary  |  -.2648927   .2085439    -1.27   0.204    -.6736311    .1438457
                   secondary  |  -.8432629   .1663115    -5.07   0.000    -1.169227   -.5172982
                      higher  |  -2.341019   1.485547    -1.58   0.115    -5.252638    .5706004
                              |
                     parentdv |     .93121   .1608053     5.79   0.000     .6160374    1.246382
                    presentdv |  -.3447354   .2423038    -1.42   0.155    -.8196422    .1301713
                        _cons |   .4656507   .5470787     0.85   0.395    -.6066039    1.537905
    -------------------------------------------------------------------------------------------

    I have checked cell sizes, and I do indeed have small cells when I tabulate husjob2 against currwork by year.
    Code:
    . ta husjob2 currwork if year==2005
    
                    |   currently working
            husjob2 |        no        yes |     Total
    ----------------+----------------------+----------
       White collar |     1,206        594 |     1,800 
         Unemployed |       182         46 |       228 
    Blue collar job |     2,534        526 |     3,060 
    ----------------+----------------------+----------
              Total |     3,922      1,166 |     5,088 
    
    . ta husjob2 currwork if year==2014
    
                    |   currently working
            husjob2 |        no        yes |     Total
    ----------------+----------------------+----------
       White collar |     1,617        488 |     2,105 
         Unemployed |       146         24 |       170 
    Blue collar job |     3,539        417 |     3,956 
    ----------------+----------------------+----------
              Total |     5,302        929 |     6,231 
    
    .
    Combining categories in husjob2 does seem to lower the standard errors, but they remain high (>10). Can anyone advise on the best course of action in this case?
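
    For reference, a minimal sketch of the category-combining step mentioned above. The post does not say which levels were merged, so the grouping here (Unemployed and Blue collar job folded together) and the variable name husjob_bin are assumptions. The 0/1/2 ordering is inferred from the order of the labels in the tabulations.

    ```stata
    * Assumption: husjob2 is coded 0 = White collar, 1 = Unemployed, 2 = Blue collar job
    * husjob_bin is a hypothetical name for the collapsed variable
    recode husjob2 (0 = 0 "White collar") (1 2 = 1 "Unemployed or blue collar"), generate(husjob_bin)

    * Re-check the cell sizes that feed the three-way interaction
    bysort year: tabulate husjob_bin currwork
    ```

    With the counts tabulated above, this particular grouping would raise the smallest cell from 24 (currently working, Unemployed, 2014) to 24 + 417 = 441.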


  • #2
    You're asking a lot of the data by estimating all those interactions, particularly for subgroups with few observations. High standard errors are not something to fix in the statistics; they are what they are. You might check for outliers.


    • #3
      Lara:
      as an aside to Phil's helpful comment, have you already ruled out quasi-extreme multicollinearity issues?
      Kind regards,
      Carlo
      (Stata 16.0 SE)


      • #4
        Thank you both for your reply.

        Carlo, I have checked for multicollinearity (all VIF values are below 5), so I do not think that is what's causing it.


        • #5
          Lara:
          overfitting?
          Kind regards,
          Carlo
          (Stata 16.0 SE)


          • #6
            I think that might be the case. I am aware of the "overfit" Stata command, which calculates shrinkage statistics; however, when I try to generate these statistics I keep running into the following warning:

            Code:
            overfit: p1 i.currwork i.hhmemtotal avgwork i.husjob2 i.ehypo i.ehyper dis i.educlvl parentdv presentdv
            "Warning: 1200 crashes have occurred when estimating the model or the shrinkage statistics for one or more iterations.
            See matrix r(crashes) for detail"

            I am wondering if there are any other ways to detect overfitting?
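
            One generic alternative is k-fold cross-validation done by hand. The sketch below is only an illustration under stated assumptions: it uses plain OLS with the same regressor list as the overfit call above (so it ignores the instruments and the weights), and the names u, fold, yhat_cv, and tmp are hypothetical. It is meant to be run as a single do-file so that the local macro stays defined.

            ```stata
            * Hypothetical 5-fold cross-validation; an OLS stand-in, not the IV model
            local rhs i.currwork i.hhmemtotal avgwork i.husjob2 i.ehypo i.ehyper dis i.educlvl parentdv presentdv

            * Randomly assign each observation to one of five folds
            set seed 12345
            generate double u = runiform()
            xtile fold = u, nquantiles(5)
            generate double yhat_cv = .

            * Fit on four folds, predict the held-out fold
            forvalues k = 1/5 {
                quietly regress p1 `rhs' if fold != `k'
                quietly predict double tmp if fold == `k', xb
                quietly replace yhat_cv = tmp if fold == `k'
                drop tmp
            }

            * Compare in-sample fit with out-of-sample fit
            quietly regress p1 `rhs'
            display "In-sample R-squared:       " %6.4f e(r2)
            quietly correlate p1 yhat_cv
            display "Cross-validated R-squared: " %6.4f r(rho)^2
            ```

            A cross-validated R-squared far below the in-sample value would point towards overfitting; similar values would suggest the problem lies elsewhere.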


            • #7
              Lara:
              some months ago Bruce Weaver was kind enough to share a link to this useful reference: https://www.cs.vu.nl/~eliens/sg/loca...verfitting.pdf.
              Kind regards,
              Carlo
              (Stata 16.0 SE)


              • #8
                Thank you, Carlo. I will have a read.
