  • Confidence interval calculations in -regress- vs. other estimation procedures

    Might anyone have insights into how -regress- produces confidence intervals for its estimated parameters? For at least some other estimation commands, the lower and upper endpoints of the 95% CI are given by
    Code:
    beta_hat+se(beta_hat)*invnormal(.025)
    and
    beta_hat+se(beta_hat)*invnormal(.975)
    But that is evidently not how -regress- computes its confidence intervals, even when a robust variance estimator, vce(robust), is specified, as the following example shows.
    Code:
    sysuse auto
    
    glm price mpg, vce(robust) link(I) noheader
    local gb1=e(b)[1,1]
    local gs1=sqrt(e(V)[1,1])
    local gcl1=`gb1'+`gs1'*invnormal(.025)
    local gcu1=`gb1'+`gs1'*invnormal(.975)
    di `gs1'
    di `gcl1'
    di `gcu1'
    
    reg price mpg, vce(robust) noheader
    local rb1=e(b)[1,1]
    local rs1=sqrt(e(V)[1,1])
    local rcl1=`rb1'+`rs1'*invnormal(.025)
    local rcu1=`rb1'+`rs1'*invnormal(.975)
    di `rs1'
    di `rcl1'
    di `rcu1'
    which yields the following results:
    Code:
    . glm price mpg, vce(robust) link(I) noheader
    
    Iteration 0:   log pseudolikelihood = -686.53958  
    
    ------------------------------------------------------------------------------
                 |               Robust
           price | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |  -238.8943   57.08197    -4.19   0.000     -350.773   -127.0157
           _cons |   11253.06   1366.933     8.23   0.000     8573.922     13932.2
    ------------------------------------------------------------------------------
    
    . local gb1=e(b)[1,1]
    
    . local gs1=sqrt(e(V)[1,1])
    
    . local gcl1=`gb1'+`gs1'*invnormal(.025)
    
    . local gcu1=`gb1'+`gs1'*invnormal(.975)
    
    . di `gs1'
    57.081973
    
    . di `gcl1'
    -350.77296
    
    . di `gcu1'
    -127.01573
    
    .
    . reg price mpg, vce(robust) noheader
    ------------------------------------------------------------------------------
                 |               Robust
           price | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |  -238.8943   57.47701    -4.16   0.000    -353.4727    -124.316
           _cons |   11253.06   1376.393     8.18   0.000     8509.272    13996.85
    ------------------------------------------------------------------------------
    
    . local rb1=e(b)[1,1]
    
    . local rs1=sqrt(e(V)[1,1])
    
    . local rcl1=`rb1'+`rs1'*invnormal(.025)
    
    . local rcu1=`rb1'+`rs1'*invnormal(.975)
    
    . di `rs1'
    57.477009
    
    . di `rcl1'
    -351.54721
    
    . di `rcu1'
    -126.24148
    I was unable to find anything in the Methods and Formulas section of the documentation that describes the specifics (though I quite possibly missed something).

    It appears that -regress- appeals to a t distribution rather than a normal distribution, even though one might argue that vce(robust) should appeal to a normal distribution, as it does for -glm- in the example.

    Thanks for any insights you might pass along.
    Last edited by John Mullahy; 03 Dec 2021, 09:15.

  • #2
    Originally posted by John Mullahy
    I'm wondering if -regress- is appealing to a t-distribution instead of a normal distribution (even though it would seem like vce(robust) should be appealing to a normal distribution as it does for glm as seen in the example).
    From a quick look, that is the difference: -regress- uses the t distribution with e(df_r) residual degrees of freedom, while -glm- uses large-sample z statistics.

    Code:
    . sysuse auto
    (1978 automobile data)
    
    .
    . reg price
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(0, 73)        =      0.00
           Model |           0         0           .   Prob > F        =         .
        Residual |   635065396        73  8699525.97   R-squared       =    0.0000
    -------------+----------------------------------   Adj R-squared   =    0.0000
           Total |   635065396        73  8699525.97   Root MSE        =    2949.5
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           _cons |   6165.257   342.8719    17.98   0.000     5481.914      6848.6
    ------------------------------------------------------------------------------
    
    . mat list e(V)
    
    symmetric e(V)[1,1]
               _cons
    _cons  117561.16
    
    . di "mean (95% CI) = " %7.3f _b[_cons] " (" %7.3f _b[_cons] - invt(e(df_r), 0.975)*_se[_cons] ", " %7.3f _b[_cons] + invt(e(df_r), 0.975)*_se[_cons] ")"
    mean (95% CI) = 6165.257 (5481.914, 6848.600)
    
    .
    . glm price
    
    Iteration 0:   log likelihood = -695.71287  
    
    Generalized linear models                         Number of obs   =         74
    Optimization     : ML                             Residual df     =         73
                                                      Scale parameter =    8699526
    Deviance         =  635065396.1                   (1/df) Deviance =    8699526
    Pearson          =  635065396.1                   (1/df) Pearson  =    8699526
    
    Variance function: V(u) = 1                       [Gaussian]
    Link function    : g(u) = u                       [Identity]
    
                                                      AIC             =   18.83008
    Log likelihood   = -695.7128689                   BIC             =   6.35e+08
    
    ------------------------------------------------------------------------------
                 |                 OIM
           price | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           _cons |   6165.257   342.8719    17.98   0.000      5493.24    6837.273
    ------------------------------------------------------------------------------
    
    . mat list e(V)
    
    symmetric e(V)[1,1]
                     price:
                     _cons
    price:_cons  117561.16
    
    . di "mean (95% CI) = " %7.3f _b[_cons] " (" %7.3f _b[_cons] - invnormal(0.975)*_se[_cons] ", " %7.3f _b[_cons] + invnormal(0.975)*_se[_cons] ")"
    mean (95% CI) = 6165.257 (5493.240, 6837.273)
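
    The same check can be run on the vce(robust) example from the original post. The lines below are a minimal sketch, assuming -regress- takes its critical value from a t distribution with e(df_r) degrees of freedom even under vce(robust); the matrix name T and the local name tcrit are illustrative choices only. If that assumption holds, the displayed interval should reproduce the [-353.4727, -124.316] reported in the -regress- table, and the crit row of r(table) should equal invt(e(df_r), 0.975).

    Code:
    sysuse auto, clear
    regress price mpg, vce(robust) noheader
    
    * save the results table right away, before another command can overwrite r()
    matrix T = r(table)
    
    * critical values: t with residual df (assumed for -regress-) vs. standard normal
    local tcrit = invt(e(df_r), 0.975)
    di "df_r = " e(df_r) "   t crit = " `tcrit' "   z crit = " invnormal(0.975)
    
    * t-based 95% interval for the mpg coefficient
    di "t-based 95% CI: (" %9.4f _b[mpg] - `tcrit'*_se[mpg] ", " %9.4f _b[mpg] + `tcrit'*_se[mpg] ")"
    
    * critical value that -regress- itself stored for the mpg column
    di "crit in r(table) = " el(T, rownumb(T, "crit"), colnumb(T, "mpg"))

    Note that the robust standard errors also differ slightly between the two commands in the original post (57.477009 for -regress- vs. 57.081973 for -glm-). Their ratio is very close to sqrt(73/72), which would be consistent with different finite-sample adjustments, though that is an inference from the numbers rather than something confirmed here.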
