Confidence interval calculations in -regress- vs. other estimation procedures

John Mullahy

Join Date: Dec 2016
Posts: 772

03 Dec 2021, 09:02

Might anyone have insights into how -regress- produces confidence intervals for its estimated parameters? For at least some other estimation methods, the lower and upper limits of the 95% CI are given by

Code:

beta_hat+se(beta_hat)*invnormal(.025)
and
beta_hat+se(beta_hat)*invnormal(.975)

But that is evidently not how -regress- produces its confidence intervals, even when a robust variance estimator such as vce(robust) is used, as the following example shows.

Code:

sysuse auto

glm price mpg, vce(robust) link(I) noheader
local gb1=e(b)[1,1]
local gs1=sqrt(e(V)[1,1])
local gcl1=`gb1'+`gs1'*invnormal(.025)
local gcu1=`gb1'+`gs1'*invnormal(.975)
di `gs1'
di `gcl1'
di `gcu1'

reg price mpg, vce(robust) noheader
local rb1=e(b)[1,1]
local rs1=sqrt(e(V)[1,1])
local rcl1=`rb1'+`rs1'*invnormal(.025)
local rcu1=`rb1'+`rs1'*invnormal(.975)
di `rs1'
di `rcl1'
di `rcu1'

which yields the following results:

Code:

. glm price mpg, vce(robust) link(I) noheader

Iteration 0:   log pseudolikelihood = -686.53958  

------------------------------------------------------------------------------
             |               Robust
       price | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   57.08197    -4.19   0.000     -350.773   -127.0157
       _cons |   11253.06   1366.933     8.23   0.000     8573.922     13932.2
------------------------------------------------------------------------------

. local gb1=e(b)[1,1]

. local gs1=sqrt(e(V)[1,1])

. local gcl1=`gb1'+`gs1'*invnormal(.025)

. local gcu1=`gb1'+`gs1'*invnormal(.975)

. di `gs1'
57.081973

. di `gcl1'
-350.77296

. di `gcu1'
-127.01573

.
. reg price mpg, vce(robust) noheader
------------------------------------------------------------------------------
             |               Robust
       price | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   57.47701    -4.16   0.000    -353.4727    -124.316
       _cons |   11253.06   1376.393     8.18   0.000     8509.272    13996.85
------------------------------------------------------------------------------

. local rb1=e(b)[1,1]

. local rs1=sqrt(e(V)[1,1])

. local rcl1=`rb1'+`rs1'*invnormal(.025)

. local rcu1=`rb1'+`rs1'*invnormal(.975)

. di `rs1'
57.477009

. di `rcl1'
-351.54721

. di `rcu1'
-126.24148

I was unable to find anything in the Methods and Formulas section of the documentation that describes the specifics (though I may well have missed something).

It appears that -regress- is appealing to a t distribution instead of a normal distribution, even though one might argue that vce(robust) should appeal to a normal distribution, as it does for -glm- in the example.

Thanks for any insights you might pass along.

Last edited by John Mullahy; 03 Dec 2021, 09:15.


Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2458

03 Dec 2021, 09:16

Originally posted by John Mullahy:

I'm wondering if -regress- is appealing to a t-distribution instead of a normal distribution (even though it would seem like vce(robust) should be appealing to a normal distribution as it does for glm as seen in the example).

On a quick look, that is the difference: -regress- bases its intervals on the t distribution with the residual degrees of freedom, e(df_r), while -glm- reports large-sample z statistics based on the normal distribution.

Code:

. sysuse auto
(1978 automobile data)

.
. reg price

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(0, 73)        =      0.00
       Model |           0         0           .   Prob > F        =         .
    Residual |   635065396        73  8699525.97   R-squared       =    0.0000
-------------+----------------------------------   Adj R-squared   =    0.0000
       Total |   635065396        73  8699525.97   Root MSE        =    2949.5

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   6165.257   342.8719    17.98   0.000     5481.914      6848.6
------------------------------------------------------------------------------

. mat list e(V)

symmetric e(V)[1,1]
           _cons
_cons  117561.16

. di "mean (95% CI) = " %7.3f _b[_cons] " (" %7.3f _b[_cons] - invt(e(df_r), 0.975)*_se[_cons] ", " %7.3f _b[_cons] + invt(e(df_r), 0.975)*_se[_cons] ")"
mean (95% CI) = 6165.257 (5481.914, 6848.600)

.
. glm price

Iteration 0:   log likelihood = -695.71287  

Generalized linear models                         Number of obs   =         74
Optimization     : ML                             Residual df     =         73
                                                  Scale parameter =    8699526
Deviance         =  635065396.1                   (1/df) Deviance =    8699526
Pearson          =  635065396.1                   (1/df) Pearson  =    8699526

Variance function: V(u) = 1                       [Gaussian]
Link function    : g(u) = u                       [Identity]

                                                  AIC             =   18.83008
Log likelihood   = -695.7128689                   BIC             =   6.35e+08

------------------------------------------------------------------------------
             |                 OIM
       price | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   6165.257   342.8719    17.98   0.000      5493.24    6837.273
------------------------------------------------------------------------------

. mat list e(V)

symmetric e(V)[1,1]
                 price:
                 _cons
price:_cons  117561.16

. di "mean (95% CI) = " %7.3f _b[_cons] " (" %7.3f _b[_cons] - invnormal(0.975)*_se[_cons] ", " %7.3f _b[_cons] + invnormal(0.975)*_se[_cons] ")"
mean (95% CI) = 6165.257 (5493.240, 6837.273)
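
The same check can be applied to the robust regression from the first post. The following is only a minimal sketch, assuming the auto data and the same model; the local macro names and the matrix name T are illustrative, not part of the original example. It recomputes the interval for mpg with a t critical value on e(df_r) degrees of freedom, then reads the critical value and limits that Stata stored in r(table).

```stata
sysuse auto, clear

* Refit the robust regression from the first post without showing the table
quietly regress price mpg, vce(robust)

* Copy r(table) straight away, before any other command can replace r()
matrix T = r(table)

* t-based 95% interval using the residual degrees of freedom, e(df_r)
local b    = _b[mpg]
local se   = _se[mpg]
local crit = invttail(e(df_r), 0.025)   // equivalent to invt(e(df_r), 0.975)
display "t-based 95% CI for mpg: " %10.4f `b' - `crit'*`se' ", " %10.4f `b' + `crit'*`se'

* Critical value and limits stored by -regress- itself
display "crit = " el(T, rownumb(T, "crit"), colnumb(T, "mpg"))
display "ll   = " el(T, rownumb(T, "ll"),   colnumb(T, "mpg"))
display "ul   = " el(T, rownumb(T, "ul"),   colnumb(T, "mpg"))
```

If -regress- is indeed using the t distribution, the first display should reproduce the interval shown in the -regress- table in the first post (-353.4727, -124.316), and the stored crit should equal invttail(e(df_r), 0.025). Note also that the robust standard errors in John's output differ slightly between the two commands (57.08197 for -glm-, 57.47701 for -regress-), so the two intervals would not coincide exactly even with a common critical value.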
