  • How to manually calculate the SE of prediction for each observation

    Dear Statalists,

    The following syntax gives us the standard error of prediction for each observation.

    Code:
    ologit y x1
    predict stdp, stdp
    But how can we replicate this stdp by hand?

    For example, if the estimated results are as follows:

    Code:
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x1 |      0.501      0.012    40.99   0.000        0.477       0.525
           /cut1 |      2.635      0.090
           /cut2 |      4.482      0.092
           /cut3 |      5.378      0.094
           /cut4 |      6.219      0.097
    ------------------------------------------------------------------------------

    We can replicate

    Code:
    predict xb, xb
    by typing

    Code:
    gen xb_replicate = 0.501 * x1
    Can we do such a thing with stdp? That is, can we manually calculate stdp for each observation?

    Thank you in advance,
    DS
    Last edited by David Silverstein; 28 Apr 2015, 05:29.

  • #2
    Code:
    . webuse fullauto,clear
    (Automobile Models)
    
    . qui ologit rep77 mpg
    
    . predict stdp , stdp
    
    . keep rep77 mpg st
    
    . mat V = e(V)
    
    . scalar V = V[1,1]
    
    . gen mystdp = sqrt(mpg*V*mpg)
    
    . l in 1/10
    
         +-------------------------------------+
         | mpg     rep77       stdp     mystdp |
         |-------------------------------------|
      1. |  22      Fair    .827769    .827769 |
      2. |  17      Poor   .6396397   .6396397 |
      3. |  22         .    .827769    .827769 |
      4. |  23   Average   .8653949   .8653949 |
      5. |  17      Fair   .6396397   .6396397 |
         |-------------------------------------|
      6. |  25      Good   .9406466   .9406466 |
      7. |  20   Average   .7525173   .7525173 |
      8. |  15      Good    .564388    .564388 |
      9. |  18      Good   .6772656   .6772656 |
     10. |  26         .   .9782725   .9782725 |
         +-------------------------------------+
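    A note on what this code computes: with a single regressor, the quadratic form x'Vx reduces to mpg²·V[1,1], so stdp = |mpg|·sqrt(V[1,1]) and the ratio stdp/mpg is the same for every observation. A quick Python sketch of that identity (e(V) is not shown in this post, so sqrt(V[1,1]) is backed out from the listed output rather than taken from Stata):

```python
# stdp and mpg pairs from the listing above
rows = [(22, .827769), (17, .6396397), (23, .8653949),
        (25, .9406466), (20, .7525173)]

# With one regressor, stdp = |mpg| * sqrt(V[1,1]), so stdp/mpg is constant
ratios = [stdp / mpg for mpg, stdp in rows]
se_b = ratios[0]                      # implied sqrt(V[1,1])

for r in ratios:
    assert abs(r - se_b) < 1e-6       # same ratio for every observation

# reproduce mystdp for the first observation
mystdp_1 = 22 * se_b
```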



    • #3
      Dear Scott,

      I really appreciate your helpful answer!!
      Sorry for the late reply; I needed a whole day to understand the syntax.
      Could I ask one more question? What if the equation has two or more explanatory variables? Can I do the same procedure with ologit y x1 x2, for example?

      Best,
      DS



      • #4
        Why don't you give it a try and see? It takes you back to your school days: how do we calculate a variance? Using Scott's example, let us try the two-variable case.


        Code:
        webuse fullauto,clear
        qui ologit rep77 mpg price
        predict stdp , stdp
        keep rep77 mpg price st
        mat V = e(V)
        Before we proceed, let us examine this matrix e(V) to make sure that you understand Scott's code above


        Code:
        . mat list V
        
        symmetric V[6,6]
                         rep77:     rep77:      cut1:      cut2:      cut3:      cut4:
                           mpg      price      _cons      _cons      _cons      _cons
          rep77:mpg  .00182997
        rep77:price  1.723e-06  7.580e-09
         cut1:_cons  .04692561  .00007947  1.7638211
         cut2:_cons  .04793381  .00008122  1.5159417  1.5676695
         cut3:_cons  .05079184  .00008574  1.5500759  1.5896552  1.7202349
         cut4:_cons  .05452492  .00008838  1.6314346  1.6703903  1.7908583  2.0788882
        Now what is Scott doing when he says scalar V = V[1,1]?
        V is a 6x6 matrix, so he is telling Stata to pick out the diagonal element in the top left-hand corner, i.e., .00182997, which is the variance of mpg. With two variables, we have two variances (for mpg and for price). Now back to your school days:

        Var(aX + bY) = a^2*Var(X) + b^2*Var(Y) + 2ab*Cov(X, Y) for random variables X and Y and constants a and b. The V[2,1] element is the covariance. So implement this logic and voila!


        Code:
        scalar V1 = V[1,1]
        scalar V2 = V[2,2]
        scalar V3 = V[2,1]
        gen var = (2* price* mpg*V3)+ (price*price*V2)+ (mpg*mpg*V1)
        gen mystd= sqrt(var)
        
        . l in 1/10
        
             +--------------------------------------------------------+
             | price   mpg     rep77       stdp        var      mystd |
             |--------------------------------------------------------|
          1. |  4099    22      Fair   1.150601   1.323883   1.150601 |
          2. |  4749    17      Poor   .9889793   .9780801   .9889793 |
          3. |  3799    22         .   1.132773   1.283174   1.132773 |
          4. |  6295    23   Average   1.329461   1.767465   1.329461 |
          5. |  9690    17      Fair   1.344764   1.808389   1.344764 |
             |--------------------------------------------------------|
          6. |  9735    25      Good   1.643457    2.70095   1.643457 |
          7. |  4816    20   Average   1.113458   1.239788   1.113458 |
          8. |  7827    15      Good   1.131716   1.280782   1.131716 |
          9. |  5788    18      Good   1.098155   1.205945   1.098155 |
         10. |  4453    26         .   1.336571   1.786421   1.336571 |
             +--------------------------------------------------------+
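        The arithmetic behind mystd can be checked outside Stata as well. A Python sketch using the (rounded) entries of e(V) listed above; small discrepancies in the last decimals come from the display rounding of the matrix:

```python
import math

# variance/covariance entries from the e(V) listing above (as displayed)
V1, V2, V3 = .00182997, 7.580e-09, 1.723e-06   # Var(mpg), Var(price), Cov(mpg, price)

def stdp(mpg, price):
    # Var(mpg*b1 + price*b2) = mpg^2*V1 + price^2*V2 + 2*mpg*price*V3
    return math.sqrt(mpg**2 * V1 + price**2 * V2 + 2 * mpg * price * V3)

# first two observations from the listing
assert abs(stdp(22, 4099) - 1.150601) < 1e-3
assert abs(stdp(17, 4749) - .9889793) < 1e-3
```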



        • #5
          Dear Andrew,

          Thank you for your reply!

          Allow me to confirm one more thing.
          In the variance-covariance matrix,

          (from my understanding)
          V[1,1] = .00182997 is the square of s.e. for mpg (.0427781)
          V[2,2] = 7.580e-09 is the square of s.e. for price (.0000871)
          In a similar way, can I derive the covariance V[2,1] = 1.723e-06 from the estimated results (ologit rep77 mpg price)?

          Best wishes,
          DS

          Code:
          . ologit rep77 mpg price
          
          Iteration 0:   log likelihood = -89.895098  
          Iteration 1:   log likelihood =  -86.52481  
          Iteration 2:   log likelihood = -86.491281  
          Iteration 3:   log likelihood = -86.491244  
          Iteration 4:   log likelihood = -86.491244  
          
          Ordered logistic regression                       Number of obs   =         66
                                                            LR chi2(2)      =       6.81
                                                            Prob > chi2     =     0.0332
          Log likelihood = -86.491244                       Pseudo R2       =     0.0379
          
          ------------------------------------------------------------------------------
                 rep77 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   mpg |    .107125   .0427781     2.50   0.012     .0232814    .1909686
                 price |   .0001329   .0000871     1.53   0.127    -.0000378    .0003035
          -------------+----------------------------------------------------------------
                 /cut1 |  -.0656913   1.328089                     -2.668698    2.537316
                 /cut2 |   1.722323   1.252066                     -.7316817    4.176327
                 /cut3 |   3.662575   1.311577                      1.091931    6.233219
                 /cut4 |   5.797839   1.441835                      2.971894    8.623784
          ------------------------------------------------------------------------------



          • #6
            Absolutely. Recall that Cov(X, Y) = ρ(X,Y)·σ(X)·σ(Y), where ρ is the correlation coefficient and σ is the standard deviation.

            Code:
            
            . webuse fullauto,clear
            (Automobile Models)
            
            . ologit rep77 mpg price
            
            Iteration 0:   log likelihood = -89.895098  
            Iteration 1:   log likelihood =  -86.52481  
            Iteration 2:   log likelihood = -86.491281  
            Iteration 3:   log likelihood = -86.491244  
            Iteration 4:   log likelihood = -86.491244  
            
            Ordered logistic regression                       Number of obs   =         66
                                                              LR chi2(2)      =       6.81
                                                              Prob > chi2     =     0.0332
            Log likelihood = -86.491244                       Pseudo R2       =     0.0379
            
            ------------------------------------------------------------------------------
                   rep77 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     mpg |    .107125   .0427781     2.50   0.012     .0232814    .1909686
                   price |   .0001329   .0000871     1.53   0.127    -.0000378    .0003035
            -------------+----------------------------------------------------------------
                   /cut1 |  -.0656913   1.328089                     -2.668698    2.537316
                   /cut2 |   1.722323   1.252066                     -.7316817    4.176327
                   /cut3 |   3.662575   1.311577                      1.091931    6.233219
                   /cut4 |   5.797839   1.441835                      2.971894    8.623784
            ------------------------------------------------------------------------------
            
            . corr mpg price
            (obs=74)
            
                         |      mpg    price
            -------------+------------------
                     mpg |   1.0000
                   price |  -0.4594   1.0000
            
            
            . scalar cov=  -0.4594*.0427781* .0000871
            
            . di cov
            -1.712e-06
            
            .
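            A quick check of the scalar arithmetic above. Note that the reported e(V)[2,1] entry earlier in the thread is +1.723e-06, while the value computed here from the raw data correlation is negative: the covariance of the coefficient estimates need not share the sign of the data correlation, so this calculation matches in magnitude only.

```python
# verify the arithmetic in #6: cov = rho * se_mpg * se_price
cov = -0.4594 * .0427781 * .0000871
assert abs(cov - (-1.712e-06)) < 1e-9   # matches the displayed result
```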



            • #7
              Thank you again, Andrew! I really appreciate your help.



              • #8
                Dear Statalist,

                I am working on my thesis assessing the impact of fta. I see in some do-files that they have command-

                scalar sigma = (followed by a number). How does one decide what the number is and what is this command for?

                Many thanks!



                • #9
                  Originally posted by Aishwarya Nahata View Post
                  Dear Statalist,

                  I am working on my thesis assessing the impact of fta. I see in some do-files that they have command-

                  scalar sigma = (followed by a number). How does one decide what the number is and what is this command for?

                  Many thanks!
                   This is really not answerable as is, and it also has very little to do with the main question in this thread. I suggest that
                   1. you start a new post,
                   2. where you give more detail/context, such as what fta is, what sigma is, and what they have to do with each other.
                   If I told you I had set x to 3 at work today and asked whether 3 was the right choice, how on earth could you answer?



                  • #10
                    Hello,

                    Thank you for the helpful explanation above. I am trying to manually calculate the SE of prediction, but I have categorical variables in my logistic regression. How would I manually calculate the standard errors in this case?

                    The regression is

                    Code:
                    . logit DIED i.AGE_cat i.Flail_Chest i.Rib_Plating, or
                    
                    Iteration 0:   log likelihood = -391.50125  
                    Iteration 1:   log likelihood = -354.43286  
                    Iteration 2:   log likelihood = -345.31097  
                    Iteration 3:   log likelihood = -344.97545  
                    Iteration 4:   log likelihood = -344.97328  
                    Iteration 5:   log likelihood = -344.97328  
                    
                    Logistic regression                             Number of obs     =      1,794
                                                                    LR chi2(4)        =      93.06
                                                                    Prob > chi2       =     0.0000
                    Log likelihood = -344.97328                     Pseudo R2         =     0.1188
                    
                    ------------------------------------------------------------------------------
                            DIED | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                         AGE_cat |
                          41-64  |    2.00887   .7907155     1.77   0.076     .9287707    4.345052
                            65+  |   3.624844   1.424458     3.28   0.001        1.678    7.830451
                                 |
                     Flail_Chest |
                            Yes  |    2.77998   .5941485     4.78   0.000     1.828603    4.226335
                                 |
                     Rib_Plating |
                            Yes  |   .0938063   .0434263    -5.11   0.000     .0378599    .2324255
                           _cons |   .0231882   .0088192    -9.90   0.000     .0110034    .0488659
                    ------------------------------------------------------------------------------
                    Note: _cons estimates baseline odds.

                    However, because I am using categorical variables, one of the subgroups is the reference group, so my variance-covariance matrix looks like:


                    Code:
                    . mat list V
                    
                    symmetric V[8,8]
                                              DIED:        DIED:        DIED:        DIED:        DIED:        DIED:        DIED:        DIED:
                                                0b.           1.           2.          0b.           1.          0b.           1.            
                                           AGE_cat      AGE_cat      AGE_cat  Flail_Chest  Flail_Chest  Rib_Plating  Rib_Plating        _cons
                      DIED:0b.AGE_cat            0
                       DIED:1.AGE_cat            0    .15493051
                       DIED:2.AGE_cat            0     .1306595    .15442612
                    DIED:0b.Flail_C~t            0            0            0            0
                    DIED:1.Flail_Ch~t            0   -.00217427   -.00220936            0    .04567796
                    DIED:0b.Rib_Pla~g            0            0            0            0            0            0
                    DIED:1.Rib_Plat~g            0   -.00365786   -.00189374            0   -.00202697            0    .21430981
                           DIED:_cons            0   -.12916846   -.12922332            0    -.0250651            0   -.00789575    .14465232
                    (For context: I am using predict to estimate the probability of the outcome (death) for each covariate pattern (e.g., 1.AGE_cat with Flail_Chest "Yes" and Rib_Plating "Yes"; 1.AGE_cat with Flail_Chest "Yes" and Rib_Plating "No"; and so on). I want to calculate the standard error for those probabilities. predict has the "stdp" option, but that is the standard error of the linear prediction only and doesn't take covariate patterns into account. Any advice on how to do this would be much appreciated.)
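                    For what it's worth, the same x'Vx logic from earlier in the thread extends to factor variables: the base-level rows and columns of e(V) are identically zero, so they can be dropped, and x becomes the vector of 0/1 indicators for the covariate pattern plus the constant. A Python sketch of the standard error of the linear (logit-scale) prediction for one pattern, using the nonzero block of the matrix listed above; the chosen pattern is illustrative only:

```python
import math

# Nonzero block of e(V) from the listing above, base levels dropped,
# ordered as: 1.AGE_cat, 2.AGE_cat, 1.Flail_Chest, 1.Rib_Plating, _cons
V = [
    [ .15493051,  .1306595,  -.00217427, -.00365786, -.12916846],
    [ .1306595,   .15442612, -.00220936, -.00189374, -.12922332],
    [-.00217427, -.00220936,  .04567796, -.00202697, -.0250651 ],
    [-.00365786, -.00189374, -.00202697,  .21430981, -.00789575],
    [-.12916846, -.12922332, -.0250651,  -.00789575,  .14465232],
]

def stdp(x):
    # SE of the linear prediction: sqrt(x' V x)
    var = sum(x[i] * V[i][j] * x[j] for i in range(5) for j in range(5))
    return math.sqrt(var)

# covariate pattern: age 41-64, flail chest yes, rib plating yes (plus _cons)
se = stdp([1, 0, 1, 1, 1])
```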



                    • #11
                      You can work from the logit coefficients and variances, noting

                      $$ \text{Odds ratio S.E.} = \sqrt{\left(e^{\text{coefficient}}\right)^{2} \cdot \operatorname{Var}(\text{coefficient})}. $$

                      Here is an example:

                      Code:
                      webuse lbw
                      logit low age i.race,or
                      logit low age i.race
                      mat l e(V)
                      *CALCULATE OR s.e.
                      di sqrt(exp(_b[age])^2 *(_se[age])^2)
                      *OR (STATA 16+)
                      di sqrt(exp(_b[age])^2 *(e(V)[1,1]))
                      Res.:

                      Code:
                      . 
                      . logit low age i.race,or
                      
                      Iteration 0:   log likelihood =   -117.336  
                      Iteration 1:   log likelihood =  -114.0882  
                      Iteration 2:   log likelihood = -114.06376  
                      Iteration 3:   log likelihood = -114.06375  
                      
                      Logistic regression                             Number of obs     =        189
                                                                      LR chi2(3)        =       6.54
                                                                      Prob > chi2       =     0.0879
                      Log likelihood = -114.06375                     Pseudo R2         =     0.0279
                      
                      ------------------------------------------------------------------------------
                               low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                               age |   .9612592   .0311206    -1.22   0.222     .9021588    1.024231
                                   |
                              race |
                            black  |   2.106974   .9932407     1.58   0.114     .8363679    5.307877
                            other  |   1.767748   .6229325     1.62   0.106     .8860685    3.526738
                                   |
                             _cons |   .8121906   .6515964    -0.26   0.795     .1685638    3.913377
                      ------------------------------------------------------------------------------
                      Note: _cons estimates baseline odds.
                      
                      . 
                      . logit low age i.race
                      
                      Iteration 0:   log likelihood =   -117.336  
                      Iteration 1:   log likelihood =  -114.0882  
                      Iteration 2:   log likelihood = -114.06376  
                      Iteration 3:   log likelihood = -114.06375  
                      
                      Logistic regression                             Number of obs     =        189
                                                                      LR chi2(3)        =       6.54
                                                                      Prob > chi2       =     0.0879
                      Log likelihood = -114.06375                     Pseudo R2         =     0.0279
                      
                      ------------------------------------------------------------------------------
                               low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                               age |  -.0395112   .0323748    -1.22   0.222    -.1029647    .0239423
                                   |
                              race |
                            black  |   .7452527   .4714063     1.58   0.114    -.1786867    1.669192
                            other  |   .5697062   .3523877     1.62   0.106     -.120961    1.260373
                                   |
                             _cons |  -.2080202   .8022702    -0.26   0.795    -1.780441    1.364401
                      ------------------------------------------------------------------------------
                      
                      . 
                      . mat l e(V)
                      
                      symmetric e(V)[5,5]
                                          low:        low:        low:        low:        low:
                                                       1b.          2.          3.            
                                          age        race        race        race       _cons
                          low:age   .00104813
                      low:1b.race           0           0
                       low:2.race   .00241429           0   .22222389
                       low:3.race   .00153645           0   .06118996   .12417708
                        low:_cons  -.02478287           0  -.11473635  -.09397986   .64363754
                      
                      . 
                      . *CALCULATE OR s.e.
                      
                      . 
                      . di sqrt(exp(_b[age])^2 *(_se[age])^2)
                      .03112062
                      
                      . 
                      . *OR (STATA 16+)
                      
                      . 
                      . di sqrt(exp(_b[age])^2 *(e(V)[1,1]))
                      .03112062
                      Last edited by Andrew Musau; 10 Feb 2020, 08:19.
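                      The delta-method identity above can be checked outside Stata as well. A Python sketch using the age coefficient and standard error from the log-odds output:

```python
import math

# age coefficient and s.e. from the log-odds output above
b, se_b = -.0395112, .0323748

# delta method: se(OR) = sqrt(exp(b)^2 * Var(b)) = exp(b) * se(b)
se_or = math.sqrt(math.exp(b)**2 * se_b**2)
assert abs(se_or - .0311206) < 1e-6          # matches the OR-scale output
```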



                      • #12
                        If I may, this discussion provides the fullest explanation I have come across with respect to manually calculating covariances. I was hoping you could perhaps elaborate slightly on the above?

                        In particular, I would like to know how to manually calculate the covariance of two or more sets of predicted probabilities in such a way as to replicate the e(V) matrix after margins, post.

                        As shown below, I can generate predictions with standard error and variance via predictnl. But I don't know how to use this information to calculate covariances...

                        Code:
                        sysuse auto, clear
                        
                        logit foreign price
                        
                        replace price = 1000
                        predictnl double pr1 = predict(pr) if e(sample), se(se1) variance(vc1)
                        
                        replace price = 2000 if e(sample)
                        predictnl double pr2 = predict(pr) if e(sample), se(se2) variance(vc2)
                        Any input would be hugely appreciated.
                        Last edited by Matthew Alexander; 03 Aug 2021, 10:28.
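                        One standard route, offered here as a sketch rather than something confirmed in this thread, is the delta method that margins itself uses: for p = invlogit(x'b), the gradient with respect to b is p(1-p)·x, and Cov(p1, p2) = g1'·V·g2, where g1 and g2 are the gradients at the two covariate patterns. The coefficients and V below are made up for illustration (the auto-data results are not reproduced here):

```python
import math

def invlogit(z):
    return 1 / (1 + math.exp(-z))

# Illustrative (made-up) estimates: b = (intercept, slope) and their V
b = (-2.0, .0005)
V = [[ .25,     -.00004   ],
     [-.00004,   .00000001]]

def grad(x):
    # d invlogit(x'b) / d b = p(1-p) * x, with x = (1, price)
    p = invlogit(b[0] + b[1] * x)
    return [p * (1 - p) * xi for xi in (1, x)]

def cov_pred(x1, x2):
    # delta method: Cov(p1, p2) = g1' V g2
    g1, g2 = grad(x1), grad(x2)
    return sum(g1[i] * V[i][j] * g2[j] for i in range(2) for j in range(2))

var1 = cov_pred(1000, 1000)   # variance of the prediction at price = 1000
c12  = cov_pred(1000, 2000)   # covariance of the two predictions
```

With x1 = x2 this reproduces the variance that predictnl's variance() option reports; the off-diagonal case is the covariance that margins, post places in e(V).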

