  • How to manually calculate the SE of prediction for each observation

    Dear Statalists,

    The following syntax gives us the standard error of prediction for each observation.

    Code:
    ologit y x1
    predict stdp, stdp
    But how can we replicate this stdp by hand?

    For example, if the estimated results are as follows:

    Code:
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x1 |      0.501      0.012    40.99   0.000        0.477       0.525
           /cut1 |      2.635      0.090
           /cut2 |      4.482      0.092
           /cut3 |      5.378      0.094
           /cut4 |      6.219      0.097
    ------------------------------------------------------------------------------

    We can replicate

    Code:
    predict xb, xb
    by typing

    Code:
    gen xb_replicate = 0.501 * x1
    Can we do such a thing with stdp? That is, can we manually calculate stdp for each observation?

    Thank you in advance,
    DS
    Last edited by David Silverstein; 28 Apr 2015, 05:29.

  • #2
    Code:
    . webuse fullauto,clear
    (Automobile Models)
    
    . qui ologit rep77 mpg
    
    . predict stdp , stdp
    
    . keep rep77 mpg st
    
    . mat V = e(V)
    
    . scalar V = V[1,1]
    
    . gen mystdp = sqrt(mpg*V*mpg)
    
    . l in 1/10
    
         +-------------------------------------+
         | mpg     rep77       stdp     mystdp |
         |-------------------------------------|
      1. |  22      Fair    .827769    .827769 |
      2. |  17      Poor   .6396397   .6396397 |
      3. |  22         .    .827769    .827769 |
      4. |  23   Average   .8653949   .8653949 |
      5. |  17      Fair   .6396397   .6396397 |
         |-------------------------------------|
      6. |  25      Good   .9406466   .9406466 |
      7. |  20   Average   .7525173   .7525173 |
      8. |  15      Good    .564388    .564388 |
      9. |  18      Good   .6772656   .6772656 |
     10. |  26         .   .9782725   .9782725 |
         +-------------------------------------+
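    A note on what this code computes: with a single regressor, the quadratic form x'Vx reduces to mpg²·V[1,1], so stdp = |mpg|·sqrt(V[1,1]) and the ratio stdp/mpg is the same for every observation. A quick Python sketch of that identity (e(V) is not shown in this post, so sqrt(V[1,1]) is backed out from the listed output rather than taken from Stata):

```python
# stdp and mpg pairs from the listing above
rows = [(22, .827769), (17, .6396397), (23, .8653949),
        (25, .9406466), (20, .7525173)]

# With one regressor, stdp = |mpg| * sqrt(V[1,1]), so stdp/mpg is constant
ratios = [stdp / mpg for mpg, stdp in rows]
se_b = ratios[0]                      # implied sqrt(V[1,1])

for r in ratios:
    assert abs(r - se_b) < 1e-6       # same ratio for every observation

# reproduce mystdp for the first observation
mystdp_1 = 22 * se_b
```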



    • #3
      Dear Scott,

      I really appreciate your helpful answer!!
      Sorry for the late reply; I needed a whole day to understand the syntax.
      Could I ask one more question? What if the equation has two or more explanatory variables? Can I do the same procedure with ologit y x1 x2, for example?

      Best,
      DS



      • #4
        Why don't you give it a try and see? It takes you back to your school days: how do we calculate a variance? Using Scott's example, let us try the two-variable case.


        Code:
        webuse fullauto,clear
        qui ologit rep77 mpg price
        predict stdp , stdp
        keep rep77 mpg price st
        mat V = e(V)
        Before we proceed, let us examine this matrix e(V) to make sure that you understand Scott's code above


        Code:
        . mat list V
        
        symmetric V[6,6]
                         rep77:     rep77:      cut1:      cut2:      cut3:      cut4:
                           mpg      price      _cons      _cons      _cons      _cons
          rep77:mpg  .00182997
        rep77:price  1.723e-06  7.580e-09
         cut1:_cons  .04692561  .00007947  1.7638211
         cut2:_cons  .04793381  .00008122  1.5159417  1.5676695
         cut3:_cons  .05079184  .00008574  1.5500759  1.5896552  1.7202349
         cut4:_cons  .05452492  .00008838  1.6314346  1.6703903  1.7908583  2.0788882
        Now what is Scott doing when he says scalar V = V[1,1]?
        V is a 6x6 matrix, so he is telling Stata to pick out the diagonal element in the top left-hand corner, i.e., .00182997, which is the variance of mpg. With two variables, we have two variances (for mpg and for price). Now back to your school days:

        Var(aX + bY) = a^2*Var(X) + b^2*Var(Y) + 2ab*Cov(X, Y) for random variables X and Y and constants a and b. The V[2,1] element is the covariance. So implement this logic and voila!


        Code:
        scalar V1 = V[1,1]
        scalar V2 = V[2,2]
        scalar V3 = V[2,1]
        gen var = (2* price* mpg*V3)+ (price*price*V2)+ (mpg*mpg*V1)
        gen mystd= sqrt(var)
        
        . l in 1/10
        
             +--------------------------------------------------------+
             | price   mpg     rep77       stdp        var      mystd |
             |--------------------------------------------------------|
          1. |  4099    22      Fair   1.150601   1.323883   1.150601 |
          2. |  4749    17      Poor   .9889793   .9780801   .9889793 |
          3. |  3799    22         .   1.132773   1.283174   1.132773 |
          4. |  6295    23   Average   1.329461   1.767465   1.329461 |
          5. |  9690    17      Fair   1.344764   1.808389   1.344764 |
             |--------------------------------------------------------|
          6. |  9735    25      Good   1.643457    2.70095   1.643457 |
          7. |  4816    20   Average   1.113458   1.239788   1.113458 |
          8. |  7827    15      Good   1.131716   1.280782   1.131716 |
          9. |  5788    18      Good   1.098155   1.205945   1.098155 |
         10. |  4453    26         .   1.336571   1.786421   1.336571 |
             +--------------------------------------------------------+
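        The arithmetic behind mystd can be checked outside Stata as well. A Python sketch using the (rounded) entries of e(V) listed above; small discrepancies in the last decimals come from the display rounding of the matrix:

```python
import math

# variance/covariance entries from the e(V) listing above (as displayed)
V1, V2, V3 = .00182997, 7.580e-09, 1.723e-06   # Var(mpg), Var(price), Cov(mpg, price)

def stdp(mpg, price):
    # Var(mpg*b1 + price*b2) = mpg^2*V1 + price^2*V2 + 2*mpg*price*V3
    return math.sqrt(mpg**2 * V1 + price**2 * V2 + 2 * mpg * price * V3)

# first two observations from the listing
assert abs(stdp(22, 4099) - 1.150601) < 1e-3
assert abs(stdp(17, 4749) - .9889793) < 1e-3
```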



        • #5
          Dear Andrew,

          Thank you for your reply!

          Allow me to confirm one more thing.
          In the variance-covariance matrix,

          (from my understanding)
          V[1,1] = .00182997 is the square of s.e. for mpg (.0427781)
          V[2,2] = 7.580e-09 is the square of s.e. for price (.0000871)
          In a similar way, can I derive the covariance V[2,1] = 1.723e-06 from the estimated results (ologit rep77 mpg price)?

          Best wishes,
          DS

          Code:
          . ologit rep77 mpg price
          
          Iteration 0:   log likelihood = -89.895098  
          Iteration 1:   log likelihood =  -86.52481  
          Iteration 2:   log likelihood = -86.491281  
          Iteration 3:   log likelihood = -86.491244  
          Iteration 4:   log likelihood = -86.491244  
          
          Ordered logistic regression                       Number of obs   =         66
                                                            LR chi2(2)      =       6.81
                                                            Prob > chi2     =     0.0332
          Log likelihood = -86.491244                       Pseudo R2       =     0.0379
          
          ------------------------------------------------------------------------------
                 rep77 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   mpg |    .107125   .0427781     2.50   0.012     .0232814    .1909686
                 price |   .0001329   .0000871     1.53   0.127    -.0000378    .0003035
          -------------+----------------------------------------------------------------
                 /cut1 |  -.0656913   1.328089                     -2.668698    2.537316
                 /cut2 |   1.722323   1.252066                     -.7316817    4.176327
                 /cut3 |   3.662575   1.311577                      1.091931    6.233219
                 /cut4 |   5.797839   1.441835                      2.971894    8.623784
          ------------------------------------------------------------------------------



          • #6
            Absolutely. Recall that Cov(X, Y) = ρ(X,Y)·σ(X)·σ(Y), where ρ is the correlation coefficient and σ is the standard deviation.

            Code:
            
            . webuse fullauto,clear
            (Automobile Models)
            
            . ologit rep77 mpg price
            
            Iteration 0:   log likelihood = -89.895098  
            Iteration 1:   log likelihood =  -86.52481  
            Iteration 2:   log likelihood = -86.491281  
            Iteration 3:   log likelihood = -86.491244  
            Iteration 4:   log likelihood = -86.491244  
            
            Ordered logistic regression                       Number of obs   =         66
                                                              LR chi2(2)      =       6.81
                                                              Prob > chi2     =     0.0332
            Log likelihood = -86.491244                       Pseudo R2       =     0.0379
            
            ------------------------------------------------------------------------------
                   rep77 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     mpg |    .107125   .0427781     2.50   0.012     .0232814    .1909686
                   price |   .0001329   .0000871     1.53   0.127    -.0000378    .0003035
            -------------+----------------------------------------------------------------
                   /cut1 |  -.0656913   1.328089                     -2.668698    2.537316
                   /cut2 |   1.722323   1.252066                     -.7316817    4.176327
                   /cut3 |   3.662575   1.311577                      1.091931    6.233219
                   /cut4 |   5.797839   1.441835                      2.971894    8.623784
            ------------------------------------------------------------------------------
            
            . corr mpg price
            (obs=74)
            
                         |      mpg    price
            -------------+------------------
                     mpg |   1.0000
                   price |  -0.4594   1.0000
            
            
            . scalar cov=  -0.4594*.0427781* .0000871
            
            . di cov
            -1.712e-06
            
            .
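            A quick check of the scalar arithmetic above. Note that the reported e(V)[2,1] entry earlier in the thread is +1.723e-06, while the value computed here from the raw data correlation is negative: the covariance of the coefficient estimates need not share the sign of the data correlation, so this calculation matches in magnitude only.

```python
# verify the arithmetic in #6: cov = rho * se_mpg * se_price
cov = -0.4594 * .0427781 * .0000871
assert abs(cov - (-1.712e-06)) < 1e-9   # matches the displayed result
```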



            • #7
              Thank you again, Andrew! I really appreciate your help.



              • #8
                Dear Statalist,

                I am working on my thesis assessing the impact of fta. I see in some do-files that they have command-

                scalar sigma = (followed by a number). How does one decide what the number is and what is this command for?

                Many thanks!



                • #9
                  Originally posted by Aishwarya Nahata View Post
                  Dear Statalist,

                  I am working on my thesis assessing the impact of fta. I see in some do-files that they have command-

                  scalar sigma = (followed by a number). How does one decide what the number is and what is this command for?

                  Many thanks!
                   This is really not answerable as is, and it also has very little to do with the main question in this thread. I suggest that
                   1. you start a new post,
                   2. where you give more detail/context, such as what fta is, what sigma is, and what they have to do with each other.
                   If I told you I had set x to 3 at work today and asked whether 3 was the right choice, how on earth could you answer?



                  • #10
                    Hello,

                    Thank you for the helpful explanation above. I am trying to manually calculate the SE of prediction, but I have categorical variables in my logistic regression. How would I manually calculate the standard errors in this case?

                    The regression is

                    Code:
                    . logit DIED i.AGE_cat i.Flail_Chest i.Rib_Plating, or
                    
                    Iteration 0:   log likelihood = -391.50125  
                    Iteration 1:   log likelihood = -354.43286  
                    Iteration 2:   log likelihood = -345.31097  
                    Iteration 3:   log likelihood = -344.97545  
                    Iteration 4:   log likelihood = -344.97328  
                    Iteration 5:   log likelihood = -344.97328  
                    
                    Logistic regression                             Number of obs     =      1,794
                                                                    LR chi2(4)        =      93.06
                                                                    Prob > chi2       =     0.0000
                    Log likelihood = -344.97328                     Pseudo R2         =     0.1188
                    
                    ------------------------------------------------------------------------------
                            DIED | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                         AGE_cat |
                          41-64  |    2.00887   .7907155     1.77   0.076     .9287707    4.345052
                            65+  |   3.624844   1.424458     3.28   0.001        1.678    7.830451
                                 |
                     Flail_Chest |
                            Yes  |    2.77998   .5941485     4.78   0.000     1.828603    4.226335
                                 |
                     Rib_Plating |
                            Yes  |   .0938063   .0434263    -5.11   0.000     .0378599    .2324255
                           _cons |   .0231882   .0088192    -9.90   0.000     .0110034    .0488659
                    ------------------------------------------------------------------------------
                    Note: _cons estimates baseline odds.

                    However, because I am using categorical variables, one of the subgroups is the reference group, so my variance-covariance matrix looks like:


                    Code:
                    . mat list V
                    
                    symmetric V[8,8]
                                              DIED:        DIED:        DIED:        DIED:        DIED:        DIED:        DIED:        DIED:
                                                0b.           1.           2.          0b.           1.          0b.           1.            
                                           AGE_cat      AGE_cat      AGE_cat  Flail_Chest  Flail_Chest  Rib_Plating  Rib_Plating        _cons
                      DIED:0b.AGE_cat            0
                       DIED:1.AGE_cat            0    .15493051
                       DIED:2.AGE_cat            0     .1306595    .15442612
                    DIED:0b.Flail_C~t            0            0            0            0
                    DIED:1.Flail_Ch~t            0   -.00217427   -.00220936            0    .04567796
                    DIED:0b.Rib_Pla~g            0            0            0            0            0            0
                    DIED:1.Rib_Plat~g            0   -.00365786   -.00189374            0   -.00202697            0    .21430981
                           DIED:_cons            0   -.12916846   -.12922332            0    -.0250651            0   -.00789575    .14465232
                    (For context: I am using predict to estimate the probability of the outcome (death) for each covariate pattern (e.g., 1.AGE_cat with Flail_Chest "Yes" and Rib_Plating "Yes"; 1.AGE_cat with Flail_Chest "Yes" and Rib_Plating "No"; and so on). I want to calculate the standard error for those probabilities. predict has the "stdp" option, but that is the standard error of the linear prediction only and doesn't take covariate patterns into account. Any advice on how to do this would be much appreciated.)
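                    For what it's worth, the same x'Vx logic from earlier in the thread extends to factor variables: the base-level rows and columns of e(V) are identically zero, so they can be dropped, and x becomes the vector of 0/1 indicators for the covariate pattern plus the constant. A Python sketch of the standard error of the linear (logit-scale) prediction for one pattern, using the nonzero block of the matrix listed above; the chosen pattern is illustrative only:

```python
import math

# Nonzero block of e(V) from the listing above, base levels dropped,
# ordered as: 1.AGE_cat, 2.AGE_cat, 1.Flail_Chest, 1.Rib_Plating, _cons
V = [
    [ .15493051,  .1306595,  -.00217427, -.00365786, -.12916846],
    [ .1306595,   .15442612, -.00220936, -.00189374, -.12922332],
    [-.00217427, -.00220936,  .04567796, -.00202697, -.0250651 ],
    [-.00365786, -.00189374, -.00202697,  .21430981, -.00789575],
    [-.12916846, -.12922332, -.0250651,  -.00789575,  .14465232],
]

def stdp(x):
    # SE of the linear prediction: sqrt(x' V x)
    var = sum(x[i] * V[i][j] * x[j] for i in range(5) for j in range(5))
    return math.sqrt(var)

# covariate pattern: age 41-64, flail chest yes, rib plating yes (plus _cons)
se = stdp([1, 0, 1, 1, 1])
```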



                    • #11
                      You can work from the logit coefficients and variances, noting

                      $$ \text{Odds ratio S.E.} = \sqrt{\left(e^{\text{coefficient}}\right)^{2} \cdot \operatorname{Var}(\text{coefficient})}. $$

                      Here is an example:

                      Code:
                      webuse lbw
                      logit low age i.race,or
                      logit low age i.race
                      mat l e(V)
                      *CALCULATE OR s.e.
                      di sqrt(exp(_b[age])^2 *(_se[age])^2)
                      *OR (STATA 16+)
                      di sqrt(exp(_b[age])^2 *(e(V)[1,1]))
                      Res.:

                      Code:
                      . 
                      . logit low age i.race,or
                      
                      Iteration 0:   log likelihood =   -117.336  
                      Iteration 1:   log likelihood =  -114.0882  
                      Iteration 2:   log likelihood = -114.06376  
                      Iteration 3:   log likelihood = -114.06375  
                      
                      Logistic regression                             Number of obs     =        189
                                                                      LR chi2(3)        =       6.54
                                                                      Prob > chi2       =     0.0879
                      Log likelihood = -114.06375                     Pseudo R2         =     0.0279
                      
                      ------------------------------------------------------------------------------
                               low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                               age |   .9612592   .0311206    -1.22   0.222     .9021588    1.024231
                                   |
                              race |
                            black  |   2.106974   .9932407     1.58   0.114     .8363679    5.307877
                            other  |   1.767748   .6229325     1.62   0.106     .8860685    3.526738
                                   |
                             _cons |   .8121906   .6515964    -0.26   0.795     .1685638    3.913377
                      ------------------------------------------------------------------------------
                      Note: _cons estimates baseline odds.
                      
                      . 
                      . logit low age i.race
                      
                      Iteration 0:   log likelihood =   -117.336  
                      Iteration 1:   log likelihood =  -114.0882  
                      Iteration 2:   log likelihood = -114.06376  
                      Iteration 3:   log likelihood = -114.06375  
                      
                      Logistic regression                             Number of obs     =        189
                                                                      LR chi2(3)        =       6.54
                                                                      Prob > chi2       =     0.0879
                      Log likelihood = -114.06375                     Pseudo R2         =     0.0279
                      
                      ------------------------------------------------------------------------------
                               low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                               age |  -.0395112   .0323748    -1.22   0.222    -.1029647    .0239423
                                   |
                              race |
                            black  |   .7452527   .4714063     1.58   0.114    -.1786867    1.669192
                            other  |   .5697062   .3523877     1.62   0.106     -.120961    1.260373
                                   |
                             _cons |  -.2080202   .8022702    -0.26   0.795    -1.780441    1.364401
                      ------------------------------------------------------------------------------
                      
                      . 
                      . mat l e(V)
                      
                      symmetric e(V)[5,5]
                                          low:        low:        low:        low:        low:
                                                       1b.          2.          3.            
                                          age        race        race        race       _cons
                          low:age   .00104813
                      low:1b.race           0           0
                       low:2.race   .00241429           0   .22222389
                       low:3.race   .00153645           0   .06118996   .12417708
                        low:_cons  -.02478287           0  -.11473635  -.09397986   .64363754
                      
                      . 
                      . *CALCULATE OR s.e.
                      
                      . 
                      . di sqrt(exp(_b[age])^2 *(_se[age])^2)
                      .03112062
                      
                      . 
                      . *OR (STATA 16+)
                      
                      . 
                      . di sqrt(exp(_b[age])^2 *(e(V)[1,1]))
                      .03112062
                      Last edited by Andrew Musau; 10 Feb 2020, 08:19.
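                      The delta-method identity above can be checked outside Stata as well. A Python sketch using the age coefficient and standard error from the log-odds output:

```python
import math

# age coefficient and s.e. from the log-odds output above
b, se_b = -.0395112, .0323748

# delta method: se(OR) = sqrt(exp(b)^2 * Var(b)) = exp(b) * se(b)
se_or = math.sqrt(math.exp(b)**2 * se_b**2)
assert abs(se_or - .0311206) < 1e-6          # matches the OR-scale output
```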



                      • #12
                        If I may, this discussion provides the fullest explanation I have come across with respect to manually calculating covariances. I was hoping you could perhaps elaborate slightly on the above?

                        In particular, I would like to know how to manually calculate the covariance of two or more sets of predicted probabilities in such a way as to replicate the e(V) matrix after margins, post.

                        As shown below, I can generate predictions with standard error and variance via predictnl. But I don't know how to use this information to calculate covariances...

                        Code:
                        sysuse auto, clear
                        
                        logit foreign price
                        
                        replace price = 1000
                        predictnl double pr1 = predict(pr) if e(sample), se(se1) variance(vc1)
                        
                        replace price = 2000 if e(sample)
                        predictnl double pr2 = predict(pr) if e(sample), se(se2) variance(vc2)
                        Any input would be hugely appreciated.
                        Last edited by Matthew Alexander; 03 Aug 2021, 10:28.
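                        One standard route, offered here as a sketch rather than something confirmed in this thread, is the delta method that margins itself uses: for p = invlogit(x'b), the gradient with respect to b is p(1-p)·x, and Cov(p1, p2) = g1'·V·g2, where g1 and g2 are the gradients at the two covariate patterns. The coefficients and V below are made up for illustration (the auto-data results are not reproduced here):

```python
import math

def invlogit(z):
    return 1 / (1 + math.exp(-z))

# Illustrative (made-up) estimates: b = (intercept, slope) and their V
b = (-2.0, .0005)
V = [[ .25,     -.00004   ],
     [-.00004,   .00000001]]

def grad(x):
    # d invlogit(x'b) / d b = p(1-p) * x, with x = (1, price)
    p = invlogit(b[0] + b[1] * x)
    return [p * (1 - p) * xi for xi in (1, x)]

def cov_pred(x1, x2):
    # delta method: Cov(p1, p2) = g1' V g2
    g1, g2 = grad(x1), grad(x2)
    return sum(g1[i] * V[i][j] * g2[j] for i in range(2) for j in range(2))

var1 = cov_pred(1000, 1000)   # variance of the prediction at price = 1000
c12  = cov_pred(1000, 2000)   # covariance of the two predictions
```

With x1 = x2 this reproduces the variance that predictnl's variance() option reports; the off-diagonal case is the covariance that margins, post places in e(V).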

