  • Which tests are used to calculate the individual p-values in the Stata logit command (binary logistic regression)?

    This picture is from page 1290 in the Stata manual:
    [Image: Screenshot 2021-08-03 101043.png]



    The model can be fit with the following code in Stata:

    use https://www.stata-press.com/data/r16/auto

    keep make mpg weight foreign

    logit foreign weight mpg

    As I understand it, the overall fit of the model is tested with a chi2 test (Prob > chi2 = 0.0000), but how are the individual p-values (P>|z|) calculated?

  • #2
    As an example:

    Code:
    sysuse auto, clear
    logit foreign weight mpg
    matrix list r(table)
    di (1-normal(1.8341116)) * 2
    help normal()
    Best wishes

    Stata 18.0 MP | ORCID | Google Scholar



    • #3
      Also see this Stata tip: https://www.stata-journal.com/articl...article=st0137

      Those tests of individual parameters are called Wald tests.
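
      For reference, the usual textbook form of the Wald test for a single coefficient can be written as below. This is the standard definition rather than a quote from the Stata manual, and the notation (\hat\beta_j, \widehat{\mathrm{se}}, \Phi) is introduced here only for illustration.

      ```latex
      z_j = \frac{\hat\beta_j}{\widehat{\mathrm{se}}(\hat\beta_j)}, \qquad
      p_j = 2\,\Phi\!\left(-\lvert z_j \rvert\right)
      ```

      Here \Phi is the standard normal cumulative distribution function, and the null hypothesis being tested is \beta_j = 0.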
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------



      • #4
        Maarten Buis, thank you. Just to be clear: the p-value marked with a red arrow in the image is calculated with a Wald test? If so, does it matter whether the parameter is binary or continuous?

        [Image: InkedScreenshot 2021-08-03 101043_LI.png, with a red arrow marking one p-value]



        • #5
          Felix Bittmann, thank you for the reply, but I'm not sure that I understand it.



          • #6
            True, the p-value you marked with the red arrow is the p-value of a Wald test of the null hypothesis that the coefficient of mpg equals 0.

            The parameter is never binary, but I assume you mean whether the explanatory variable is binary or continuous. The answer is no, that makes no difference.
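
            A minimal sketch of how this could be checked in Stata, assuming the model from #2 is fit first: the postestimation command test performs a Wald test and reports it as a chi2 statistic with 1 degree of freedom. That statistic should equal the squared z-value from the logit table, with the same p-value.

            ```stata
            sysuse auto, clear
            logit foreign weight mpg

            * Wald test of H0: coefficient of mpg = 0, reported as chi2(1)
            test mpg

            * the same quantities by hand: chi2 = z^2, p-value from the chi2(1) upper tail
            display (_b[mpg]/_se[mpg])^2
            display chi2tail(1, (_b[mpg]/_se[mpg])^2)
            ```

            The same commands work unchanged whether the explanatory variable is continuous or a 0/1 indicator.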


            • #7
              Mads: did you execute Felix's code? That is often the best way to learn. Don't be afraid of typing something wrong in Stata; the worst that can happen is that Stata returns an error message. Stata will not turn all cute kittens into brain-eating zombie tigers when you type something wrong.


              • #8
                Maarten Buis, exactly. Thank you very much.



                • #9
                  Originally posted by Maarten Buis:
                  Mads: did you execute Felix's code? That is often the best way to learn. Don't be afraid of typing something wrong in Stata; the worst that can happen is that Stata returns an error message. Stata will not turn all cute kittens into brain-eating zombie tigers when you type something wrong.
                  Haha fair - I did try to run the code but I don't know what I should do with the result.



                  • #10
                    Code:
                    . sysuse auto, clear
                    (1978 automobile data)
                    Opens the example dataset

                    Code:
                    . logit foreign weight mpg
                    
                    Iteration 0:   log likelihood =  -45.03321  
                    Iteration 1:   log likelihood = -29.238536  
                    Iteration 2:   log likelihood = -27.244139  
                    Iteration 3:   log likelihood = -27.175277  
                    Iteration 4:   log likelihood = -27.175156  
                    Iteration 5:   log likelihood = -27.175156  
                    
                    Logistic regression                                     Number of obs =     74
                                                                            LR chi2(2)    =  35.72
                                                                            Prob > chi2   = 0.0000
                    Log likelihood = -27.175156                             Pseudo R2     = 0.3966
                    
                    ------------------------------------------------------------------------------
                         foreign | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                          weight |  -.0039067   .0010116    -3.86   0.000    -.0058894    -.001924
                             mpg |  -.1685869   .0919175    -1.83   0.067    -.3487418     .011568
                           _cons |   13.70837   4.518709     3.03   0.002     4.851859    22.56487
                    ------------------------------------------------------------------------------
                    Estimates the model
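
                    The LR chi2 and Prob > chi2 in the header concern the model as a whole, not the individual coefficients. As a rough sketch, assuming the logit command above has just been run, they can be rebuilt from the stored results e(ll), e(ll_0), and e(df_m):

                    ```stata
                    * likelihood-ratio statistic: twice the gain in log likelihood
                    * over the constant-only model
                    display 2*(e(ll) - e(ll_0))

                    * its p-value from the upper tail of a chi2 distribution
                    * with e(df_m) degrees of freedom
                    display chi2tail(e(df_m), 2*(e(ll) - e(ll_0)))
                    ```

                    The first line should reproduce the 35.72 in the header, since 2*(-27.175156 + 45.03321) is about 35.72.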

                    Code:
                    . matrix list r(table)
                    
                    r(table)[9,3]
                               foreign:    foreign:    foreign:
                                weight         mpg       _cons
                         b   -.0039067   -.1685869   13.708367
                        se   .00101161   .09191747   4.5187094
                         z  -3.8618465  -1.8341116   3.0336907
                    pvalue   .00011253   .06663742   .00241582
                        ll  -.00588943  -.34874183   4.8518593
                        ul  -.00192397   .01156803   22.564875
                        df           .           .           .
                      crit    1.959964    1.959964    1.959964
                     eform           0           0           0
                    This displays a matrix called r(table) that the logit command stores. Notice that the absolute value of the z-value for the coefficient of mpg is 1.8341116 (3rd row, 2nd column).

                    Code:
                    . di (1-normal(1.8341116)) * 2
                    .06663743
                    This shows how to transform the absolute value of the z-value (the test statistic for the Wald test) into the p-value. Notice that the p-value we just computed is almost the same as the p-value in r(table).

                    Code:
                    . help normal()
                    This shows the help-file for the normal() function

                    You asked "but how are the individual P-values (P>|z|) calculated", so Felix showed how that was done.

                    As a minor note: Stata probably does not compute 2*(1-normal(abs(z-value))), but rather 2*normal(-abs(z-value)). Mathematically they are equivalent (the normal distribution is symmetric), but the latter is numerically better behaved, because it avoids subtracting a number close to 1 from 1. If we do that, we get exactly the same p-value as in r(table):

                    Code:
                    . di normal(-abs(_b[mpg]/_se[mpg])) * 2
                    .06663742
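
                    The difference between the two formulas in the note above becomes visible far out in the tail. A small sketch, using a made-up z-value of 10 (not taken from this model):

                    ```stata
                    * normal(10) is so close to 1 that the subtraction loses all information
                    display 2*(1 - normal(10))

                    * here the small tail probability is computed directly
                    display 2*normal(-10)
                    ```

                    The first expression should display 0, while the second returns a tiny but positive number.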
                    Last edited by Maarten Buis; 03 Aug 2021, 09:15.


                    • #11
                      Thanks to Maarten for explaining my code in detail. I am sorry, I thought this was more about the technical aspects of how Stata does this. From your posts, I think it might help to look at a textbook treatment of what a p-value means and how it relates to standard errors and normal theory. I am not sure which explanation is best, but good material is easy to find. For example:

                      https://online.stat.psu.edu/stat501/lesson/2/2.12
                      https://www.youtube.com/watch?v=KLnGOL_AUgA
                      Best wishes



                      • #12
                        For good measure, z = b/se. For mpg,

                        Code:
                        . display -.1685869/ .09191747
                        -1.8341116
                        -------------------------------------------
                        Richard Williams, Notre Dame Dept of Sociology
                        StataNow Version: 19.5 MP (2 processor)

                        EMAIL: [email protected]
                        WWW: https://www3.nd.edu/~rwilliam



                        • #13
                          When my daughter was in high school statistics, I tried to explain to her how t values and p values were related. It was one of the most painful experiences in either of our lives.


                          • #14
                            Maarten Buis, Felix Bittmann, Richard Williams: thank you for the answers! My question was indeed "just" which test was used, but some extra information is good.
