  • RE: Calculating R^2

    Hi!

    After the logistic regression, I wanted to calculate R^2 (= SSreg/SStotal) directly, without using the fitstat command. Can you please provide Stata code to calculate R^2 after running the logistic regression? Thank you in advance.
    Last edited by DY Kim; 05 Sep 2020, 22:41.

  • #2
    "Pseudo R2 – This is the pseudo R-squared. Logistic regression does not have an equivalent to the R-squared that is found in OLS regression; however, many people have tried to come up with one. There are a wide variety of pseudo-R-square statistics. Because this statistic does not mean what R-square means in OLS regression (the proportion of variance explained by the predictors), we suggest interpreting this statistic with great caution."
    https://stats.idre.ucla.edu/stata/ou...sion-analysis/



    • #3
      Menard, S. (2001). Applied logistic regression analysis. Thousand Oaks, CA: Sage.

      To calculate standardized coefficients in logit models, Menard suggested the equation: b*_YX = (b_YX)(s_X)(R) / s_logit(ŷ).

      Can you please tell me how to calculate R in Menard's equation? Which R should I use among the many pseudo-R^2s?
      Last edited by DY Kim; 05 Sep 2020, 23:07.
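      For what it is worth, here is one common reading of Menard's formula as plain arithmetic. This is an assumption, not something established in this thread: the book is not at hand, so the definitions below are guesses, flagged as such in the code. It takes b_YX as the unstandardized logit coefficient, s_X as the standard deviation of X, R as the square root of a chosen pseudo-R-squared, and s_logit(ŷ) as the standard deviation of the predicted logits.

```python
# Hypothetical reading of Menard's standardized logit coefficient:
#   b*_YX = (b_YX * s_X * R) / s_logit(yhat)
# Assumed meanings (not confirmed by the thread or the book):
#   b_yx          unstandardized logit coefficient for X
#   s_x           sample standard deviation of X
#   r             square root of a chosen pseudo-R-squared
#   s_logit_yhat  standard deviation of the predicted logits
def menard_std(b_yx: float, s_x: float, r: float, s_logit_yhat: float) -> float:
    return b_yx * s_x * r / s_logit_yhat

# Illustrative, made-up numbers:
print(menard_std(0.5, 2.0, 0.4, 1.6))  # 0.25
```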



      • #4
        I do not know how the quantities you refer to in your equation are defined; some, such as b_YX and s_X, I can guess, but the others I cannot. And I do not have the book you are referring to.

        What Stata calculates is known in econometrics as McFadden's pseudo-R-squared (I think), and it is calculated as

        Pseudo R-squared = 1 - (log likelihood of the full model)/(log likelihood of the model including only a constant). E.g., here:

        Code:
        . sysuse  auto
        (1978 Automobile Data)
        
        . logistic foreign mpg headroom
        
        Logistic regression                             Number of obs     =         74
                                                        LR chi2(2)        =      13.39
                                                        Prob > chi2       =     0.0012
        Log likelihood =  -38.34058                     Pseudo R2         =     0.1486
        
        ------------------------------------------------------------------------------
             foreign | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 mpg |   1.139888   .0623668     2.39   0.017     1.023977    1.268919
            headroom |    .598804    .227678    -1.35   0.177     .2842102    1.261623
               _cons |   .1017957   .1919437    -1.21   0.226     .0025277    4.099552
        ------------------------------------------------------------------------------
        Note: _cons estimates baseline odds.
        
        . dis 1 - e(ll)/e(ll_0)
        .14861542
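        For a cross-check outside Stata, the same quantity can be recomputed by hand: the constant-only log likelihood of a binary outcome depends only on the sample split. Below is a minimal Python sketch; it assumes the auto data's split of 22 foreign cars out of 74 (the split itself is not shown in the output above) and takes the full-model log likelihood from the output.

```python
import math

def null_loglik(n_ones: int, n: int) -> float:
    """Log likelihood of a constant-only logit: every case gets p = n_ones/n."""
    p = n_ones / n
    return n_ones * math.log(p) + (n - n_ones) * math.log(1 - p)

ll_full = -38.34058              # "Log likelihood" line in the Stata output
ll_null = null_loglik(22, 74)    # assumed split: 22 foreign of 74 cars
pseudo_r2 = 1 - ll_full / ll_null

print(round(pseudo_r2, 4))       # 0.1486, matching Stata's "Pseudo R2"
```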
        Originally posted by DY Kim
        Menard, S. (2001). Applied logistic regression analysis. Thousand Oaks, CA: Sage.

        To calculate standardized coefficients in logit models, Menard suggested the equation: b*_YX = (b_YX)(s_X)(R) / s_logit(ŷ).

        Can you please tell me how to calculate R in Menard's equation? Which R should I use among the many pseudo-R^2s?



        • #5
          Sums of squared anythings have essentially no application to logit or logistic regression because it is not based on that machinery.

          This is dangerous territory. Many serious researchers regard R-square as likely to serve as a snare and a distraction even on home ground, namely linear regression, let alone outside it.

          Some possibilities and some warnings are bundled together at https://www.stata.com/support/faqs/s...ics/r-squared/

          On pseudo R-squared I am reminded of the quip that the great thing about standards is that there are so many to choose from.

          Like Joro Kolev I don't have Menard's book and cannot guess what was intended there.



          • #6
            What Nick says below is also the prevailing opinion in econometrics. For about 20 years now Professor Wooldridge, in his various textbooks, has been saying that "the only interesting thing about the R-squared is that it is not interesting at all" (or something to that effect), and most young econometricians became econometricians reading Professor Wooldridge's textbooks.

            I cannot agree more with Nick that having a variety of statistical measures of the same thing (pseudo-R-squareds here) that we can pick from, potentially to please Reviewer 2, is the highway to the hell of bad statistics.

            I can only add that the particular statistic Stata reports after logistic regression (McFadden's pseudo-R-squared) is very well justified in this article:
            Magee, L. (1990). R² measures based on Wald and likelihood ratio joint significance tests. The American Statistician, 44(3), 250-253.



            Originally posted by Nick Cox
            Sums of squared anythings have essentially no application to logit or logistic regression because it is not based on that machinery.

            This is dangerous territory. Many serious researchers regard R-square as likely to serve as a snare and a distraction even on home ground, namely linear regression, let alone outside it.

            Some possibilities and some warnings are bundled together at https://www.stata.com/support/faqs/s...ics/r-squared/

            On pseudo R-squared I am reminded of the quip that the great thing about standards is that there are so many to choose from.

            Like Joro Kolev I don't have Menard's book and cannot guess what was intended there.



            • #7
              A.N. Whitehead's dictum "Seek simplicity and distrust it" serves as a wise if banal summary of a large literature.

              Similarly, perhaps, "glance at R-square and distrust it" may satisfy many as advice that looks both ways. In some physical sciences an R-square even of 0.9 might indicate, say, poor experimental technique, poor theory, or both. In many social sciences a high R-square is intrinsically implausible because people's attitudes and behaviour just aren't that predictable. A high R-square might even mean a silly question, invented data, or something else unsatisfactory.

