  • RE: Calculating R^2

    Hi!

    After the logistic regression, I wanted to calculate R^2 (= SSreg/SStotal) directly, without using the fitstat command. Can you please provide Stata code to calculate R^2 after running the logistic regression? Thank you in advance.
    Last edited by DY Kim; 05 Sep 2020, 22:41.

  • #2
    "Pseudo R2 – This is the pseudo R-squared. Logistic regression does not have an equivalent to the R-squared that is found in OLS regression; however, many people have tried to come up with one. There are a wide variety of pseudo-R-square statistics. Because this statistic does not mean what R-square means in OLS regression (the proportion of variance explained by the predictors), we suggest interpreting this statistic with great caution."
    https://stats.idre.ucla.edu/stata/ou...sion-analysis/



    • #3
      Menard, S. (2001). Applied logistic regression analysis. Thousand Oaks, CA: Sage.

      To calculate standardized coefficients in logit models, Menard suggested the equation: b*_YX = (b_YX)(s_X)(R) / s_logit(ŷ).

      Can you please tell me how to calculate R in Menard's equation? Which R should I use among the many pseudo-R^2s?
      Last edited by DY Kim; 05 Sep 2020, 23:07.
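      For what it is worth, here is one common reading of Menard's formula as plain arithmetic. This is an assumption, not something established in this thread: the book is not at hand, so the definitions below are guesses, flagged as such in the code. It takes b_YX as the unstandardized logit coefficient, s_X as the standard deviation of X, R as the square root of a chosen pseudo-R-squared, and s_logit(ŷ) as the standard deviation of the predicted logits.

```python
# Hypothetical reading of Menard's standardized logit coefficient:
#   b*_YX = (b_YX * s_X * R) / s_logit(yhat)
# Assumed meanings (not confirmed by the thread or the book):
#   b_yx          unstandardized logit coefficient for X
#   s_x           sample standard deviation of X
#   r             square root of a chosen pseudo-R-squared
#   s_logit_yhat  standard deviation of the predicted logits
def menard_std(b_yx: float, s_x: float, r: float, s_logit_yhat: float) -> float:
    return b_yx * s_x * r / s_logit_yhat

# Illustrative, made-up numbers:
print(menard_std(0.5, 2.0, 0.4, 1.6))  # 0.25
```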



      • #4
        I do not know how the quantities you refer to in your equation are defined; some, such as b_YX and s_X, I can guess, but the others I cannot. And I do not have the book you are referring to.

        What Stata calculates is known in econometrics as McFadden's pseudo-R-squared (I think), and it is calculated as

        Pseudo R-squared = 1 - (log likelihood of the full model)/(log likelihood of the model including only a constant). E.g., here:

        Code:
        . sysuse  auto
        (1978 Automobile Data)
        
        . logistic foreign mpg headroom
        
        Logistic regression                             Number of obs     =         74
                                                        LR chi2(2)        =      13.39
                                                        Prob > chi2       =     0.0012
        Log likelihood =  -38.34058                     Pseudo R2         =     0.1486
        
        ------------------------------------------------------------------------------
             foreign | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 mpg |   1.139888   .0623668     2.39   0.017     1.023977    1.268919
            headroom |    .598804    .227678    -1.35   0.177     .2842102    1.261623
               _cons |   .1017957   .1919437    -1.21   0.226     .0025277    4.099552
        ------------------------------------------------------------------------------
        Note: _cons estimates baseline odds.
        
        . dis 1 - e(ll)/e(ll_0)
        .14861542
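        For a cross-check outside Stata, the same quantity can be recomputed by hand: the constant-only log likelihood of a binary outcome depends only on the sample split. Below is a minimal Python sketch; it assumes the auto data's split of 22 foreign cars out of 74 (the split itself is not shown in the output above) and takes the full-model log likelihood from the output.

```python
import math

def null_loglik(n_ones: int, n: int) -> float:
    """Log likelihood of a constant-only logit: every case gets p = n_ones/n."""
    p = n_ones / n
    return n_ones * math.log(p) + (n - n_ones) * math.log(1 - p)

ll_full = -38.34058              # "Log likelihood" line in the Stata output
ll_null = null_loglik(22, 74)    # assumed split: 22 foreign of 74 cars
pseudo_r2 = 1 - ll_full / ll_null

print(round(pseudo_r2, 4))       # 0.1486, matching Stata's "Pseudo R2"
```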
        Originally posted by DY Kim
        Menard, S. (2001). Applied logistic regression analysis. Thousand Oaks, CA: Sage.

        To calculate standardized coefficients in logit models, Menard suggested the equation: b*_YX = (b_YX)(s_X)(R) / s_logit(ŷ).

        Can you please tell me how to calculate R in Menard's equation? Which R should I use among the many pseudo-R^2s?



        • #5
          Sums of squared anythings have essentially no application to logit or logistic regression because it is not based on that machinery.

          This is dangerous territory. Many serious researchers regard R-square as likely to serve as a snare and a distraction even on home ground, namely linear regression, let alone outside it.

          Some possibilities and some warnings are bundled together at https://www.stata.com/support/faqs/s...ics/r-squared/

          On pseudo R-squared I am reminded of the quip that the great thing about standards is that there are so many to choose from.

          Like Joro Kolev I don't have Menard's book and cannot guess what was intended there.



          • #6
            What Nick says below is also the prevailing opinion in econometrics. For about 20 years now Professor Wooldridge, in his various textbooks, has been saying that "the only interesting thing about the R-squared is that it is not interesting at all" (or something to that effect), and most young econometricians became econometricians reading Professor Wooldridge's textbooks.

            I cannot agree more with Nick that having a variety of statistical measures of the same thing (pseudo-R-squareds here) that we can pick from, potentially to please Reviewer 2, is the highway to the hell of bad statistics.

            I can only add that the particular statistic Stata reports after logistic regression (McFadden's pseudo-R-squared) is very well justified in this article:
            Magee, L. (1990). R² measures based on Wald and likelihood ratio joint significance tests. The American Statistician, 44(3), 250-253.



            Originally posted by Nick Cox
            Sums of squared anythings have essentially no application to logit or logistic regression because it is not based on that machinery.

            This is dangerous territory. Many serious researchers regard R-square as likely to serve as a snare and a distraction even on home ground, namely linear regression, let alone outside it.

            Some possibilities and some warnings are bundled together at https://www.stata.com/support/faqs/s...ics/r-squared/

            On pseudo R-squared I am reminded of the quip that the great thing about standards is that there are so many to choose from.

            Like Joro Kolev I don't have Menard's book and cannot guess what was intended there.



            • #7
              A.N. Whitehead's dictum "Seek simplicity and distrust it" serves as a wise if banal summary of a large literature.

              Similarly, perhaps, "glance at R-square and distrust it" may satisfy many as advice that looks both ways. In some physical sciences an R-square even of 0.9 might indicate, say, poor experimental technique, poor theory, or both. In many social sciences a high R-square is intrinsically implausible because people's attitudes and behaviour just aren't that predictable. A high R-square might even mean a silly question, invented data, or something else unsatisfactory.

