Can I get a pseudo R-squared in SVY logistic?

Andrew Kenny

Join Date: Sep 2017

Posts: 27
#1

21 Dec 2017, 19:44

It seems that the standard way to analyze the data I am using is with survey weights (using SVY) rather than unweighted.

However, it seems that, while logistic regression produces a pseudo r-squared statistic, SVY logistic does not. Am I mistaken?

Any suggestions for how to handle this?

As a side note, I gather that there are differing views regarding how useful pseudo r-squared is.

However, it is relevant to note that the journal I am aiming to publish in is read primarily by non-statisticians. The focus of my study is more on the coefficients than the r-squared and I would think that readers of the journal I am aiming for will think similarly.

If there is a reason that r-squared is not reported by SVY logistic, might it be quite standard (e.g., acceptable in many journals) to just let the r-squared go unreported?

Andy
Richard Williams

Join Date: Apr 2014

Posts: 4984
#2

22 Dec 2017, 05:53

Pseudo R^2 is computed using log likelihoods, and log likelihoods assume that cases are all independent of each other. When you have clustering and the like, cases are not independent, so pseudo R^2 is not considered appropriate. (That is also why you suddenly start getting Wald chi-squares or F values instead of LR chi-squares when you use the cluster option or svy: prefix. This struck me as really bizarre at first until I more or less understood it.)

If you are bound and determined to report Pseudo R^2 anyway, I think you could do something like

logit y x [pw=wgt]
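
A minimal sketch of that workaround, assuming Stata's NHANES II example data (-webuse nhanes2d-), which is already svyset. The variables highbp, age, and female and the weight finalwgt come from that example dataset, not from this thread. The coefficients should agree between the two fits; only the standard errors and the reported fit statistics differ.

```stata
* Assumption: NHANES II example data distributed with Stata (already svyset)
webuse nhanes2d, clear

* Design-based fit: reports an F test and no pseudo R-squared
svy: logit highbp age female

* Same model using only the probability weights; this fit reports
* a pseudo R-squared, which is stored in e(r2_p)
logit highbp age female [pw=finalwgt]
display "Pseudo R2 = " %6.4f e(r2_p)
```

The second fit reports a log pseudolikelihood rather than a true log likelihood, so the resulting statistic is best read as descriptive.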

For more on what you can and can't do with svy and what you can do instead, see

https://www3.nd.edu/~rwilliam/stats3/SvyCautionsX.pdf

This was previously discussed in

https://www.stata.com/statalist/arch.../msg00366.html

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Joseph Luchman

Join Date: Mar 2014

Posts: 114
#3

22 Dec 2017, 07:36

Hi Andrew and Richard,

The clustering argument can also be applied to multilevel/mixed models, where there are some R² metrics growing in acceptance (e.g., LaHuis, Hartman, Hakoyama, & Clark, 2014). This is to say that there are ways to get R²s for clustered-data designs.

Survey models have the advantage of building the aspects which affect the log-likelihoods (for simpler models like -logit- at least) into the survey weights. Thus, for many models, the pseudo-R² can be obtained as Richard notes - with the -pweight-s alone and the non-svy-prefixed command.

I discuss a related issue (with simulation as demonstration) in Luchman (2015). I am sure there are applications where more than merely the -pweight-s must be used, but for many standard -svy- models (-regress-, -logit-, -ologit-, -poisson-) this logic should apply.

Interested in hearing other counterpoints or cautions on this issue - but it seems that when used for primarily descriptive purposes (as is usually the case; not estimating sampling variances of pseudo-R²), using the -pweight-s alone (without -svy- prefix) would seem to be fine and should not be frowned upon.

- joe

LaHuis, D. M., Hartman, M. J., Hakoyama, S., & Clark, P. C. (2014). Explained variance measures for multilevel models. Organizational Research Methods, 17(4), 433-451.

Luchman, J. N. (2015). Determining subgroup difference importance with complex survey designs: An application of weighted dominance analysis. Survey Practice, 8(5).

Joseph Nicholas Luchman, Ph.D., PStat® (American Statistical Association)
----
Research Fellow
Fors Marsh
----
Version 18.0 MP
David Radwin

Join Date: Mar 2014

Posts: 368
#4

28 Dec 2017, 17:39

For an alternate goodness-of-fit measure for logistic regression using complex survey data, see

Archer, K. J., & Lemeshow, S. (2006). Goodness-of-Fit Test for a Logistic Regression Model Fitted Using Survey Sample Data. Stata Journal, 6(1), 97-105.
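
A hedged sketch of how such a test might be run: to my understanding, recent Stata releases offer a design-based goodness-of-fit test as -estat gof- after -svy: logistic-. The NHANES II example data and the variables highbp, age, and female are assumptions taken from Stata's example datasets, not from this thread.

```stata
* Assumption: NHANES II example data distributed with Stata (already svyset)
webuse nhanes2d, clear

* Fit the design-based logistic model
svy: logistic highbp age female

* Design-based goodness-of-fit test (F-adjusted mean residual test);
* a small p-value suggests lack of fit
estat gof
```

Unlike a pseudo R², this is a test of fit rather than a summary of explained variation, so it answers a somewhat different question.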

David Radwin
Senior Researcher, California Competes
californiacompetes.org
Pronouns: He/Him
Christiana Botticello

Join Date: Aug 2015

Posts: 7
#5

13 Aug 2018, 21:16

Thank you, David Radwin, for sharing this article. Is this ado one of the most common GOF tests for SVY: logit models? It seems that, for evaluating nested models, the Wald test that Richard Williams proposes and explains in https://www3.nd.edu/~rwilliam/stats2/SvyCautions.pdf is still the standard. Nevertheless, if one wants to display a single statistic across several nested models for overall comparison, in the way that R-squared, pseudo-R-squared, percent correctly predicted, or AIC and BIC are often used, there seems to be no equivalent for SVY: logit models.
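
For concreteness, a minimal sketch of that Wald comparison, assuming Stata's NHANES II example data; the variables highbp, age, female, height, and weight belong to that example dataset and are used here only for illustration.

```stata
* Assumption: NHANES II example data distributed with Stata (already svyset)
webuse nhanes2d, clear

* Fit the larger of the two nested models
svy: logit highbp age female height weight

* Adjusted Wald test that the added terms are jointly zero,
* i.e., a comparison against the model with age and female only
test height weight
```

This tests whether the extra terms improve the model, but it does not yield a single fit statistic that can be tabulated for each model.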

I write this based on my reading of this thread as well as the following threads:
cited above: https://www.stata.com/statalist/archive/2007-09/msg00366.html

another archived discussion of post-estimation for SVY: logit: https://www.stata.com/statalist/arch.../msg00689.html

and a more recent discussion of comparing nested SVY: logit models: https://www.statalist.org/forums/forum/general-stata-discussion/general/293254-how-to-use-stata-for-comparing-nested-models-with-survey-design

It seems that, to get a summary goodness-of-fit statistic, the recommendation is to run the model without the svy prefix but retaining the pweights, and then to report the usual GOF statistics that assume i.i.d. observations, based on what Joseph wrote above:

Survey models have the advantage of building the aspects which affect the log-likelihoods (for simpler models like -logit- at least) into the survey weights. Thus, for many models, the pseudo-R² can be obtained as Richard notes - with the -pweight-s alone and the non-svy-prefixed command.

How can I discern whether this is a valid approach for a particular model? Is it a matter of whether it is a "simple" model like logit or does it have to do with the survey design?
Natalia Levenko

Join Date: Jun 2018

Posts: 3
#6

04 Oct 2021, 03:12

Hi, Richard,
Thank you for the explanations. Just a follow-up question: if I use survey weights (svy linearized: ) but have no clusters in my data, can I claim that my observations are independent and that, in principle, pseudo R^2 is an appropriate measure of GOF?