
  • Can I get a pseudo r-squared in SVY logistic?

    It seems that the standard way to use the data that I am using is to use it in weighted fashion (using SVY) rather than unweighted.

    However, it seems that, while logistic regression produces a pseudo r-squared statistic, SVY logistic does not. Am I mistaken?

    Any suggestions for how to handle this?

    As a side note, I gather that there are differing views regarding how useful pseudo r-squared is.

    However, it is relevant to note that the journal I am aiming to publish in is read primarily by non-statisticians. The focus of my study is more on the coefficients than the r-squared and I would think that readers of the journal I am aiming for will think similarly.

    If there is a reason that r-squared is not reported by SVY logistic, might it be quite standard (e.g., acceptable in many journals) to just let the r-squared go unreported?

    Andy





  • #2
    Pseudo R^2 is computed using log likelihoods, and log likelihoods assume that cases are all independent of each other. When you have clustering and the like, cases are not independent, so pseudo R^2 is not considered appropriate. (That is also why you suddenly start getting Wald chi-squares or F values instead of LR chi-squares when you use the cluster option or svy: prefix. This struck me as really bizarre at first until I more or less understood it.)
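To make the "computed using log likelihoods" point concrete, here is a minimal sketch of McFadden's pseudo R^2 worked out by hand from the results an ordinary (non-svy) logit stores. The auto dataset and the choice of variables are my own assumptions, used only for illustration.

```stata
* Sketch only: dataset and variables are illustrative, not from the thread
sysuse auto, clear

* Ordinary logit: observations treated as independent, so a true
* log likelihood exists for both the fitted and the constant-only model
logit foreign mpg weight

* Pseudo R2 as reported by Stata
display "reported: " e(r2_p)

* The same quantity by hand: 1 - ll(model)/ll(constant only)
display "by hand:  " 1 - e(ll)/e(ll_0)
```

The two displayed values should agree, which shows that the statistic depends entirely on the two log likelihoods.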

    If you are bound and determined to report Pseudo R^2 anyway, I think you could do something like

    logit y x [pw=wgt]
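A slightly fuller sketch of that idea, assuming the NHANES II example data that ships with Stata (`webuse nhanes2d`, already svyset with `finalwgt` as the pweight). The model itself is only an illustration; substitute your own outcome, predictors, and weight variable.

```stata
* Sketch only: example data and model are assumptions for illustration
webuse nhanes2d, clear

* Design-based fit: no pseudo R2 is stored, so e(r2_p) displays as missing
svy: logit highbp height weight age female
matrix b_svy = e(b)
display "svy pseudo R2: " e(r2_p)

* Same model with the pweights alone and no svy: prefix
logit highbp height weight age female [pw=finalwgt]
matrix b_pw = e(b)
display "pweight-only pseudo R2: " e(r2_p)

* The point estimates should agree; only the variance estimates differ
display "max relative coefficient difference: " mreldif(b_svy, b_pw)
```

I would still report the coefficients and standard errors from the svy: run, and treat the pseudo R^2 from the second run as purely descriptive.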

    For more on what you can and can't do with svy and what you can do instead, see

    https://www3.nd.edu/~rwilliam/stats3/SvyCautionsX.pdf

    This was previously discussed in

    https://www.stata.com/statalist/arch.../msg00366.html
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Hi Andrew and Richard,

      The clustering argument can also be applied to multilevel/mixed models, where some R2 metrics are growing in acceptance (e.g., LaHuis, Hartman, Hakoyama, & Clark, 2014). This is to say that there are ways to get R2s for clustered-data designs.

      Survey models have the advantage of building the aspects that affect the log-likelihoods (for simpler models like -logit- at least) into the survey weights. Thus, for many models, the pseudo-R2 can be obtained as Richard notes: with the -pweight-s alone and the non-svy-prefixed command.

      I discuss a related issue (with simulation as demonstration) here (e.g., Luchman, 2015). I am sure there are applications where more than merely the -pweight-s must be used, but for many standard -svy- models (-regress-, -logit-, -ologit-, -poisson-) this logic should apply.

      Interested in hearing other counterpoints or cautions on this issue, but when the pseudo-R2 is used for primarily descriptive purposes (as is usually the case; not estimating sampling variances of pseudo-R2), using the -pweight-s alone (without the -svy- prefix) would seem to be fine and should not be frowned upon.

      - joe

      LaHuis, D. M., Hartman, M. J., Hakoyama, S., & Clark, P. C. (2014). Explained variance measures for multilevel models. Organizational Research Methods, 17(4), 433-451.

      Luchman, J. N. (2015). Determining subgroup difference importance with complex survey designs: An application of weighted dominance analysis. Survey Practice, 8(5).

      Joseph Nicholas Luchman, Ph.D., PStat® (American Statistical Association)
      ----
      Research Fellow
      Fors Marsh

      ----
      Version 18.0 MP



      • #4
        For an alternate goodness-of-fit measure for logistic regression using complex survey data, see

        Archer, K. J., & Lemeshow, S. (2006). Goodness-of-Fit Test for a Logistic Regression Model Fitted Using Survey Sample Data. Stata Journal, 6(1), 97-105.
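As far as I know, recent versions of Stata make a test of this kind available as -estat gof- after svy: logit (and logistic/probit). A minimal sketch, assuming the NHANES II example data and an illustrative model:

```stata
* Sketch only: example data and model are assumptions for illustration
webuse nhanes2d, clear

* Design-based logit using the survey settings stored with the data
svy: logit highbp height weight age female

* Design-adjusted goodness-of-fit test after a svy binary-response model
estat gof

* The F statistic and p-value are left behind in r()
display "F = " r(F) "   p = " r(p)
```

A small p-value suggests lack of fit. Note that this is a test rather than a summary measure like pseudo R^2, so it does not fill the same slot in a results table.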
        David Radwin
        Senior Researcher, California Competes
        californiacompetes.org
        Pronouns: He/Him



        • #5
          Thank you David Radwin for sharing this article. Is this ado one of the most common GOF tests for SVY: logit models? It seems that when evaluating nested models, the Wald test that Richard Williams proposes and explains in https://www3.nd.edu/~rwilliam/stats2/SvyCautions.pdf is still the standard. Nevertheless, if one wants to display a single statistic across several nested models to allow for overall comparison, in the way that the R-squared, pseudo-R-squared, percent correctly predicted, or the AIC and BIC are often used, it seems that there is no equivalent for SVY: logit models.

          I write this based on my reading of this thread as well as the following threads:
          cited above: https://www.stata.com/statalist/archive/2007-09/msg00366.html

          another archived discussion of post-estimation for SVY: logit: https://www.stata.com/statalist/arch.../msg00689.html

          and a more recent discussion of comparing nested SVY: logit models: https://www.statalist.org/forums/forum/general-stata-discussion/general/293254-how-to-use-stata-for-comparing-nested-models-with-survey-design

          It seems that to get a summary statistic of goodness of fit, the recommendation is to run the model without accounting for the survey design but retaining pweights, and then to obtain and report the typical GOF statistics that assume i.i.d. observations, based on what Joe wrote above in #3:
          Survey models have the advantage of building the aspects that affect the log-likelihoods (for simpler models like -logit- at least) into the survey weights. Thus, for many models, the pseudo-R2 can be obtained as Richard notes: with the -pweight-s alone and the non-svy-prefixed command.
          How can I discern whether this is a valid approach for a particular model? Is it a matter of whether it is a "simple" model like logit or does it have to do with the survey design?
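For reference, the nested-model comparison mentioned above can be sketched as follows. This assumes the NHANES II example data, and the split into "full" and "reduced" models is hypothetical: the reduced model here is simply the one without age and female.

```stata
* Sketch only: example data and model split are assumptions for illustration
webuse nhanes2d, clear

* Fit the full model with the survey design taken into account
svy: logit highbp height weight age female

* Adjusted Wald test that the extra terms are jointly zero,
* i.e. full model versus the nested model without age and female
test age female

display "F = " r(F) "   p = " r(p)
```

This replaces the likelihood-ratio test one would use with unweighted data, but it yields a test result for each comparison rather than one fit statistic per model.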



          • #6
            Hi, Richard,
            Thank you for the explanations. Just a follow-up question: if I use survey weights (svy linearized: ) but have no clusters in my data, can I claim that my observations are independent and that, in principle, pseudo R^2 is an appropriate measure of GOF?
