
  • Fit statistics for logistic regressions using svy and subpop

    I want fit statistics so that I can compare logistic regression models. However, I have not been able to find a Stata command that works, because I am using sample weights (svy) and a subpop analysis. I understand that it is not appropriate to compare pseudo-log-likelihoods the way one would compare log likelihoods, or to use pseudo-log-likelihoods in a likelihood-ratio chi-square test. estat gof will not work because I am using a subpop. Does anyone know of any commands that would give me fit statistics in this case?

  • #2

    Welcome to Statalist, Susanna!

    Some suggestions:

    1. Use estat gof. But see this discussion.

    2. If you have continuous or many categorical variables, use linktest.

    3. To compare nested models in general, test that the coefficients of the extra variables are all zero.

    4. Plot weighted ROCs, as shown here.

    5. Estimate predicted probabilities with predict and use them in diagnostic plots (Korn and Graubard 1999, Section 3.6).
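
    Suggestions 2, 3 and 5 can be sketched roughly as follows. This is only an illustration under assumptions that are not part of the original post: it uses the nhanes2d example dataset that ships with Stata (already svyset), treats female as the subpopulation, and picks highbp and the covariates arbitrarily. The link-test step is written out by hand as a refit on the linear predictor and its square, rather than relying on linktest itself.

    ```stata
    * Illustrative data: nhanes2d is already svyset (weights, strata, PSUs)
    webuse nhanes2d, clear

    * Larger model, fitted in the subpopulation (here: women)
    svy, subpop(female): logit highbp height weight age i.race

    * Suggestion 3: design-adjusted Wald test that the extra terms are all zero
    testparm i.race

    * Suggestion 5: predicted probabilities for use in diagnostic plots
    predict phat, pr

    * Suggestion 2, done by hand: refit on the linear predictor and its square;
    * a clearly nonzero coefficient on xbhat2 points to misspecification
    predict xbhat, xb
    generate xbhat2 = xbhat^2
    svy, subpop(female): logit highbp xbhat xbhat2
    ```

    Using subpop() rather than an if qualifier in the svy steps keeps the full design in the variance calculation, which is what the Wald test relies on.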

    It is possible to estimate quasi-likelihood AIC and BIC statistics (Lumley and Scott, 2015). This can be done with the anova.svyglm method in Tom Lumley's survey package for R. I don't know of a Stata implementation.


    References:

    Korn, Edward Lee, and Barry I Graubard. 1999. Analysis of health surveys. New York: Wiley.

    Lumley, Thomas, and Alastair Scott. 2015. AIC and BIC for modeling with complex survey data. Journal of Survey Statistics and Methodology 3, no. 1: 1-18.
    Last edited by Steve Samuels; 02 Mar 2016, 13:41.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Hi Susanna,

      Building on Steve's response, if you are OK with using a fit metric that is not DF/parsimony adjusted (the AIC and BIC are DF/parsimony adjusted), then you could run logit with the pweight from your svyset, without the svy prefix, and use the pseudo-R2 it reports. Besides the weights, the other aspects of the sampling design (i.e., strata, FPC) affect the variability of the estimates and the implied model DF, but not the point estimates themselves. Accounting for design features other than the weights is important for inference and, as Lumley and Scott imply, for DF adjustments, but it does not matter for simpler, non-adjusted statistics such as the McFadden/likelihood pseudo-R2. The subpop can be incorporated as an if qualifier here - just don't use the SEs from this analysis; use the if-ed, pweight-ed analysis only to obtain the pseudo-R2 value. One last caveat: the dAIC and dBIC are the statistics recommended for model selection - pseudo-R2s are better for model description.
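
      A minimal sketch of that approach, under assumptions that are not from the thread: the nhanes2d example dataset shipped with Stata, its weight variable finalwgt, female as the subpopulation, and arbitrarily chosen covariates. The pseudo-R2 is read from the stored result e(r2_p) after each fit.

      ```stata
      * Illustrative data; finalwgt is the sampling weight in nhanes2d
      webuse nhanes2d, clear

      * Smaller model: pweights only, subpopulation imposed with -if-
      logit highbp height weight age [pweight = finalwgt] if female == 1
      display "Pseudo-R2, smaller model: " %6.4f e(r2_p)

      * Larger model on the same subpopulation
      logit highbp height weight age i.race [pweight = finalwgt] if female == 1
      display "Pseudo-R2, larger model:  " %6.4f e(r2_p)

      * The standard errors above ignore strata and PSUs; do not report them.
      ```

      The coefficients should match those from svy, subpop(female): logit with the same covariates, which is an easy check that the weights and the subpopulation were specified as intended.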

      A (brief) discussion of this issue can be found in the linked Survey Practice article.

      - joe
      Determining subgroup difference importance with complex survey designs: An application of weighted dominance analysis
      Joseph Nicholas Luchman, Ph.D., PStat® (American Statistical Association)
      ----
      Research Fellow
      Fors Marsh

      ----
      Version 18.0 MP
