  • How to validate a logit model?

    Hi,

    Hi,

    Any ideas on how to validate a logit model in Stata? I have received a comment that there are no tests or robustness checks to ensure the validity of the model. I ran a logit regression on nationally representative (weighted) survey data and reported output suggesting that the overall model is significant.

    EDIT: If the data are nationally representative, should validation still be performed? Also, the purpose of the model is not prediction. Should it still be validated, given that I have done an econometric analysis for a specific country and the model could not be generalized to other countries, where different predictors may be used?

    Alternatively, is there a way to dispute the comment without doing the validation?
    Last edited by Ivan Oreskovic; 07 Dec 2018, 09:10.

  • #2
    I think that the comment you received is nonsense. If you have models, you need to validate. The fact of modeling implies a belief in a "true model", which, for finite populations, is considered to hold in a super-population of which yours is one instance.

    You also need to check calibration. Odds ratios do not convey effects on the level of the probability well, even in an analytic study. For that you need margins, which assesses effects in terms of probabilities, differences in probabilities, and risk ratios. Therefore, even if prediction is not your goal, you must ensure that the predicted probabilities are well calibrated.
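To make the point about the probability scale concrete, here is a minimal sketch in Python/NumPy (not Stata) on simulated data. It fits a weighted logit, then computes roughly what margins reports: average predicted probabilities with a binary predictor set to 0 and to 1 for everyone, their difference (risk difference) and their ratio (risk ratio). It finishes with a crude calibration table of weighted mean predicted versus observed proportions within deciles of predicted risk. The function names, the simulated variables d and x, and the uniform weights are hypothetical, and the sketch computes point estimates only, with no design-based standard errors.

```python
import numpy as np

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_weighted_logit(X, y, w, tol=1e-10, max_iter=100):
    """Pseudo-ML logit: Newton-Raphson on the weighted score equations."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = expit(X @ beta)
        info = X.T @ (X * (w * p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, X.T @ (w * (y - p)))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def predictive_margins(X, beta, w, col):
    """Weighted average predicted probability with binary column `col` set to 0, then to 1, for everyone."""
    X0, X1 = X.copy(), X.copy()
    X0[:, col], X1[:, col] = 0.0, 1.0
    p0 = np.average(expit(X0 @ beta), weights=w)
    p1 = np.average(expit(X1 @ beta), weights=w)
    return p0, p1, p1 - p0, p1 / p0  # two margins, risk difference, risk ratio

def calibration_table(p_hat, y, w, groups=10):
    """(mean predicted, observed proportion) pairs, both weighted, within groups of predicted risk."""
    edges = np.quantile(p_hat, np.linspace(0.0, 1.0, groups + 1))
    g = np.clip(np.searchsorted(edges, p_hat, side="right") - 1, 0, groups - 1)
    return [(np.average(p_hat[g == k], weights=w[g == k]),
             np.average(y[g == k], weights=w[g == k]))
            for k in range(groups) if np.any(g == k)]

# Simulated data; d, x and the weights are hypothetical
rng = np.random.default_rng(3)
n = 5000
d = (rng.random(n) < 0.4).astype(float)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), d, x])
y = (rng.random(n) < expit(-1.0 + 0.7 * d + 0.5 * x)).astype(float)
w = rng.uniform(0.5, 3.0, size=n)

beta = fit_weighted_logit(X, y, w)
p0, p1, rd, rr = predictive_margins(X, beta, w, col=1)
print(f"Pr(y|d=0)={p0:.3f}  Pr(y|d=1)={p1:.3f}  RD={rd:.3f}  RR={rr:.2f}")
for pred, obs in calibration_table(expit(X @ beta), y, w):
    print(f"predicted {pred:.3f}   observed {obs:.3f}")
```

Large gaps between the predicted and observed columns in some groups would point to poor calibration; the calibration belt is a more refined, non-grouped version of the same idea.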

    Some things you can do:

    • If you have at least one continuous predictor, run linktest after svy: logit as a general test for specification error.

    • Again, if you have continuous predictors, augment the model with nonlinear terms and test whether they are significant. Use fp (fractional polynomials) to check the linearity assumptions.

    • Check whether interactions are needed.

    • Use the calibration belt (available from SSC).

    • Use margins to compare crude and predicted effects.

    • Run the model without probability weights and see whether the conclusions differ. This, I believe, is standard practice in some fields.
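The idea behind linktest can also be sketched outside Stata. Below is a minimal Python/NumPy version on simulated data, assuming an unweighted sample of independent observations: fit the model, form the linear predictor (hat) and its square (hatsq), refit the outcome on both, and look at the Wald z statistic for hatsq. A large |z| suggests a specification error, such as an omitted nonlinear term. The function names are hypothetical, and the standard errors are ordinary model-based ones, not the design-based ones you would get after svy.

```python
import numpy as np

def fit_logit(X, y, tol=1e-10, max_iter=100):
    """Unweighted logit by Newton-Raphson; returns coefficients and model-based covariance."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, cov

def link_test(X, y):
    """Refit y on the linear predictor (hat) and its square (hatsq);
    return the Wald z statistic for the hatsq coefficient."""
    beta, _ = fit_logit(X, y)
    hat = X @ beta
    Z = np.column_stack([np.ones_like(hat), hat, hat ** 2])
    gamma, cov = fit_logit(Z, y)
    return gamma[2] / np.sqrt(cov[2, 2])

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])          # fitted model is linear in x
eta_true = -0.5 + 0.8 * x + 0.6 * x ** 2      # true logit has a quadratic term
y_bad = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta_true))).astype(float)
y_ok = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * x)))).astype(float)

print("misspecified model, z(hatsq) =", link_test(X, y_bad))
print("correct model,      z(hatsq) =", link_test(X, y_ok))
```

With these simulated data the first call gives a large |z| because the fitted model omits the quadratic term; the second is usually small.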

    Good luck!

    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Steve Samuels, thank you very much; this has been helpful!

      I just want to check one thing. If I understand correctly, in survey data analysis the logit coefficients are estimated by maximum likelihood estimation (MLE). However, the likelihood ratio (LR) statistic is not calculated, because the MLE assumption of independent observations is violated when the sample is weighted. Instead, Stata uses an F test and reports the F statistic, which is more appropriate in such a case. Is that right?

      I am asking because I have received another comment that I should use the LR test rather than the F statistic, but as far as I know, Stata automatically reports an F statistic for a svy logit model.



      • #4
        Logit coefficients are not estimated by MLE, which applies only to independent, unweighted observations. Rather, they are estimated either by maximizing a pseudo-log-likelihood or by finding the zero of a score statistic (both methods iterate). Although a logit model is assumed, the variance-covariance matrices are based on the survey design (i.e., on the weights, between-cluster variation, and strata). See this Stata FAQ, which explains why likelihood ratio tests (based on standard MLE theory) do not apply. See also chapters 2 and 3 of Chambers and Skinner (2003).
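One way to see why the pseudo-log-likelihood is not a true likelihood is to rescale the weights. The minimal Python/NumPy sketch below (simulated data, hypothetical function names, arbitrary uniform weights) solves the weighted score equations by Newton-Raphson. Multiplying every weight by 10 leaves the coefficients unchanged, but an LR-type statistic computed from the weighted log-likelihood is also multiplied by 10, so its value depends on an arbitrary normalization of the weights. This is only an illustration; it does not reproduce Stata's design-based variance estimates or its F test.

```python
import numpy as np

def fit_weighted_logit(X, y, w, tol=1e-10, max_iter=100):
    """Newton-Raphson solution of the weighted score equations
    sum_i w_i * x_i * (y_i - p_i) = 0, i.e. the pseudo-ML estimate."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - p))
        info = X.T @ (X * (w * p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def pseudo_loglik(X, y, w, beta):
    """Weighted (pseudo) log-likelihood: sum_i w_i * log L_i."""
    eta = X @ beta
    # y*eta - log(1 + exp(eta)), computed stably with logaddexp
    return np.sum(w * (y * eta - np.logaddexp(0.0, eta)))

def pseudo_lr(X, y, w):
    """LR-type statistic: full model versus intercept-only model, both weighted."""
    full = fit_weighted_logit(X, y, w)
    null = fit_weighted_logit(X[:, :1], y, w)
    return 2.0 * (pseudo_loglik(X, y, w, full) - pseudo_loglik(X[:, :1], y, w, null))

rng = np.random.default_rng(1)
n = 3000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * x)))).astype(float)
w = rng.uniform(0.5, 3.0, size=n)  # hypothetical sampling weights

b_w = fit_weighted_logit(X, y, w)
b_10w = fit_weighted_logit(X, y, 10.0 * w)
print(np.allclose(b_w, b_10w))                         # coefficients do not change
print(pseudo_lr(X, y, 10.0 * w) / pseudo_lr(X, y, w))  # the LR-type statistic scales by the same factor
```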

        References:
        Chambers, R. L., & Skinner, C. J. (Eds.). (2003). Analysis of Survey Data. Hoboken, NJ: John Wiley & Sons Inc.
        Steve Samuels
