  • LR chi2 and Pseudo-R^2 - Enough to assess model fit?

    Dear all,

    Is it sufficient to judge a logit model's fit from the LR chi2, Prob > chi2, and pseudo-R^2 (McFadden's R^2), or must I run other tests?

    Data used: Labor Force Survey

    Code:
    logit Y i.sex i.education i.sec3 i.urbrur i.marital i.age_grp
    [Attached image: Capture5.PNG]

    Edit: I'm still not sure whether weights should be included in the logit regression, so I have posted the Stata output for the weighted version below as well.


    Code:
    logit Y i.sex i.education i.sec3 i.urbrur i.marital i.age_grp [pw=round(weight)]
    [Attached image: Capture4.PNG]

    Any pointers are highly appreciated!

    Thank you!


    Last edited by Kim Veloso; 21 Jun 2018, 20:17.

  • #2
    Kim:
    see also -help estat gof-.
    Kind regards,
    Carlo
    (Stata 18.0 SE)
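
    A minimal sketch of what that check can look like, alongside the header statistics asked about in #1. It assumes Stata's bundled low-birth-weight example data (webuse lbw) as a stand-in for the Labor Force Survey; the model and its variables are illustrative only, not the poster's.

    ```stata
    * Illustrative data only (not the Labor Force Survey)
    webuse lbw, clear
    logit low age lwt i.race smoke

    * LR chi2 and McFadden's pseudo-R^2, rebuilt from the stored log likelihoods
    display "LR chi2     = " 2*(e(ll) - e(ll_0))
    display "McFadden R2 = " 1 - e(ll)/e(ll_0)

    * Hosmer-Lemeshow test on 10 groups of fitted probabilities
    estat gof, group(10)

    * Sensitivity, specificity, and percent correctly classified
    estat classification

    * Area under the ROC curve, without drawing the graph
    lroc, nograph
    ```

    The two display lines should reproduce the LR chi2 and Pseudo R2 shown in the logit header for an unweighted model; the postestimation commands add information that the header alone does not give.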

    • #3
      Originally posted by Carlo Lazzaro
      Kim:
      see also -help estat gof-.
      Thank you so much, Sir Carlo! I appreciate your help!

      I have two follow up questions, if I may.

      First, after running my unweighted logit regression,
      Code:
      . logit new_occgrp i.sex i.education i.sec3 i.urbrur i.marital i.age_grp, robust 
      I ran your suggested command and got the following output from Stata:
      Code:
      . estat gof, group(10)
      
       number of observations =     35600
             number of groups =        10
      Hosmer-Lemeshow chi2(8) =    124.03
                  Prob > chi2 =    0.0000
      Does a significant Hosmer-Lemeshow test suggest a "bad" fit?

      Secondly, I also tried to run
      Code:
      linktest
      If my unweighted logit regression fails the linktest (i.e. _hatsq is significant) and/or has a McFadden R^2 below 0.2, but the weighted logit regression passes both (_hatsq is not significant and McFadden R^2 is above 0.2), does that mean I should use the weighted logit regression instead?

      Thank you very much once again!

      • #4
        re: Hosmer-Lemeshow - yes, a statistically significant result shows a problem; note, however, that this test can be "too powerful"; see #13 in https://www.statalist.org/forums/for...on-survey-data
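
        With a sample as large as the 35,600 observations in #3, even small gaps between observed and expected counts can produce a significant statistic, so it may help to look at the groups themselves. A rough sketch, again assuming the bundled webuse lbw example data and an illustrative model rather than the poster's:

        ```stata
        * Illustrative data only (not the Labor Force Survey)
        webuse lbw, clear
        logit low age lwt i.race smoke

        * The table option lists observed and expected counts for each group
        estat gof, group(10) table
        ```

        Comparing the observed and expected columns group by group shows whether the misfit is large in practical terms or only statistically detectable.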

        • #5
          the OP sent me a private message with a follow-up question:
          Dear Mr. Goldstein,

          Thank you very much for answering my question. I have also been reading about Hosmer-Lemeshow and realize that I need to assess it with caution.

          A follow up question, if I may:

          Must I use svyset if I want to run a logit regression using labor force survey data?

          Thank you very much once again.

          Best,
          Kim
          first, please don't do that

          second, I am not an expert on the data you are talking about - in general, the answer is "it depends" - in at least some situations, you can use "pweights" with logistic regression and they do not require the use of -svyset-
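
          As a hedged sketch of the two routes, using variable names that appear in this thread (Y, sex, education, weight, and the psu identifier from #7). It assumes the survey data are already in memory and that the design is a single stage of clusters with no strata, which may not match the actual Labor Force Survey design.

          ```stata
          * Assumes the survey data are in memory; variable names follow this thread
          * and the one-stage, no-strata design is an assumption.

          * Route 1: probability weights passed directly to logit
          * (pweights imply robust standard errors)
          logit Y i.sex i.education [pweight=weight]

          * Route 2: declare the design once, then use the svy prefix
          svyset psu [pweight=weight]
          svy: logit Y i.sex i.education
          ```

          The coefficients should agree between the two routes; the standard errors can differ because only the second accounts for clustering within psu.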

          • #6
            As luck would have it, this old thread on pweights vs svy came back to life the other day:

            https://www.statalist.org/forums/for...-are-available

            Personally, one way or another, I usually use the weights. Having said that, there are arguments for NOT weighting. But I think you have to understand things really well to not weight. See the Appendix of

            https://www3.nd.edu/~rwilliam/stats3/SvyCautionsX.pdf
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
            Stata Version: 17.0 MP (2 processor)

            WWW: https://www3.nd.edu/~rwilliam

            • #7
              Thank you very much, Mr. Goldstein, and Mr. Williams! I highly appreciate your input.

              After some more reading and consultations, I have decided to run the following regression using the Labor Force Survey data.
              It follows the suggested approach in the UNC Carolina Population Center: http://www.cpc.unc.edu/research/tool...rveys/logistic

              Code:
              logit Y i.sex i.education i.sector i.urban i.marital i.age_grp if working==1 [pw=weight], cluster(psu)
              McFadden's pseudo-R^2 is above 0.2 and _hatsq is not significant, suggesting good model fit and specification.

              Thank you all very much once again!
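
              For readers following along, a minimal sketch of the specification check mentioned above. It assumes the bundled webuse lbw example data and an illustrative unweighted model, not the Labor Force Survey regression.

              ```stata
              * Illustrative data only (not the Labor Force Survey)
              webuse lbw, clear
              logit low age lwt i.race smoke

              * linktest refits the model on the linear prediction (_hat) and its
              * square (_hatsq); a significant _hatsq hints at misspecification
              linktest
              ```

              In the linktest output, _hat is expected to be significant, and the p-value on _hatsq is the one to read.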
