
  • Mi beta coding for R-squared for Logit models...help please?

    After months of trying to understand the mi estimate syntax for odds ratios using mvn imputation...I found it thanks to this community!

    Before I get out the champagne, I need to know how to find the Pseudo R-square. Here is my current syntax:

    mi estimate, or : logit HEALTH_LOT AGE AGE2 woman2 nonhispblack other hisp SEX2 college_2 continc OWNRENT2 RURAL2 partnered SELFRATED PHYS_ABLE LONGLIVE AGEID, or

    mi estimate, or : logit MONEY_LOT AGE AGE2 woman2 nonhispblack other hisp SEX2 college_2 continc OWNRENT2 RURAL2 partnered SELFRATED PHYS_ABLE LONGLIVE AGEID, or

    mi estimate, or : logit LIVING_LOT AGE AGE2 woman2 nonhispblack other hisp SEX2 college_2 continc OWNRENT2 RURAL2 partnered SELFRATED PHYS_ABLE LONGLIVE AGEID, or

    mi estimate, or : logit ENDPLN_LOT AGE AGE2 woman2 nonhispblack other hisp SEX2 college_2 continc OWNRENT2 RURAL2 partnered SELFRATED PHYS_ABLE LONGLIVE AGEID, or

    mi estimate, or : logit DRIVPLAN_LOT AGE AGE2 woman2 nonhispblack other hisp SEX2 college_2 continc OWNRENT2 RURAL2 partnered SELFRATED PHYS_ABLE LONGLIVE AGEID, or
    mibeta DRIVPLAN_LOT AGE AGE2 woman2 nonhispblack other hisp SEX2 college_2 continc OWNRENT2 RURAL2 partnered SELFRATED PHYS_ABLE LONGLIVE AGEID

    NOTE: the mibeta under the last line of coding does provide an R-square....but it's treating it like a regression rather than logit...bringing me back to square one.

    Grateful to this community for teaching me to put the or at the beginning rather than the end. If I could learn how to get pseudo R-squared, that would be awesome.

  • #2
    I think you will have to use -mi xeq- as I don't think you can obtain it directly; an example can be found in #6 of https://www.statalist.org/forums/for...ng-mi-estimate



    • #3
      Thank you for the link. I'm still confused, though. Any way to ELI5? I know the code provided should not be used "as is", but I'm not sure how to translate what I need into it. Was there an official example that explained which parts should be filled in? I see that you mention learning this from Stata tech support.



      • #4
        Okay, here is what I put - I received a statistic...but I'm not sure if I did it right - I presume the "rhs" is the predictors and the Logistic is the outcome? What is supposed to go in the cluster?

        local rhs "AGE AGE2 woman2 nonhispblack other hisp SEX2 college_2 continc OWNRENT2 RURAL2 partnered SELFRATED PHYS_ABLE LONGLIVE AGEID"
        noi mi estimate, or saving(miest, replace): logistic DRIVPLAN_LOT `rhs', vce(cluster DRIVPLAN_LOT)
        qui mi query
        local M=r(M)
        scalar r2=0
        scalar cstat=0
        qui mi xeq 1/`M': logistic DRIVPLAN_LOT `rhs'; scalar r2=r2+e(r2_p); lroc, nog; scalar cstat=cstat+r(area)
        scalar r2=r2/`M'
        scalar cstat=cstat/`M'
        noi di "Pseudo R-squared over imputed data = " r2
        noi di "C statistic over imputed data = " cstat



        • #5
          the "cluster()" is used for cluster-adjusted robust standard errors if and only if you have clustered data - do you?
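To illustrate what vce(cluster ...) is for, here is a minimal sketch that is not taken from the thread. It assumes Stata's shipped example data set nlswork (loaded with -webuse-, which needs internet access), in which each woman (idcode) contributes several rows, so rows from the same woman are not independent. The variables union, age, and grade belong to that example data, not to the poster's data.

```stata
* Assumption: nlswork is a stand-in for genuinely clustered data
* (repeated observations per woman, identified by idcode).
webuse nlswork, clear

* Ordinary standard errors: every row is treated as independent.
logit union age grade, or

* Cluster-robust standard errors: rows sharing an idcode may be correlated.
logit union age grade, or vce(cluster idcode)
```

The odds ratios are identical in the two fits; only the standard errors change. Clustering on the outcome itself, as in vce(cluster DRIVPLAN_LOT), defines just two clusters when the outcome is binary, so it is not a meaningful clustering variable.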

          "Logistic" is the command, not the outcome - maybe I'm missing something in your question since it makes no sense to me

          the rest looks fine to me but you should have received two numbers - did you and do they seem credible/sensible (e.g., is the r2 in the right range? is the c statistic in the right range?)

          I'm guessing here that AGE2 means age-squared; you might want to learn about factor variable notation, as it is, at the least, convenient and, if you ever want to use the -margins- command, it is necessary; see
          Code:
          help fvvarlist
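A minimal sketch of what factor variable notation looks like for a squared term. It is not from the thread and assumes Stata's shipped auto data (foreign, mpg, weight) as a stand-in for the poster's variables.

```stata
* Assumption: the auto data stand in for the poster's data.
sysuse auto, clear

* c.mpg##c.mpg expands to mpg plus mpg squared, so no hand-made
* squared variable (such as AGE2) has to be created.
logit foreign c.mpg##c.mpg weight

* Because Stata knows the two terms belong together, -margins- reports
* the average marginal effect of mpg through both terms at once.
margins, dydx(mpg)
```

In the thread's model the analogous change would be to write c.AGE##c.AGE in place of AGE AGE2, assuming AGE2 is exactly AGE squared.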



          • #6
            The Pseudo-R squared seems credible - I don't know exactly what cluster-adjusted robust standard errors means. In grad school I passed multivariate by the skin of my teeth. Did well in advanced stats, however.

            My knowledge of mvn imputation is all self-taught from Acock's A Gentle Introduction to Stata. I learned the regression version of mvn from the book and the logistic code from this great community.

            My question is, what goes in the " " next to "local rhs" - I presumed predictors. What goes next to logistic? I presumed the outcome variable which is the DRIVPLAN_LOT - it is an outcome of how many had a plan for driving retirement with "a lot" as the answer. Next to "cluster" I repeated the outcome variable because I did not know what to put. Is it possible to skip the cluster part?

            Yes, AGE2 is age-squared. Does this cause an issue with the coding?

            I'm simply trying to obtain a pseudo R-squared. The one I received seemed credible at approximately 3%, similar to what I would get if I did this without mvn imputation. I did receive a c-stat, but I am lost on what to do with it or whether it is necessary for my project. Thank you for walking me through this; I appreciate any clarification.



            • #7
              Tried the code with the second line cut off at "DRIVPLAN_LOT" - that is not using anything else in that line (no 'rhs'.....) - I still received stats! I think the Pseudo-R squared was the same, if not similar to what I received previously. I still welcome your feedback



              • #8
                Also, is it normal for the stats provided through this code to differ from those in the original code? That is, I get much more significance for almost all of the predictors in this one than I would in:

                mi estimate, or : logit DRIVPLAN_LOT AGE AGE2 woman2 nonhispblack other hisp SEX2 college_2 continc OWNRENT2 RURAL2 partnered SELFRATED PHYS_ABLE LONGLIVE AGEID, or



                • #9
                  re: #6, yes, the predictors go inside the quote marks following "local rhs" (which I think is what you are asking);

                  and, yes, the variable name following -logistic- is the outcome variable

                  not using factor variable notation will not cause problems with your estimation - it is just more convenient for certain post hoc commands such as -margins- (which you, so far at least, do not appear to be using);

                  ignore the c statistic if you don't know what it is

                  leave out the "vce()" option if you don't understand it (and if your data are not nested or clustered)

                  re: #7: there is no problem estimating a logistic regression with no predictors (but I doubt it is what you want)

                  re: #8 - your question is not clear to me, but I see no reason why the overall (pooled) result from "mi estimate" should be the same as the results of the estimation on the single imputations that you get from "mi xeq" (not sure that's what you're asking, however)



                  • #10
                    Thanks for the clarification - Re: #7, I meant on the second line rhs - I definitely put the predictors in.

                    I'll try to clarify #8. The code I put up there for mvn imputation (which I learned from this community!) provides different stats than the code you have been troubleshooting with me. Is it advisable to just use the stats from my code in #8 and report the pseudo R-squared from the code in #4? [that is, without `rhs', vce(cluster DRIVPLAN_LOT)]

                    Or, should I alternatively use the coding I just mentioned for the outputs and the R-square? That would be the #4. I am making a table of analyses for a paper. I want to make sure I am doing everything accurately.

                    You will have to excuse me: I am used to using one piece of code for the imputed outputs/stats and another for the R-squared, because that is how Acock's A Gentle Introduction to Stata demonstrated it for regression analyses.
                    Last edited by Cherish Michael; 24 Mar 2022, 07:57.



                    • #11
                      sorry but I'm still confused - the code in #4 is ONLY for pseudo-r-squared and for c stat - does that help?
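To spell out that division of labor, here is a minimal two-step sketch. It is not the poster's exact analysis: it assumes Stata's shipped multiply imputed example data mheart1s20 (loaded with -webuse-, which needs internet access), and the variables attack, smokes, age, bmi, hsgrad, and female belong to that example data. The vce() option is left out on the assumption that the data are not clustered.

```stata
* Assumption: mheart1s20 stands in for the poster's mi data.
webuse mheart1s20, clear
local rhs "smokes age bmi hsgrad female"

* Step 1: pooled odds ratios across all imputations.
* These are the estimates that go in the results table.
mi estimate, or: logit attack `rhs'

* Step 2: average the pseudo R-squared over the M completed data sets.
* This step is used ONLY for the fit statistic, not for the odds ratios.
quietly mi query
local M = r(M)
scalar r2 = 0
quietly mi xeq 1/`M': logit attack `rhs'; scalar r2 = r2 + e(r2_p)
scalar r2 = r2 / `M'
display "Mean pseudo R-squared over `M' imputations = " r2
```

The averaged pseudo R-squared is a descriptive summary of the per-imputation fits, so it is probably best reported as such alongside the pooled odds ratios.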



                      • #12
                        Yup! Thank you for all of this trouble-shooting. I appreciate your ability to break it down for me.

