  • Cannot calculate Tjur's Coefficient of Discrimination (D) and AUROC on imputed dataset

    Hello


    I would be very grateful if someone would be able to advise me on this. My dependent variable is 'COST' which is a binary variable. I have various independent variables as listed below. I am using logistic regression to predict COST.


    1) I want to calculate both Tjur's D and the AUROC for my logistic regression models, but I cannot do this after performing multiple imputation. I can do both for the complete-case analyses, using the code below:

    *Tjur's COD*

    logistic COST honos age gender ib(frequent).marital ib(frequent).ethnicity i.imd5
    predict yhat_scales_SES if e(sample)
    ttest yhat_scales_SES, by(COST)



    *AUROC*

    logistic COST honos age gender ib(frequent).marital ib(frequent).ethnicity i.imd5
    predict AUROC_scales_SES if e(sample)
    roctab COST AUROC_scales_SES, summary



    2) I cannot get this code to work for the imputed datasets despite using mi estimate and cmdok. I have tried various combinations and have gotten different errors:

    *example of commands I've run

    mi estimate: logistic COST honos_imputed age gender ib(frequent).marital ib(frequent).ethnicity i.imd5
    mi estimate, cmdok: predict yhat_scales_SES if e(sample)
    mi estimate: ttest yhat_scales_SES, by (COST)


    *errors received:

    mi estimate: command not supported
    predict is not officially supported by mi estimate; see mi estimation for a list of Stata estimation commands
    that are supported by mi estimate. You can use option cmdok to allow estimation anyway.

    requested action not valid after most recent estimation command
    an error occurred when mi estimate executed predict on m=1


    Would anyone out there know the Stata commands that I could run to get these two things to work on an imputed dataset? Many thanks in advance if you can help!


    All the best

    Cloudy

  • #2
    re: 1 - I don't know what Tjur's D is; further, I use "lroc" to get AUROC; please look at the following post (#6 in the set): http://www.statalist.org/forums/foru...ng-mi-estimate

    re: 2 - your question is not clear but my guess is that you can use the same logic to obtain predicted values for each imputed data set and then average them



    • #3
      Thank you Rich, that is helpful and appreciated!

      Regarding the code you provided, I am unsure where to insert my variables. Could you please specify or mark where my variables would go? Sorry to bother you with this; unfortunately I am confused by it.


      The logistic regression model I am running uses this 'mi estimate code':

      mi estimate, or: logistic COST honos age gender marital imd5

      I want to estimate, if possible, the AUROC for this imputed model.


      Code:
      local rhs "armg2 armg3 tbsaburn20 tbsaburn21"
      noi mi estimate, or saving(miest, replace): logistic hodc `rhs', vce(cluster site)
      qui mi query
      local M=r(M)
      scalar r2=0
      scalar cstat=0
      qui mi xeq 1/`M': logistic hodc `rhs'; scalar r2=r2+e(r2_p); lroc, nog; scalar cstat=cstat+r(area)
      scalar r2=r2/`M'
      scalar cstat=cstat/`M'
      noi di "Pseudo R-squared over imputed data = " r2
      noi di "C statistic over imputed data = " cstat



      • #4
        Cloudy Daze: Our daze is clouded too by that code. Please read FAQ Advice as requested:

        http://www.statalist.org/forums/help#realnames Full, real name please.

        http://www.statalist.org/forums/help#stata CODE delimiters to format code readably please.

        Reading the whole document will help you to help us to help you on terms long since established here.



        • #5
          first, please read the FAQ so you can post more legibly (and note that real names are requested - the FAQ tells you how to get this changed); please use the CODE delimiters in the future

          second, you can see that the right-hand-side variables (independent variables, covariates, predictors - whatever language you want) are in the first line of my code - just substitute your covariates for mine; then in the two "logistic" statements substitute your outcome variable for mine (mine is "hodc")



          • #6
            1. I apologize for not following the guidelines correctly. I will change my name and improve my posting of code.

            2. Thank you very much Rich, that works very well! Just one further query in 3.

            3. Is there a way to get the 95% confidence interval and standard error for the AUROC? The final result I get is:

            Code:
            noi di "C statistic over imputed data = " cstat
            C statistic over imputed data = .64325504



            • #7
              by looking at "help roctab" you can see that the area and the upper and lower limits of the CI are saved - so you should be able to follow the logic of the code to expand to (1) setting up 3 variables rather than just 1 and (2) saving and averaging 3 variables to get what you want - the code I cited and you are apparently using just needs to be generalized
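
              A minimal sketch of the generalization suggested in #7, written as a do-file. It assumes Stata's example MI dataset mheart1s20 (loaded with webuse) as a stand-in for the real data; the outcome attack, the covariates in rhs, and the scalar names are illustrative only. Averaging the confidence limits, as suggested above, gives a rough summary rather than a formally pooled interval.

              ```stata
              * Stand-in data: Stata's example MI dataset (an assumption, not the poster's data)
              webuse mheart1s20, clear
              local rhs "smokes age bmi female hsgrad"

              * number of imputations
              quietly mi query
              local M = r(M)

              * three accumulators instead of one: area, lower limit, upper limit
              scalar auc_sum = 0
              scalar lb_sum  = 0
              scalar ub_sum  = 0

              * in each imputation: refit, predict, run roctab, add its stored results,
              * then drop the prediction so the data are left unchanged
              quietly mi xeq 1/`M': logistic attack `rhs'; ///
                  predict double phat if e(sample); ///
                  roctab attack phat; ///
                  scalar auc_sum = auc_sum + r(area); ///
                  scalar lb_sum = lb_sum + r(lb); ///
                  scalar ub_sum = ub_sum + r(ub); ///
                  drop phat

              display "Mean AUROC over imputed data        = " auc_sum/`M'
              display "Mean lower 95% limit over imputations = " lb_sum/`M'
              display "Mean upper 95% limit over imputations = " ub_sum/`M'
              ```

              To adapt it, replace attack with COST and put the model's own covariates in rhs.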



              • #8
                Thank you Rich.

                1. That works for the 3 variables, thanks.

                2. However, I am not sure the statement below applies directly, because I am using the 'lroc' command, which does not store r(lb) or r(ub), the scalars for the confidence limits. These scalars are stored by 'roctab' but not by 'lroc'; lroc stores only r(N) and r(area). Maybe I am wrong, though, so could you please advise further?

                by looking at "help roctab" you can see that the area and the upper and lower limits of the CI are saved



                • #9
                  Hello again,


                  1. Thanks to Rich's help I can work out the AUROCs.

                  2. I am still looking for advice on how to calculate Tjur's D (which is a type of pseudo r-squared for logistic regression models) on an imputed dataset. People may not be familiar with Tjur's D, so here is a good explanation, summarised below:

                  But there’s another R², recently proposed by Tjur (2009), that I’m inclined to prefer over McFadden’s. It has a lot of intuitive appeal, its upper bound is 1.0, and it’s closely related to R² definitions for linear models. It’s also easy to calculate. The definition is very simple. For each of the two categories of the dependent variable, calculate the mean of the predicted probabilities of an event. Then, take the difference between those two means. That’s it!
                  Here is the usual code for Tjur's D:

                  Code:
                    use "http://www.uam.es/personal_pdi/econo.../docs/mroz.dta", clear
                    logistic inlf kidslt6 age educ huswage city exper
                    predict yhat if e(sample)
                    ttest yhat, by(inlf)

                  However, this code does not work on an imputed dataset when run through mi estimate with the cmdok option. Would anyone be able to provide code that works for Tjur's D after mi estimate?


                  Many thanks in advance for any help!



                  All the best
                  Conal
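
                  One possible way to do this, sketched as a do-file, reuses the mi xeq pattern from #3: compute Tjur's D within each imputation and average the results. It assumes Stata's example MI dataset mheart1s20 (loaded with webuse) as a stand-in; the outcome attack, the covariates in rhs, and the scalar names are illustrative, and a plain average over imputations is an assumption rather than an established pooling rule for D.

                  ```stata
                  * Stand-in data: Stata's example MI dataset (an assumption, not the poster's data)
                  webuse mheart1s20, clear
                  local rhs "smokes age bmi female hsgrad"

                  * number of imputations
                  quietly mi query
                  local M = r(M)

                  * running total of the per-imputation Tjur's D
                  scalar tjur_sum = 0

                  * in each imputation: refit, predict probabilities, take the mean prediction
                  * among events and among non-events, add their difference, then clean up
                  quietly mi xeq 1/`M': logistic attack `rhs'; ///
                      predict double phat if e(sample); ///
                      summarize phat if attack == 1, meanonly; ///
                      scalar mean1 = r(mean); ///
                      summarize phat if attack == 0, meanonly; ///
                      scalar tjur_sum = tjur_sum + mean1 - r(mean); ///
                      drop phat

                  display "Tjur's D averaged over imputed data = " tjur_sum/`M'
                  ```

                  The two summarize calls return the same group means that ttest reports in the complete-case code, so the difference is the same quantity. To adapt it, replace attack with COST and put the model's own covariates in rhs.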



                  • #10
                    re: #8 - sorry for not being clearer - lroc does not have the info needed to get CI's - so you need to switch to something else; since your original posting (#1 above) used roctab, that's what I suggested
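
                    If a standard error is wanted as well, one option (an assumption here, not something proposed in this thread) is to combine roctab's r(area) and r(se) across imputations with Rubin's rules. The do-file sketch below again uses Stata's example MI dataset mheart1s20 as a stand-in, with illustrative variable and scalar names, and it uses a normal approximation for the interval instead of Rubin's t reference distribution.

                    ```stata
                    * Stand-in data: Stata's example MI dataset (an assumption, not the poster's data)
                    webuse mheart1s20, clear
                    local rhs "smokes age bmi female hsgrad"

                    quietly mi query
                    local M = r(M)

                    scalar q_sum  = 0    // sum of per-imputation AUCs
                    scalar q2_sum = 0    // sum of squared AUCs
                    scalar w_sum  = 0    // sum of squared standard errors

                    quietly mi xeq 1/`M': logistic attack `rhs'; ///
                        predict double phat if e(sample); ///
                        roctab attack phat; ///
                        scalar q_sum = q_sum + r(area); ///
                        scalar q2_sum = q2_sum + r(area)^2; ///
                        scalar w_sum = w_sum + r(se)^2; ///
                        drop phat

                    * Rubin's rules: total variance = within + (1 + 1/M) * between
                    scalar qbar  = q_sum/`M'
                    scalar wvar  = w_sum/`M'
                    scalar bvar  = (q2_sum - `M'*qbar^2)/(`M' - 1)
                    scalar tvar  = wvar + (1 + 1/`M')*bvar
                    scalar ci_lo = qbar - invnormal(0.975)*sqrt(tvar)
                    scalar ci_hi = qbar + invnormal(0.975)*sqrt(tvar)

                    display "Pooled AUROC         = " qbar
                    display "Pooled SE            = " sqrt(tvar)
                    display "Approx. 95% CI lower = " ci_lo
                    display "Approx. 95% CI upper = " ci_hi
                    ```

                    Because the AUROC is bounded between 0 and 1, an interval built this way can fall outside that range when the estimate is near a boundary.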



                    • #11
                      Ok- thanks Rich!
