
  • offset after logit predicting probabilities

    Hello users-
    Can someone please verify: if I wanted to test the fit of the model using -roctab- after running a logistic model with time at risk as an offset, is this how I would do it?
    y is binomial (0,1)

    xi: logit y x1 x2, or offset(ln-timeatrisk)
    predict p1, p
    gen p1inv=1-p1
    roctab y p1inv, graph

    Thanks
    Ashar

  • #2
    Several things, mostly in the way of details.

    First, roctab does not test model fit, it tests model discrimination.

    The xi: prefix in your logit command does nothing because there are no corresponding i. variables in the variable list. If your intent is that x1 or x2 should be treated as nominal variables in the model, you need to prefix them with i.; and even then you don't need the xi: prefix if you are running a modern version of Stata. See -help fvvarlist- for how factor variables work.

    Your offset(ln-timeatrisk) option is invalid because ln-timeatrisk is not a valid variable name: embedded hyphens are not permitted. (Embedded underscore characters are permitted--perhaps that is what you intended.)

    I don't understand why you want to run -roctab- on p1inv instead of p1 itself. p1, being the predicted probability of y = 1 conditional on x1 x2 and the offset variable, will bear a monotone increasing relationship to the expected value of y conditional on those same things. If your concern is that you expect the relationship between y and x1, x2 to be inverse, that doesn't matter. That will show up in the coefficients of x1 or x2 being negative: p1 will still be positively associated with y. So you should just do -roctab y p1, graph-
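To see the point concretely, here is a small sketch of the pairwise computation behind the ROC area (in Python rather than Stata, with made-up outcomes and probabilities): any monotone increasing transform of the score leaves the area unchanged, and replacing p with 1 - p simply complements it.

```python
# Mann-Whitney computation of the area under the ROC curve:
# the fraction of (case, control) pairs in which the case scores
# higher, counting ties as half a win.
def auc(y, score):
    pos = [s for yi, s in zip(y, score) if yi == 1]
    neg = [s for yi, s in zip(y, score) if yi == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [0, 0, 1, 0, 1, 1, 0, 1]                    # made-up outcomes
p = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.2, 0.7]  # made-up predicted probabilities

a = auc(y, p)
a_inv = auc(y, [1 - pi for pi in p])      # what -roctab y p1inv- would compute
assert abs(a + a_inv - 1) < 1e-12         # the two areas are complements
assert auc(y, [pi**2 for pi in p]) == a   # monotone increasing transform: same area
```

So reversing the probability only flips the area to one minus its value; it never reveals anything that -roctab y p1, graph- would not.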



    • #3
      Thanks, Clyde. Revisiting the same problem after 2 years. The earlier post was written in haste. Here is the real issue.

      The model is
      logit readmission var1 var2 ..., or offset(ln_timeatrisk)
      lroc

      When I run this model without the offset term and get the discrimination from -lroc-, I get 0.64. When I run it with the offset term, I get 0.21. What does this tell me about the model's discrimination?
      Thanks
      Ashar



      • #4
        Well, something very bizarre is happening, because -lroc- after -logit- should never produce a result < 0.5. (You can have an area under an ROC curve that is less than 0.5 if the predictor is reversed in direction, but the -xb- from -logit- is never reversed in direction!)

        Please show the exact code and the exact Stata output, in code blocks.



        • #5
          logit relupreadm i.electsurg asa copd htn dial strd bleed_dis i.diab i.morbinhosp disca optym_cat , or

          Iteration 0: log likelihood = -59797.409
          Iteration 1: log likelihood = -58512.429
          Iteration 2: log likelihood = -57426.124
          Iteration 3: log likelihood = -57420.588
          Iteration 4: log likelihood = -57420.585

          Logistic regression Number of obs = 330,848
          LR chi2(12) = 4753.65
          Prob > chi2 = 0.0000
          Log likelihood = -57420.585 Pseudo R2 = 0.0397

          ------------------------------------------------------------------------------
          relupreadm | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          1.electsurg | .8076894 .0155534 -11.09 0.000 .7777734 .8387561
          asa | 1.759497 .0348887 28.50 0.000 1.692428 1.829224
          copd | 1.298561 .0477513 7.10 0.000 1.208263 1.395607
          htn | 1.071004 .0204128 3.60 0.000 1.031733 1.111769
          dial | 1.221159 .0734857 3.32 0.001 1.085299 1.374026
          strd | 1.71982 .057456 16.23 0.000 1.610816 1.8362
          bleed_dis | 1.316503 .0511455 7.08 0.000 1.219981 1.420662
          |
          diab |
          1 | .984154 .0287735 -0.55 0.585 .9293444 1.042196
          2 | 1.16248 .0382898 4.57 0.000 1.089805 1.240002
          |
          1.morbinhosp | 2.329961 .0584553 33.71 0.000 2.218162 2.447395
          disca | 1.733789 .0657115 14.52 0.000 1.609664 1.867486
          optym_cat | 1.214024 .0125557 18.75 0.000 1.189663 1.238884
          _cons | .0256997 .0006126 -153.61 0.000 .0245267 .0269287
          ------------------------------------------------------------------------------

          . lroc

          Logistic model for relupreadm

          number of observations = 330848
          area under ROC curve = 0.6642


          ****************************************
          gen lnreadmrisktime1=ln(readmrisktime1)


          logit relupreadm i.electsurg asa copd htn dial strd bleed_dis i.diab i.morbinhosp disca optym_cat , or offset(lnreadmrisktime1)

          Iteration 0: log likelihood = -78745.244
          Iteration 1: log likelihood = -78114.58
          Iteration 2: log likelihood = -75647.132
          Iteration 3: log likelihood = -75622.604
          Iteration 4: log likelihood = -75622.546
          Iteration 5: log likelihood = -75622.546

          Logistic regression Number of obs = 330,730
          Wald chi2(12) = 7183.29
          Log likelihood = -75622.546 Prob > chi2 = 0.0000

          ------------------------------------------------------------------------------
          relupreadm | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          1.electsurg | .7599498 .0147349 -14.16 0.000 .7316118 .7893855
          asa | 1.87472 .0373773 31.52 0.000 1.802875 1.949428
          copd | 1.34596 .0498247 8.03 0.000 1.251764 1.447244
          htn | 1.078961 .0206875 3.96 0.000 1.039167 1.12028
          dial | 1.241402 .075355 3.56 0.000 1.102156 1.398239
          strd | 1.811478 .0610081 17.64 0.000 1.695765 1.935086
          bleed_dis | 1.360732 .053412 7.85 0.000 1.259972 1.46955
          |
          diab |
          1 | .9857896 .028922 -0.49 0.626 .9307025 1.044137
          2 | 1.176698 .0390194 4.91 0.000 1.102654 1.255714
          |
          1.morbinhosp | 2.875863 .073014 41.61 0.000 2.736261 3.022589
          disca | 1.920329 .0735182 17.04 0.000 1.78151 2.069966
          optym_cat | 1.237161 .0128727 20.45 0.000 1.212186 1.26265
          _cons | .0008998 .0000215 -293.48 0.000 .0008587 .000943
          lnreadmrisktime1| 1 (offset)
          ------------------------------------------------------------------------------

          . lroc

          Logistic model for relupreadm

          number of observations = 330730
          area under ROC curve = 0.2110


          ****************************************



          • #6


            I think I might have figured out what was happening there.

            This was a model to predict readmission for those who had stayed in the hospital up to 14 days after surgery. Everyone was followed for up to 30 days after surgery.


            I had set the time-at-risk=time-to-event for those who had the outcome of interest.

            Therefore the time at risk for readmission for those who were readmitted (outcome = 1) after discharge ranged from 0 (stayed any number of days in hospital but readmitted the same day) to 30 (stayed in hospital 1 day and readmitted after 30 days).

            And the time at risk for readmission for those who were not readmitted (outcome = 0) after discharge ranged from 16 (stayed 14 days in hospital) to 30 (stayed only 1 day in hospital).

            This created a very unbalanced time at risk for the readmitted and not-readmitted patients.
            So I think that explains the reversal of the ROC when using offset(ln_timeatrisk).
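That mechanism is easy to reproduce outside Stata. A minimal sketch in Python with made-up times at risk (cases get a short time-to-event, non-cases get the longer discharge-to-day-30 window): because the offset's coefficient is fixed at 1, the predicted probability must increase with time at risk, while the outcome here decreases with it, so the ROC area falls below 0.5.

```python
import math

def auc(y, score):
    # Fraction of (case, control) pairs where the case scores higher (ties count half).
    pos = [s for yi, s in zip(y, score) if yi == 1]
    neg = [s for yi, s in zip(y, score) if yi == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up times at risk: readmitted patients get time-to-event (short),
# non-readmitted patients get the remaining follow-up window (16-30 days).
t_cases    = [1, 3, 5, 8, 12, 20]
t_controls = [16, 20, 24, 27, 30, 30]
y = [1] * len(t_cases) + [0] * len(t_controls)
t = t_cases + t_controls

b0 = -3.0  # arbitrary intercept; the monotonicity argument holds for any b0
# Offset enters with coefficient fixed at 1: logit(p) = b0 + ln(t)
p = [1 / (1 + math.exp(-(b0 + math.log(ti)))) for ti in t]

# p is a monotone increasing function of t, so the ROC area of p equals that of t,
# and t runs opposite to the outcome in this setup:
assert auc(y, p) == auc(y, t)
assert auc(y, p) < 0.5
```

In other words, once the time-at-risk variable is forced into the linear predictor with coefficient 1 but runs opposite to the outcome, an ROC area below 0.5 is exactly what one would expect.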



            • #7
              Well, I am flummoxed. The output you are getting from -lroc- after the second model should be impossible. I really don't know what to make of it. And I'm unable to get similar results using any data sets I have available. No matter how I torture the variables, I always get an output from -lroc- that exceeds 0.5, because if the regressors are anti-sense to the outcome, their coefficients come out negative, and so -xb- is always in the same sense as the outcome.
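The "coefficients come out negative" point can be checked with a toy fit (pure Python, gradient ascent on the log likelihood, made-up data in which the regressor runs opposite to the outcome): the fitted slope comes out negative, so the fitted probabilities still rank the cases above the controls.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def auc(y, score):
    # Fraction of (case, control) pairs where the case scores higher (ties count half).
    pos = [s for yi, s in zip(y, score) if yi == 1]
    neg = [s for yi, s in zip(y, score) if yi == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up data: higher x means lower outcome probability (anti-sense regressor).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, 1, 0, 1, 0, 0, 0]

# Maximum likelihood for logit(Pr(y=1)) = b0 + b1*x, via gradient ascent.
b0 = b1 = 0.0
rate = 0.01
for _ in range(20000):
    resid = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
    b0 += rate * sum(resid)
    b1 += rate * sum(r * xi for r, xi in zip(resid, x))

p = [sigmoid(b0 + b1 * xi) for xi in x]
assert b1 < 0           # the fit absorbs the reversed direction
assert auc(y, p) > 0.5  # so the fitted probabilities are never anti-sense
```

A freely estimated coefficient can always flip its sign to match the data; only a coefficient constrained to 1, as with offset(), cannot.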

              What I have been able to do is get some very bizarre -logit- outputs using the -offset()- option when the variable chosen as offset is anti-sense to the outcome. In effect, constraining the coefficient of a variable to be 1 when it is inversely related to the outcome probability can severely distort the coefficients, and, if pushed hard enough, can cause the logistic regression to fail to converge. But that isn't what's going on here. In fact, it is striking how close the coefficients in the two models are to each other.

              Here's what I would do:

              1. Make sure that your Stata installation is completely up-to-date.
              2. If the same results persist, I would contact technical support about this.
              3. In the meantime, I don't trust the results from -lroc-. They are clearly wrong for the second model, so I lose confidence in those for the first as well.
              4. So, I would use a different program to compute the area under the ROC curve. Re-run each model, and follow it with
              Code:
              predict p, pr
              roctab relupreadm p
              


              That way your ROC areas are coming from the code for -predict- and the code for -roctab-, and not relying on the apparently questionable -lroc- code.
