
  • offset after logit predicting probabilities

    Hello users-
    Can someone please verify: if I wanted to test the fit of the model using -roctab- after running a logistic model with time at risk as an offset, is this how I would do it?
    y is binomial (0,1)

    xi: logit y x1 x2, or offset(ln-timeatrisk)
    predict p1, p
    gen p1inv=1-p1
    roctab y p1inv, graph

    Thanks
    Ashar

  • #2
    Several things, mostly in the way of details.

    First, roctab does not test model fit, it tests model discrimination.

    The xi: prefix in your logit command does nothing because there are no corresponding i. variables in the variable list. If your intent is that x1 or x2 should be treated as nominal variables in the model, you need to prefix them with i.; and even then you don't need the xi: prefix if you are running a modern version of Stata. See -help fvvarlist- for how factor variables work.

    Your offset(ln-timeatrisk) option is invalid because ln-timeatrisk is not a valid variable name: embedded hyphens are not permitted. (Embedded underscore characters are permitted--perhaps that is what you intended.)

    I don't understand why you want to run -roctab- on p1inv instead of p1 itself. p1, being the predicted probability of y = 1 conditional on x1 x2 and the offset variable, will bear a monotone increasing relationship to the expected value of y conditional on those same things. If your concern is that you expect the relationship between y and x1, x2 to be inverse, that doesn't matter. That will show up in the coefficients of x1 or x2 being negative: p1 will still be positively associated with y. So you should just do -roctab y p1, graph-
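To see the point concretely, here is a small sketch of the pairwise computation behind the ROC area (in Python rather than Stata, with made-up outcomes and probabilities): any monotone increasing transform of the score leaves the area unchanged, and replacing p with 1 - p simply complements it.

```python
# Mann-Whitney computation of the area under the ROC curve:
# the fraction of (case, control) pairs in which the case scores
# higher, counting ties as half a win.
def auc(y, score):
    pos = [s for yi, s in zip(y, score) if yi == 1]
    neg = [s for yi, s in zip(y, score) if yi == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [0, 0, 1, 0, 1, 1, 0, 1]                    # made-up outcomes
p = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.2, 0.7]  # made-up predicted probabilities

a = auc(y, p)
a_inv = auc(y, [1 - pi for pi in p])      # what -roctab y p1inv- would compute
assert abs(a + a_inv - 1) < 1e-12         # the two areas are complements
assert auc(y, [pi**2 for pi in p]) == a   # monotone increasing transform: same area
```

So reversing the probability only flips the area to one minus its value; it never reveals anything that -roctab y p1, graph- would not.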



    • #3
      Thanks, Clyde. Revisiting the same problem after 2 years. The earlier post was written in haste. Here is the real issue.

      The model is
      logit readmission var1 var2 ..., or offset(ln_timeatrisk)
      lroc

      When I run this model without the offset term and get the discrimination from -lroc-, I get 0.64. When I run it with the offset term, I get 0.21. What does this tell me about the model's discrimination?
      Thanks
      Ashar



      • #4
        Well, something very bizarre is happening, because -lroc- after -logit- should never produce a result < 0.5. (You can have an area under an ROC curve that is less than 0.5 if the predictor is reversed in direction, but the -xb- from -logit- is never reversed in direction!)

        Please show the exact code and the exact Stata output, in code blocks.



        • #5
          logit relupreadm i.electsurg asa copd htn dial strd bleed_dis i.diab i.morbinhosp disca optym_cat , or

          Iteration 0: log likelihood = -59797.409
          Iteration 1: log likelihood = -58512.429
          Iteration 2: log likelihood = -57426.124
          Iteration 3: log likelihood = -57420.588
          Iteration 4: log likelihood = -57420.585

          Logistic regression Number of obs = 330,848
          LR chi2(12) = 4753.65
          Prob > chi2 = 0.0000
          Log likelihood = -57420.585 Pseudo R2 = 0.0397

          ------------------------------------------------------------------------------
          relupreadm | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          1.electsurg | .8076894 .0155534 -11.09 0.000 .7777734 .8387561
          asa | 1.759497 .0348887 28.50 0.000 1.692428 1.829224
          copd | 1.298561 .0477513 7.10 0.000 1.208263 1.395607
          htn | 1.071004 .0204128 3.60 0.000 1.031733 1.111769
          dial | 1.221159 .0734857 3.32 0.001 1.085299 1.374026
          strd | 1.71982 .057456 16.23 0.000 1.610816 1.8362
          bleed_dis | 1.316503 .0511455 7.08 0.000 1.219981 1.420662
          |
          diab |
          1 | .984154 .0287735 -0.55 0.585 .9293444 1.042196
          2 | 1.16248 .0382898 4.57 0.000 1.089805 1.240002
          |
          1.morbinhosp | 2.329961 .0584553 33.71 0.000 2.218162 2.447395
          disca | 1.733789 .0657115 14.52 0.000 1.609664 1.867486
          optym_cat | 1.214024 .0125557 18.75 0.000 1.189663 1.238884
          _cons | .0256997 .0006126 -153.61 0.000 .0245267 .0269287
          ------------------------------------------------------------------------------

          . lroc

          Logistic model for relupreadm

          number of observations = 330848
          area under ROC curve = 0.6642


          ****************************************
          gen lnreadmrisktime1=ln(readmrisktime1)


          logit relupreadm i.electsurg asa copd htn dial strd bleed_dis i.diab i.morbinhosp disca optym_cat , or offset(lnreadmrisktime1)

          Iteration 0: log likelihood = -78745.244
          Iteration 1: log likelihood = -78114.58
          Iteration 2: log likelihood = -75647.132
          Iteration 3: log likelihood = -75622.604
          Iteration 4: log likelihood = -75622.546
          Iteration 5: log likelihood = -75622.546

          Logistic regression Number of obs = 330,730
          Wald chi2(12) = 7183.29
          Log likelihood = -75622.546 Prob > chi2 = 0.0000

          ------------------------------------------------------------------------------
          relupreadm | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          1.electsurg | .7599498 .0147349 -14.16 0.000 .7316118 .7893855
          asa | 1.87472 .0373773 31.52 0.000 1.802875 1.949428
          copd | 1.34596 .0498247 8.03 0.000 1.251764 1.447244
          htn | 1.078961 .0206875 3.96 0.000 1.039167 1.12028
          dial | 1.241402 .075355 3.56 0.000 1.102156 1.398239
          strd | 1.811478 .0610081 17.64 0.000 1.695765 1.935086
          bleed_dis | 1.360732 .053412 7.85 0.000 1.259972 1.46955
          |
          diab |
          1 | .9857896 .028922 -0.49 0.626 .9307025 1.044137
          2 | 1.176698 .0390194 4.91 0.000 1.102654 1.255714
          |
          1.morbinhosp | 2.875863 .073014 41.61 0.000 2.736261 3.022589
          disca | 1.920329 .0735182 17.04 0.000 1.78151 2.069966
          optym_cat | 1.237161 .0128727 20.45 0.000 1.212186 1.26265
          _cons | .0008998 .0000215 -293.48 0.000 .0008587 .000943
          lnreadmrisktime1| 1 (offset)
          ------------------------------------------------------------------------------

          . lroc

          Logistic model for relupreadm

          number of observations = 330730
          area under ROC curve = 0.2110


          ****************************************



          • #6


            I think I might have figured out what was happening there.

            This was a model to predict readmission for those who had stayed in the hospital up to 14 days after surgery. Everyone was followed for up to 30 days after surgery.


            I had set the time-at-risk=time-to-event for those who had the outcome of interest.

            Therefore the time at risk for readmission for those who were readmitted (outcome = 1) after discharge ranged from 0 (stayed any number of days in hospital but readmitted the same day) to 30 (stayed in hospital 1 day and readmitted after 30 days).

            And the time at risk for readmission for those who were not readmitted (outcome = 0) after discharge ranged from 16 (stayed 14 days in hospital) to 30 (stayed only 1 day in hospital).

            This created a very unbalanced time at risk for the readmitted and not-readmitted patients.
            So I think that explains the reversal of the ROC when using offset(ln_timeatrisk).
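That mechanism is easy to reproduce outside Stata. A minimal sketch in Python with made-up times at risk (cases get a short time-to-event, non-cases get the longer discharge-to-day-30 window): because the offset's coefficient is fixed at 1, the predicted probability must increase with time at risk, while the outcome here decreases with it, so the ROC area falls below 0.5.

```python
import math

def auc(y, score):
    # Fraction of (case, control) pairs where the case scores higher (ties count half).
    pos = [s for yi, s in zip(y, score) if yi == 1]
    neg = [s for yi, s in zip(y, score) if yi == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up times at risk: readmitted patients get time-to-event (short),
# non-readmitted patients get the remaining follow-up window (16-30 days).
t_cases    = [1, 3, 5, 8, 12, 20]
t_controls = [16, 20, 24, 27, 30, 30]
y = [1] * len(t_cases) + [0] * len(t_controls)
t = t_cases + t_controls

b0 = -3.0  # arbitrary intercept; the monotonicity argument holds for any b0
# Offset enters with coefficient fixed at 1: logit(p) = b0 + ln(t)
p = [1 / (1 + math.exp(-(b0 + math.log(ti)))) for ti in t]

# p is a monotone increasing function of t, so the ROC area of p equals that of t,
# and t runs opposite to the outcome in this setup:
assert auc(y, p) == auc(y, t)
assert auc(y, p) < 0.5
```

In other words, once the time-at-risk variable is forced into the linear predictor with coefficient 1 but runs opposite to the outcome, an ROC area below 0.5 is exactly what one would expect.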



            • #7
              Well, I am flummoxed. The output you are getting from -lroc- after the second model should be impossible. I really don't know what to make of it. And I'm unable to get similar results using any data sets I have available. No matter how I torture the variables, I always get an output from -lroc- that exceeds 0.5, because if the regressors are anti-sense to the outcome, their coefficients come out negative, and so -xb- is always in the same sense as the outcome.
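The "coefficients come out negative" point can be checked with a toy fit (pure Python, gradient ascent on the log likelihood, made-up data in which the regressor runs opposite to the outcome): the fitted slope comes out negative, so the fitted probabilities still rank the cases above the controls.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def auc(y, score):
    # Fraction of (case, control) pairs where the case scores higher (ties count half).
    pos = [s for yi, s in zip(y, score) if yi == 1]
    neg = [s for yi, s in zip(y, score) if yi == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up data: higher x means lower outcome probability (anti-sense regressor).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, 1, 0, 1, 0, 0, 0]

# Maximum likelihood for logit(Pr(y=1)) = b0 + b1*x, via gradient ascent.
b0 = b1 = 0.0
rate = 0.01
for _ in range(20000):
    resid = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
    b0 += rate * sum(resid)
    b1 += rate * sum(r * xi for r, xi in zip(resid, x))

p = [sigmoid(b0 + b1 * xi) for xi in x]
assert b1 < 0           # the fit absorbs the reversed direction
assert auc(y, p) > 0.5  # so the fitted probabilities are never anti-sense
```

A freely estimated coefficient can always flip its sign to match the data; only a coefficient constrained to 1, as with offset(), cannot.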

              What I have been able to do is get some very bizarre -logit- outputs using the -offset()- option when the variable chosen as offset is anti-sense to the outcome. In effect, constraining the coefficient of a variable to be 1 when it is inversely related to the outcome probability can severely distort the coefficients, and, if pushed hard enough, can cause the logistic regression to fail to converge. But that isn't what's going on here. In fact, it is striking how close the coefficients in the two models are to each other.

              Here's what I would do:

              1. Make sure that your Stata installation is completely up-to-date.
              2. If the same results persist, I would contact technical support about this.
              3. In the meantime, I don't trust the results from -lroc-. They are clearly wrong for the second model, so I lose confidence in those for the first as well.
              4. So, I would use a different program to compute the area under the ROC curve. Re-run each model, and follow it with
              Code:
              predict p, pr
              roctab relupreadm p
              


              That way your ROC areas are coming from the code for -predict- and the code for -roctab-, and not relying on the apparently questionable -lroc- code.
