  • logistic risk prediction model

    Hi,

    I have developed a risk prediction model for Outcome (binary) based on 4 binary variables. I developed this on 70% of my dataset and kept 30% as the validation set (included below). AUROC was >0.8 and GOF was non-significant with good sensitivity and specificity.

    Is there a way to calculate the OR of having just any one risk factor (bil/alp/ggt/CBDD) and then 2 risk factors, 3 risk factors, 4 risk factors?

    logistic Outcome bil alp ggt CBDD
    lroc
    lsens
    estat gof, group(10)
    estat class, cutoff(0.2)
    nomolog


    I ran nomolog to get a graphical representation of the model and to be able to calculate the risk manually for any individual. What I would like to be able to say is: your OR of Outcome is x if you have any one risk factor, y if you have 2 risk factors, etc.

    I would also like to plot predicted vs observed Outcome for my validation cohort using the model above if possible.
    I tried the following for this, but the output made no sense (presumably because I have a binary outcome?):
    logistic Outcome bil alp ggt CBDD
    predict predvar
    predict resid, residuals
    scatter resid predvar

    Many thanks in advance,
    Carla

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte CBDD float(bil alp alt ggt) byte outcome
    1 1 1 0 0 1
    1 1 1 1 1 1
    0 1 1 1 0 0
    1 1 1 1 1 0
    1 1 1 1 1 1
    1 1 1 1 1 0
    1 1 1 0 0 0
    1 1 1 1 1 1
    1 1 1 1 1 1
    1 1 1 1 1 1
    1 1 1 0 0 1
    1 1 1 1 1 0
    1 1 1 1 1 0
    1 1 1 1 1 0
    0 1 1 1 1 0
    1 1 1 1 1 1
    0 1 1 0 0 0
    0 1 1 1 1 1
    1 1 1 1 1 1
    0 1 1 0 0 0
    0 1 1 1 1 1
    1 1 1 1 1 0
    0 1 1 1 1 0
    0 1 1 1 1 0
    0 1 1 0 0 1
    0 0 1 0 0 0
    1 1 1 1 1 0
    0 1 1 1 1 1
    0 1 1 0 0 0
    0 1 1 1 1 0
    0 1 1 1 1 0
    0 0 1 0 0 0
    0 1 1 1 1 1
    0 1 1 1 1 0
    0 1 1 1 1 0
    0 1 1 1 1 0
    0 1 1 0 1 1
    1 0 1 1 1 1
    0 1 1 1 1 0
    0 1 1 1 1 0
    1 0 1 0 0 0
    0 1 1 1 1 0
    0 1 1 1 1 0
    0 1 1 1 1 0
    1 0 1 0 1 0
    0 1 1 1 1 0
    0 1 1 1 1 0
    1 0 1 1 1 0
    0 1 1 1 1 0
    0 1 1 1 1 0
    0 1 1 1 1 0
    1 0 1 0 1 0
    1 1 0 0 0 0
    0 1 1 1 1 1
    0 1 1 1 1 0
    0 0 1 0 0 0
    0 0 1 0 1 0
    0 0 1 0 0 0
    0 1 1 1 1 0
    0 1 1 1 1 0
    1 0 1 0 1 0
    0 1 1 1 1 0
    0 0 1 0 1 0
    0 1 1 1 1 0
    1 0 1 1 1 0
    0 0 1 0 0 1
    0 0 1 1 1 0
    0 0 1 1 1 1
    0 0 1 0 0 0
    1 1 0 1 1 0
    0 1 0 0 0 0
    0 1 0 0 0 0
    0 1 0 0 0 0
    1 1 0 1 1 0
    0 0 1 1 1 0
    1 1 0 0 1 0
    0 0 1 0 1 0
    0 0 1 1 1 0
    0 0 1 1 1 0
    1 0 0 0 0 0
    0 0 1 1 1 0
    0 1 0 0 0 0
    1 0 0 0 0 0
    1 0 0 0 0 0
    1 0 0 0 0 0
    1 1 0 1 1 0
    0 1 0 0 0 0
    0 0 1 1 1 0
    1 0 0 0 0 0
    0 1 0 1 1 0
    1 0 0 0 0 0
    0 1 0 1 1 0
    0 0 1 1 1 0
    0 0 0 0 0 0
    0 1 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    1 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    end
    ------------------ copy up to and including the previous line ------------------

    Listed 100 out of 226 observations
    Use the count() option to list more

  • #2
    The model you fit is:

    Code:
    logistic Outcome bil alp ggt CBDD
    To this question:
    Is there a way to calculate the OR of having just any one risk factor (bil/alp/ggt/CBDD)...
    There isn't a way to do this with the model you fit, because it treats each of the four risk factors as a distinct predictor with its own coefficient, and their effects might differ considerably. So there is no single OR for "any one risk factor": it depends on which one is present. The same issue applies to any combination of two or three of the risk factors. You could instead create a count of the 4 risk factors and model that, but you would be getting something like the average effect of a risk factor. I think a lot of statisticians and clinicians would probably prefer each one modeled with its own dummy.
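
    If you do want to try the count approach, a minimal sketch might look like the following. The variable name nrisk is made up for illustration, and I am assuming the four predictors are coded 0/1 as in your data example. Note that your dataex listing names the outcome variable outcome (lowercase) while your commands use Outcome; Stata is case sensitive, so use whichever name is actually in your dataset.

    ```stata
    * Number of risk factors present (0 to 4); nrisk is a hypothetical name.
    * The sum is missing if any of the four variables is missing.
    generate byte nrisk = bil + alp + ggt + CBDD

    * Check how many observations and events fall in each count category
    tabulate nrisk Outcome

    * Odds ratios for 1, 2, 3 and 4 risk factors, each relative to 0 risk factors
    logistic Outcome i.nrisk
    ```

    With i.nrisk the reported odds ratios compare each count with having none of the risk factors. If a category has very few observations or no events, Stata may drop it or give a very imprecise estimate, so look at the tabulate output first.
    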

    Now, you might want to familiarize yourself with the margins post-estimation command. You can easily display the risk difference with respect to the base category. It's possible to get margins to display odds ratios, but it requires some work. You can then use marginsplot to plot. I think Richard Williams' primer on margins may be a bit easier to read than the manual examples, but those are also good.
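
    As a rough sketch of what that could look like (assuming the predictors are 0/1 and are entered with factor-variable notation, which margins needs in order to treat them as categorical):

    ```stata
    * Refit with factor-variable notation so margins knows these are categorical
    logistic Outcome i.bil i.alp i.ggt i.CBDD

    * Average predicted probability of the outcome at each level of each risk factor
    margins bil alp ggt CBDD

    * Risk differences: discrete change in predicted probability vs. the base level (0)
    margins, dydx(bil alp ggt CBDD)

    * Plot the most recent margins results
    marginsplot
    ```

    These are average predicted probabilities and differences in probability, not odds ratios.
    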
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
