  • Generate binary variable with predefined sensitivity/specificity

    Hi,

    I have the following data:

    Example data (from dataex):
    Code:
    clear
    input float(ev1 ev2 diagnosis ev3)
    1 1 1 .
    0 0 0 0
    1 0 1 .
    0 1 1 1
    0 0 1 .
    1 0 0 0
    end
    ev1 = evaluation1, ev2 = evaluation2, diagnosis = the truth/gold standard (disease yes/no), ev3 = evaluation3

    In cases where ev1 != ev2 & ev3 == ., I want to create a new variable, ev3_sim, with the same performance as ev3 (i.e. the same sensitivity and specificity, or equivalently the same rates of true positives, true negatives, etc.).


    Any ideas how I can do this?

    Thanks,

  • #2
    Well, to answer your question as stated, you could take all of your observations for ev3, then calculate the total proportion of true positives, true negatives, false positives, and false negatives. You can then construct a probability distribution from these proportions and fill in the missing data by drawing randomly from that distribution. It is possible to do this by hand (and I can explain how in detail if you prefer), but you risk introducing bias into your dataset when you impute like this, because the data might not be missing completely at random.
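A minimal sketch of the first step of that paragraph, using the example data from #1: count the four cells of the ev3-by-diagnosis table among observations where ev3 is observed, then turn them into proportions. Because diagnosis is already known for the cases to be filled in, the sketch converts the counts into the two conditional proportions that matter (sensitivity and specificity). The local macro names (tp, fn, tn, fp, sens, spec) are illustrative choices, not anything from the thread.

```stata
* Example data from the question
clear
input float(ev1 ev2 diagnosis ev3)
1 1 1 .
0 0 0 0
1 0 1 .
0 1 1 1
0 0 1 .
1 0 0 0
end

* 2x2 table of ev3 against the gold standard (missing ev3 is excluded)
tabulate ev3 diagnosis

* Count the four cells among observations where ev3 is observed
* (illustrative macro names)
quietly count if ev3 == 1 & diagnosis == 1
local tp = r(N)
quietly count if ev3 == 0 & diagnosis == 1
local fn = r(N)
quietly count if ev3 == 0 & diagnosis == 0
local tn = r(N)
quietly count if ev3 == 1 & diagnosis == 0
local fp = r(N)

* Convert the counts to conditional proportions
local sens = `tp' / (`tp' + `fn')   // P(ev3 = 1 | diagnosis = 1)
local spec = `tn' / (`tn' + `fp')   // P(ev3 = 0 | diagnosis = 0)
display "TP=`tp' FN=`fn' TN=`tn' FP=`fp'  sensitivity=`sens'  specificity=`spec'"
```

Missing values can then be filled in by drawing 0/1 values with these probabilities, separately for diseased and non-diseased observations.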

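    You might prefer to use hot-deck imputation instead, which generates the missing values conditional on the distribution of your variables. Both commands below are community-contributed; install them with ssc install hotdeck and ssc install hotdeckvar.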

    Code:
    help hotdeck
    Or, if you only need a single imputation, here is the simpler command:

    Code:
    help hotdeckvar
    Here I outline a simple imputation procedure:

    Code:
    clear
    input float(ev1 ev2 diagnosis ev3)
    1 1 1 .
    0 0 0 0
    1 0 1 .
    0 1 1 1
    0 0 1 .
    1 0 0 0
    end
    
    * Impute missing values
    hotdeck ev1 ev2 diagnosis ev3, store
    
    * Load the first stored imputed dataset (imp1)
    use imp1, clear


    • #3
      Code:
      //  FIRST CALCULATE SENSITIVITY & SPECIFICITY OF EV3
      assert !missing(diagnosis)
      summ ev3 if diagnosis, meanonly
      local sens = r(mean)
      summ ev3 if !diagnosis, meanonly
      local spec = 1 - r(mean)
      
      //  CREATE EV3_SIM
      set seed 1234 // OR WHATEVER RANDOM NUMBER SEED YOU LIKE
      gen ev3_sim = (runiform() < `sens') if diagnosis & missing(ev3) & ev1 != ev2
      replace ev3_sim = (runiform() > `spec') if !diagnosis & missing(ev3) & ev1 != ev2
      Added: Crossed with #2.
      Last edited by Clyde Schechter; 12 Jan 2023, 10:47.
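Putting the data from #1 together with the code in #3 gives a runnable check. In the example data ev3 is observed for only one diseased and two non-diseased observations, all classified correctly, so both sens and spec come out as 1. Only observation 3 satisfies ev1 != ev2 & missing(ev3), and it is diseased, so ev3_sim is 1 there and missing elsewhere regardless of the seed. With real data the estimates, and therefore the draws, will be less degenerate.

```stata
* Example data from the question
clear
input float(ev1 ev2 diagnosis ev3)
1 1 1 .
0 0 0 0
1 0 1 .
0 1 1 1
0 0 1 .
1 0 0 0
end

* Sensitivity and specificity of ev3, from observations where ev3 is observed
assert !missing(diagnosis)
summ ev3 if diagnosis, meanonly
local sens = r(mean)
summ ev3 if !diagnosis, meanonly
local spec = 1 - r(mean)

* Simulate ev3_sim only for the target cases (ev1 != ev2 and ev3 missing)
set seed 1234
gen ev3_sim = (runiform() < `sens') if diagnosis & missing(ev3) & ev1 != ev2
replace ev3_sim = (runiform() > `spec') if !diagnosis & missing(ev3) & ev1 != ev2

* ev3_sim is non-missing only where it was simulated
list, noobs
```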


      • #4
        #3 is a clever implementation of what I describe in paragraph 1 of post #2 - clever because of the way Clyde takes advantage of the properties of the mean of a dichotomous variable to get a probability. Neat.
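The property referred to can be written out. Among observations with nonmissing ev3, let D denote diagnosis and let TP, FN, FP, TN be the cell counts of the ev3-by-D table. Because ev3 is 0/1, its mean within a group is the proportion of 1s in that group:

$$
\overline{\mathrm{ev3}}\big|_{D=1} = \frac{1}{TP+FN}\sum_{i:\,D_i=1}\mathrm{ev3}_i = \frac{TP}{TP+FN} = \text{sensitivity},
\qquad
1-\overline{\mathrm{ev3}}\big|_{D=0} = 1-\frac{FP}{FP+TN} = \frac{TN}{TN+FP} = \text{specificity}.
$$

For $U \sim \mathrm{Uniform}(0,1)$, $\Pr(U < \text{sens}) = \text{sens}$ and $\Pr(U > \text{spec}) = 1-\text{spec}$, so the simulated values have the same true-positive and false-positive rates as ev3 in expectation.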


        • #5
          Thank you very much for your time, Daniel and Clyde.
          It is indeed an elegant way to use and store the sens + spec, Clyde.

          Thanks!


          • #6
            A late reply, FWIW... The answers above are fine, but I think the simplest and most concise approach here would be probit + predict. E.g.

            Code:
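            probit ev3 ev1 ev2 diagnosis
            predict ev3_hat, pr   // predicted probability that ev3 == 1
            * ev3_hat is a probability, not 0/1, so draw a binary value from it
            set seed 1234
            gen ev3_sim = (runiform() < ev3_hat) if missing(ev3) & ev1 != ev2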
            Apologies for not testing that, but I'm in a rush and it's simple enough to be obvious.

            For more complicated cases, you could use Stata's mi commands (note that mi impute offers logit rather than probit for a binary variable), but that has some overhead that would be overkill here (at least based on the simple example you provided).
