Generate binary variable with predefined sensitivity/specificity

Sara Hansen

Join Date: Apr 2022

Posts: 30
#1

12 Jan 2023, 06:07

Hi,

I have the following data:

Dataex:

Code:

input float(ev1 ev2 diagnosis ev3)
1 1 1 .
0 0 0 0
1 0 1 .
0 1 1 1
0 0 1 .
1 0 0 0
end

ev1 = evaluation1, ev2 = evaluation2, diagnosis = the truth/gold standard (disease yes/no), ev3 = evaluation3

In cases where ev1 != ev2 & ev3==., I want to create a new variable, ev3_sim, with the same performance as ev3 (i.e. the same sensitivity and specificity / the same number of true positives, true negatives, etc.).

Any ideas how I can do this?

Thanks,
Daniel Schaefer

Join Date: Mar 2020

Posts: 814
#2

12 Jan 2023, 10:44

Well, to answer your question as stated, you could take all of your observations for ev3, then calculate the total proportion of true positives, true negatives, false positives, and false negatives. You can then construct a probability distribution from these proportions and fill in the missing data by drawing randomly from that distribution. It is possible to do this by hand (and I can explain how in detail if you prefer), but you risk introducing bias into your dataset when you impute like this, because the data might not be missing completely at random.
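
As a minimal sketch of the first step described above, the four proportions can be read off a two-way table of diagnosis against ev3. By default tabulate drops observations where ev3 is missing, so only the cases that were actually evaluated contribute. This assumes the variable names from the example data in #1.

```stata
* Share of true negatives, false positives, false negatives and true
* positives among observations where ev3 was observed.
* cell   = show each cell as a percentage of the table total
* nofreq = suppress the raw counts
tabulate diagnosis ev3, cell nofreq
```

The cell percentages are the empirical distribution from which the missing values would be drawn.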

You might prefer to use a hotdeck imputation instead to generate missing values conditional on the distribution of your variables.

Code:

help hotdeck

Or here is the simpler single imputation only command:

Code:

help hotdeckvar

Here I outline a simple imputation procedure:

Code:

clear
input float(ev1 ev2 diagnosis ev3)
1 1 1 .
0 0 0 0
1 0 1 .
0 1 1 1
0 0 1 .
1 0 0 0
end

* Impute missing values
hotdeck ev1 ev2 diagnosis ev3, store
* use imp1, clear

Clyde Schechter

Join Date: Apr 2014
Posts: 30084
#3

12 Jan 2023, 10:45

Code:

//  FIRST CALCULATE SENSITIVITY & SPECIFICITY OF EV3
assert !missing(diagnosis)
summ ev3 if diagnosis, meanonly
local sens = r(mean)
summ ev3 if !diagnosis, meanonly
local spec = 1 - r(mean)

//  CREATE EV3_SIM
set seed 1234 // OR WHATEVER RANDOM NUMBER SEED YOU LIKE
gen ev3_sim = (runiform() < `sens') if diagnosis & missing(ev3) & ev1 != ev2
replace ev3_sim = (runiform() > `spec') if !diagnosis & missing(ev3) & ev1 != ev2

Added: Crossed with #2.
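
Note that independent random draws like these reproduce the sensitivity and specificity of ev3 only in expectation, not as exact counts, so with few eligible observations the realized proportions in ev3_sim can differ noticeably. A minimal sketch of a check, assuming the code above has already been run so that ev3_sim exists:

```stata
* Observed performance of ev3: row percentages within each diagnosis level.
* The diagnosis == 1 row gives sensitivity in the ev3 == 1 column;
* the diagnosis == 0 row gives specificity in the ev3 == 0 column.
tabulate diagnosis ev3, row

* The same table for the simulated values, for comparison
tabulate diagnosis ev3_sim, row
```

In the six-row example from #1 only one observation meets the condition ev1 != ev2 & missing(ev3), so this comparison becomes informative only on the full dataset.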

Last edited by Clyde Schechter; 12 Jan 2023, 10:47.

Daniel Schaefer

Join Date: Mar 2020

Posts: 814
#4

12 Jan 2023, 11:11

#3 is a clever implementation of what I describe in paragraph 1 of post #2 - clever because of the way Clyde takes advantage of the properties of the mean of a dichotomous variable to get a probability. Neat.
Sara Hansen

Join Date: Apr 2022

Posts: 30
#5

16 Jan 2023, 07:39

Thank you very much for your time, Daniel and Clyde.
It is indeed an elegant way to use and store the sens + spec, Clyde.

Thanks!
John Eiler

Join Date: Nov 2019

Posts: 50
#6

24 Jan 2023, 06:31

A late reply FWIW... Obviously the above answers are fine, but I think the simplest and most concise approach here would just be probit + predict. E.g.

Code:

probit ev3 ev1 ev2 diagnosis
predict ev3_hat // by default, the predicted probability that ev3 == 1 (not a 0/1 value)
replace ev3 = ev3_hat if ev3 == .

Apologies for not testing that, but I'm in a rush and it's simple enough to be obvious.
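
One caveat with the untested snippet above: after probit, predict returns a predicted probability by default rather than a 0/1 value. A minimal sketch of turning those probabilities into binary draws follows, assuming the dataset has enough complete cases for probit to converge (the six-row example in #1 is likely too small). The names p_ev3 and ev3_draw are hypothetical and not used elsewhere in this thread.

```stata
* Sketch only: assumes probit converges on the full dataset
set seed 1234                       // any seed; fixed for reproducibility
probit ev3 ev1 ev2 diagnosis
predict double p_ev3, pr            // predicted Pr(ev3 == 1) for every row

* Draw a 0/1 value with that probability, only where ev3 is missing
generate byte ev3_draw = (runiform() < p_ev3) if missing(ev3) & !missing(p_ev3)
```

Writing the draws to a separate variable keeps the observed ev3 values intact, which makes it easy to compare observed and simulated cases afterwards.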

For more complicated cases, you could use Stata's "mi" command with probit, but that's got some overhead that would be overkill here (at least based on the simple example you provide here)
