Generation of Binary Variables by Probability values

Yunsun Jung

Join Date: Aug 2019
Posts: 6

05 Oct 2019, 05:48

Dear Community.

I have variables for 'probability to become obese', 'probabilities of osteoarthritis' and 'probability of death' for each individual.
Based on each of these probabilities, I would like to generate variables such as obesity, osteoarthritis and death with binary values (0 or 1).
In other words, I'd like to randomly assign a value of 0 or 1 to each variable (obesity, osteoarthritis, death) according to its probability.
I'm considering a command like this, "generate byte OA = uniform() <= prOA", but it doesn't seem accurate.
Could you suggest the correct approach?

Thanks in advance.

Best regards,
Yunsun

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id sex age agegr sexagegr sm pa) float(probese prOA prDTH)
 1 0 41 4 14 0 0 .0482027 .0045425 .0015648
 2 0 41 4 14 0 0 .0482027 .0045425 .0015648
 3 0 41 4 14 0 0 .0482027 .0045425 .0015648
 4 0 41 4 14 1 1 .0449715 .0045352 .0015648
 5 0 41 4 14 1 0 .9094359 .0069649 .0015648
 6 0 41 4 14 1 0 .9094359 .0069649 .0015648
 7 0 41 4 14 1 1 .9123892 .0069751 .0015648
 8 0 41 4 14 1 0  .043434 .0045318 .0015648
 9 0 41 4 14 1 0  .043434 .0045318 .0015648
10 0 41 4 14 1 1 .0449715 .0045352 .0015648
11 0 41 4 14 0 0 .0482027 .0045425 .0015648
12 0 41 4 14 1 0 .9094359 .0069649 .0015648
13 0 41 4 14 1 0 .9094359 .0069649 .0015648
14 0 41 4 14 1 0  .043434 .0045318 .0015648
15 0 41 4 14 1 0  .043434 .0045318 .0015648
16 0 41 4 14 1 1 .0449715 .0045352 .0015648
17 0 41 4 14 1 1 .9123892 .0069751 .0015648
18 0 41 4 14 1 0  .043434 .0045318 .0015648
19 0 41 4 14 0 0 .0482027 .0045425 .0015648
20 0 41 4 14 1 0  .043434 .0045318 .0015648
21 0 41 4 14 1 0 .9094359 .0069649 .0015648
22 0 41 4 14 1 0 .9094359 .0069649 .0015648
23 0 41 4 14 1 1 .0449715 .0045352 .0015648
24 0 41 4 14 1 1 .0449715 .0045352 .0015648
25 0 41 4 14 1 0 .9094359 .0069649 .0015648
26 0 41 4 14 1 0  .043434 .0045318 .0015648
27 0 41 4 14 1 0  .043434 .0045318 .0015648
28 0 41 4 14 0 0 .9180345 .0069946 .0027712
end


Maarten Buis

Join Date: Mar 2014

Posts: 3449
#2

05 Oct 2019, 11:50

Why do you think your command is not accurate? It should do what you asked.
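A minimal sketch of that command applied to all three probabilities, using four rows copied from the -dataex- listing above. Assumptions not in the original post: the seed value, the outcome names obese and death, the use of runiform() (the current name for the older uniform(), as far as I know), and treating the three outcomes as independent draws given their probabilities.

```stata
clear
input byte id float(probese prOA prDTH)
1 .0482027 .0045425 .0015648
2 .9094359 .0069649 .0015648
3 .043434  .0045318 .0015648
4 .9180345 .0069946 .0027712
end

set seed 12345    // arbitrary seed, only so the draws are reproducible

* One independent uniform draw per observation and per outcome.
* The -if- guard matters: a missing probability compares as larger than
* any number, so without it the outcome would be set to 1.
generate byte obese = runiform() <= probese if !missing(probese)
generate byte OA    = runiform() <= prOA    if !missing(prOA)
generate byte death = runiform() <= prDTH   if !missing(prDTH)

list id probese obese prOA OA prDTH death
```

With only four rows the realised 0/1 values will look arbitrary; the link to the probabilities only becomes visible as averages over many observations.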

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Yunsun Jung

Join Date: Aug 2019

Posts: 6
#3

06 Oct 2019, 06:20

Thank you for your answer. I'm afraid I don't understand this function correctly.

For example, the average value of 'probability to become obese(probese)' and the 'obese(0 or 1)' is similar.
However, isn't this the overall average across all observations? Each probability is calculated from the individual's gender, age, and health behaviors.
Although each 'obese' value is given according to 'probese', I am not sure whether the value is randomly distributed to reflect the individual's gender, age, health behaviors, etc.

Could you clarify whether this is correct?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

06 Oct 2019, 07:54

"the average value of 'probability to become obese(probese)' and the 'obese(0 or 1)' is similar"

That is what you should expect.

"I am not sure whether the value is randomly distributed to reflect the individual's gender, age, health behaviors, etc."

For each observation, it is randomly distributed to reflect the value of probese in that observation. In turn, probese presumably reflects the probability of obesity given the individual's gender, age, health behaviors, etc. in that observation.
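The reasoning can be written out in one line. The notation here is my own, not from the posts above: p_i is probese in observation i, U_i is the runiform() draw for that observation, and N is the number of observations.

```latex
\[
\Pr(\mathrm{obese}_i = 1 \mid p_i) = \Pr(U_i \le p_i) = p_i ,
\qquad
\mathrm{E}\!\left[ \frac{1}{N}\sum_{i=1}^{N} \mathrm{obese}_i \,\middle|\, p_1,\dots,p_N \right]
  = \frac{1}{N}\sum_{i=1}^{N} p_i .
\]
```

The first equality is the per-observation statement; the second is why the two means are close. The same identity holds within any subgroup, such as each sex, because it only sums over the observations in that subgroup.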

Consider the following example, where I generate probese randomly to get 10000 observations of a probability that varies from observation to observation and whose overall distribution differs by sex.

Code:

. clear

. set obs 10000
number of observations (_N) was 0, now 10,000

. set seed 666

. generate byte sex = runiformint(0,1)

. generate float probese = rbeta(2+sex,5)

. generate byte obese = runiform()<=probese

. tabstat probese obese, by(sex)

Summary statistics: mean
  by categories of: sex

     sex |   probese     obese
---------+--------------------
       0 |  .2874214  .2827655
       1 |  .3737997  .3706587
---------+--------------------
   Total |  .3306969     .3268
------------------------------
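As a further check along the same lines, one could group observations by the size of probese itself rather than by sex. This is only a sketch: it repeats the simulated data from the example above, and the variable name bin and the choice of ten bins of width 0.1 are my own assumptions.

```stata
clear
set obs 10000
set seed 666
generate byte sex = runiformint(0,1)
generate float probese = rbeta(2+sex,5)
generate byte obese = runiform() <= probese

* Hypothetical helper: bin 0 holds probese in [0,0.1), bin 1 holds [0.1,0.2), ...
* min() keeps a value of exactly 1 in the top bin.
generate byte bin = min(floor(probese*10), 9)

* Within each bin, the share with obese == 1 should sit near the mean of probese,
* up to sampling noise that grows as the bin count (n) shrinks.
tabstat probese obese, by(bin) statistics(mean n)
```

If the draws reflect each observation's own probability, the two means should rise together from bin to bin rather than agreeing only in the overall total.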
Yunsun Jung

Join Date: Aug 2019

Posts: 6
#5

07 Oct 2019, 04:41

Your explanation is very clear! Thanks to you, I understand the concept.
Thanks a lot