  • Generation of Binary Variables from Probability Values

    Dear community,


    For each individual I have variables giving the probability of becoming obese (probese), of developing osteoarthritis (prOA), and of death (prDTH).
    Based on each of these probabilities, I would like to generate binary (0 or 1) variables for obesity, osteoarthritis, and death.
    In other words, I'd like to randomly assign each individual a value of 0 or 1 on each of these variables, with the chance of a 1 given by the corresponding probability.
    I'm considering a command like "generate byte OA = uniform() <= prOA", but I suspect it is not accurate.
    I would appreciate your advice on the correct approach.

    Thanks in advance.


    Best regards,
    Yunsun

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(id sex age agegr sexagegr sm pa) float(probese prOA prDTH)
     1 0 41 4 14 0 0 .0482027 .0045425 .0015648
     2 0 41 4 14 0 0 .0482027 .0045425 .0015648
     3 0 41 4 14 0 0 .0482027 .0045425 .0015648
     4 0 41 4 14 1 1 .0449715 .0045352 .0015648
     5 0 41 4 14 1 0 .9094359 .0069649 .0015648
     6 0 41 4 14 1 0 .9094359 .0069649 .0015648
     7 0 41 4 14 1 1 .9123892 .0069751 .0015648
     8 0 41 4 14 1 0  .043434 .0045318 .0015648
     9 0 41 4 14 1 0  .043434 .0045318 .0015648
    10 0 41 4 14 1 1 .0449715 .0045352 .0015648
    11 0 41 4 14 0 0 .0482027 .0045425 .0015648
    12 0 41 4 14 1 0 .9094359 .0069649 .0015648
    13 0 41 4 14 1 0 .9094359 .0069649 .0015648
    14 0 41 4 14 1 0  .043434 .0045318 .0015648
    15 0 41 4 14 1 0  .043434 .0045318 .0015648
    16 0 41 4 14 1 1 .0449715 .0045352 .0015648
    17 0 41 4 14 1 1 .9123892 .0069751 .0015648
    18 0 41 4 14 1 0  .043434 .0045318 .0015648
    19 0 41 4 14 0 0 .0482027 .0045425 .0015648
    20 0 41 4 14 1 0  .043434 .0045318 .0015648
    21 0 41 4 14 1 0 .9094359 .0069649 .0015648
    22 0 41 4 14 1 0 .9094359 .0069649 .0015648
    23 0 41 4 14 1 1 .0449715 .0045352 .0015648
    24 0 41 4 14 1 1 .0449715 .0045352 .0015648
    25 0 41 4 14 1 0 .9094359 .0069649 .0015648
    26 0 41 4 14 1 0  .043434 .0045318 .0015648
    27 0 41 4 14 1 0  .043434 .0045318 .0015648
    28 0 41 4 14 0 0 .9180345 .0069946 .0027712
    end
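Applied to a few rows of the example data, the approach in the question can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes the probabilities are stored in probese, prOA and prDTH as in the -dataex- listing, and the new variable names obese, OA and death (and the seed value) are my own choices. The question's uniform() is, as far as I know, an older name for runiform(). The `if !missing()` guard matters because Stata treats a missing value as larger than any number, so `runiform() <= .` would evaluate to 1.

```stata
* A minimal sketch: a few rows copied from the -dataex- listing above.
clear
input byte id float(probese prOA prDTH)
 1 .0482027 .0045425 .0015648
 4 .0449715 .0045352 .0015648
 5 .9094359 .0069649 .0015648
 7 .9123892 .0069751 .0015648
28 .9180345 .0069946 .0027712
end

* Fix the seed so the random draws can be reproduced (seed value is arbitrary).
set seed 2024

* runiform() draws from the uniform distribution on [0,1); the comparison
* is 1 with probability equal to the observation's own probability, else 0.
* The if-qualifier leaves the outcome missing when the probability is missing.
generate byte obese = runiform() <= probese if !missing(probese)
generate byte OA    = runiform() <= prOA    if !missing(prOA)
generate byte death = runiform() <= prDTH   if !missing(prDTH)

list id probese obese prOA OA prDTH death
```

Each outcome uses its own call to runiform(), so the three outcomes are drawn independently of one another given the probabilities.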

  • #2
    Why do you think your command is not accurate? It should do what you asked.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    • #3
      Thank you for your answer. I'm afraid I don't understand this function correctly.


      For example, the average of 'probability to become obese' (probese) and the average of 'obese' (0 or 1) are similar.
      However, isn't this just the overall average across all individuals? Each probability was calculated from the individual's gender, age, and health behaviors.
      Although each 'obese' value is assigned according to 'probese', I am not sure whether the value is randomly distributed to reflect the individual's gender, age, health behaviors, etc.

      I would appreciate your advice on the correct interpretation.

      • #4
        the average of 'probability to become obese' (probese) and the average of 'obese' (0 or 1) are similar
        That is what you should expect.

        I am not sure whether the value is randomly distributed to reflect the individual's gender, age, health behaviors, etc.
        For each observation, it is randomly distributed to reflect the value of probese in that observation. In turn, probese presumably reflects the probability of obesity given the individual's gender, age, health behaviors, etc. in that observation.
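Put formally, as a sketch assuming that runiform() returns independent draws U_i from the uniform distribution on [0,1), and writing p_i for the value of probese in observation i:

```latex
\Pr(\text{obese}_i = 1 \mid p_i) = \Pr(U_i \le p_i) = p_i,
\qquad
\operatorname{E}\Bigl[\tfrac{1}{N}\sum_{i=1}^{N} \text{obese}_i\Bigr] = \tfrac{1}{N}\sum_{i=1}^{N} p_i .
```

The first identity is the per-observation statement; the second follows by averaging it. The same averaging argument applies within any subgroup (for example, within each sex), so the subgroup mean of obese should be close to, though not exactly equal to, the subgroup mean of probese.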

        Consider the following example, where I generate probese randomly to get 10000 observations of a probability that varies from observation to observation and whose overall distribution differs by sex.
        Code:
        . clear
        
        . set obs 10000
        number of observations (_N) was 0, now 10,000
        
        . set seed 666
        
        . generate byte sex = runiformint(0,1)
        
        . generate float probese = rbeta(2+sex,5)
        
        . generate byte  obese   = runiform()<=probese
        
        . tabstat probese obese, by(sex)
        
        Summary statistics: mean
          by categories of: sex
        
             sex |   probese     obese
        ---------+--------------------
               0 |  .2874214  .2827655
               1 |  .3737997  .3706587
        ---------+--------------------
           Total |  .3306969     .3268
        ------------------------------
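A complementary check looks at single observations rather than group means: replicate each observation many times and see whether the share of 1s within each one approaches that observation's own probese. This is a minimal sketch; the three probabilities (.05, .5, .9), the seed, and the 10,000 replications are hypothetical values chosen for illustration.

```stata
* A minimal sketch: three hypothetical observations with fixed probabilities.
clear
set seed 666
set obs 3
generate byte id = _n
generate float probese = cond(id == 1, .05, cond(id == 2, .5, .9))

* expand replaces each observation with 10,000 copies of itself
expand 10000

* One independent draw per copy
generate byte obese = runiform() <= probese

* Share of 1s per original observation, next to its probability
collapse (mean) probese obese, by(id)
list
```

With 10,000 draws per observation, the standard error of each share is at most .005, so the listed means of obese should sit close to the corresponding probese values.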

        • #5
          Your explanation is very clear! Thanks to you, I now understand the concept.
          Thanks a lot.
