How to draw a number from a given vector of probabilities?

Demetrio Guzzardi

Join Date: Nov 2017

Posts: 42
#1

01 Jan 2021, 09:54

Hello everyone, and happy new year!

I searched everywhere for an answer but could not find one, so I hope someone here can help me out.

I have a vector in which each entry is a probability between 0 and 1.

What I need is a function or method to create a new dummy vector of 0s and 1s, where the probability of getting a 1 is given by the corresponding entry of the probability vector.
That is, if the probability vector has an entry of 0.5, the corresponding cell should equal 1 about half of the time over 100 repetitions of this function.
Something like the table below:

Probability  iter1  iter2  iter3  iter4  iter5  iter6  iter7  iter8  iter9  iter10
0.5          1      1      0      1      0      0      1      1      0      1
1            1      1      1      1      1      1      1      1      1      1

Do you know how I could achieve something like this?

Please let me know if the question is not clear enough.
Thank you all,
D.

William Lisowski

Join Date: Dec 2014
Posts: 10150
#2

01 Jan 2021, 12:22

Perhaps this example using Mata vectors will start you in a useful direction.

Code:

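set seed 42
mata:
p1 = .5                // first probability
p2 = 1                 // second probability
x = runiform(1,10)     // 1 x 10 row vector of uniform(0,1) draws
y = x:<=p1             // 1 where the draw is at most p1, 0 otherwise
z = x:<=p2             // 1 where the draw is at most p2, 0 otherwise
x \ y \ z              // stack the three rows and display them
end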

Code:

: x \ y \ z
                 1             2             3             4             5             6
    +-------------------------------------------------------------------------------------
  1 |   .755155533   .6390313939   .7521452007   .1362726836   .9032689664   .0940683118
  2 |            0             0             0             1             0             1
  3 |            1             1             1             1             1             1
    +-------------------------------------------------------------------------------------
                 7             8             9            10
     ---------------------------------------------------------+
  1    .5745703041   .3728876995   .2738741017   .3902708814  |
  2              0             1             1             1  |
  3              1             1             1             1  |
     ---------------------------------------------------------+

If not, your question really isn't clear without more detail, or at a minimum it is too difficult to guess at a good answer from what you have shared. Please help us help you. Show example data - your "probability vector". Is it a Stata dataset variable? Is it a Stata matrix? Is it a Mata matrix? In my example, I have just used two Mata scalars for convenience to demonstrate a technique you should be able to adapt.
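
As a rough sketch of how the technique above might extend from two scalars to a whole probability vector: the names p, u, and d below are assumptions chosen for illustration, not part of the original example. It relies on Mata's colon operators accepting an r x 1 column vector against an r x c matrix, so each row of uniform draws is compared with its own probability.

```stata
set seed 42
mata:
p = (.5 \ 1)                  // hypothetical column vector of probabilities
u = runiform(rows(p), 10)     // one uniform(0,1) draw per probability and iteration
d = u :< p                    // row i of d is 1 where the draw falls below p[i]
p, d                          // probabilities next to the ten 0/1 iterations
end
```

Each row of d then plays the role of one line of the table in post #1, with one column per iteration.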

The Statalist FAQ provides advice on effectively posing your questions, posting data, and sharing Stata output.


Demetrio Guzzardi

Join Date: Nov 2017

Posts: 42
#3

01 Jan 2021, 13:44

Thanks a lot, William, and sorry for not being very clear.

What I actually did was estimate a probit model, from which I predicted the following probabilities, stored in a standard Stata dataset variable (not Mata):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double phat_af
.94836003308945
.8769213812682086
.9215910212697133
.9160083252126953
.8136632295801208
.6240468606644965
.5985503567780064
.49904546967474017
.9675509892934736
.9489425567744427
.9239985676019286
.9951993575259833
.5700241203161944
.8838109343484395
.7709431776265225
.8589910480484257
.6265504674642524
.5553053201189672
.032334087049388546
.974324634276116
end

Now I would like to create a new variable called dummy, made of 0s and 1s, where the 0s and 1s are drawn according to the probability in phat_af. So if I create the dummy 10 times, I expect about 5 zeros where phat_af equals 0.5.

Your suggestion is interesting, though. Let me see if I understood it correctly: you use the uniform distribution to create a new variable, which we can call r_unif, and compare it with phat_af. If r_unif<=phat_af, the dummy equals 1, and 0 otherwise. Is that what you meant?

But, and this is of course my limitation, am I actually drawing from the probability phat_af this way? I am not sure whether it is statistically the same.

Thank you a lot in any case for your time.
D.
Mike Lacy

Join Date: Apr 2014

Posts: 2411
#4

01 Jan 2021, 14:07

Thanks for clarifying that your question involved numbers stored in a variable, not in a vector.

So, relying on: "create a new variable ...[with] 0 and 1 ... given by the probability of the variable phat_af."

Code:

gen byte y = (runiform() < phat_af)
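
To get several repetitions, as in the table in post #1, the same comparison could be wrapped in a loop. This is only a sketch: the variable names iter1-iter10, the seed, and the three-row example dataset are assumptions for illustration. The `if !missing(phat_af)` qualifier is there because Stata treats a missing value as larger than any number, so without it an observation with a missing probability would get a 1.

```stata
* Illustrative data: three probabilities, including the 0.5 and 1 from post #1
clear
input double phat_af
.5
1
.032334087049388546
end

* Arbitrary seed, set only so the draws are reproducible
set seed 12345

* One 0/1 variable per repetition; iter1-iter10 are hypothetical names
forvalues i = 1/10 {
    generate byte iter`i' = runiform() < phat_af if !missing(phat_af)
}

list, noobs
```

Each call to runiform() produces a fresh draw for every observation, so the ten variables are independent repetitions.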
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

01 Jan 2021, 14:18

"in this way am I actually drawing from the probability phat_af? I am not sure if it is statistically the same"

If we follow the code in post #4 to generate the variable y, then Pr{y=1} = Pr{runiform()<phat_af} = F(phat_af) where F is the cumulative distribution function of a uniform(0,1) random variable.

But the cumulative distribution function of a uniform(0,1) random variable is F(x) = x for 0 <= x <= 1, so F(phat_af) = phat_af.

Thus Pr{y=1} = phat_af, which is what you seek.
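
The argument above can also be checked empirically. The sketch below is an illustration under assumed names (p, share, and the local macro reps are hypothetical, and the three probabilities are just examples): it repeats the draw many times and records the share of 1s, which should come out close to p for each observation.

```stata
* Illustrative probabilities
clear
input double p
.5
1
.032334087049388546
end

* Arbitrary seed, set only so the draws are reproducible
set seed 2021

* Accumulate the share of repetitions in which the dummy equals 1
generate double share = 0
local reps 10000
forvalues r = 1/`reps' {
    quietly replace share = share + (runiform() < p) / `reps'
}

* share should be close to p in every row
list p share, noobs
```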

Last edited by William Lisowski; 01 Jan 2021, 14:46.
Demetrio Guzzardi

Join Date: Nov 2017

Posts: 42
#6

01 Jan 2021, 14:42

That's great.
Thank you, Mike, for this very neat code, and thank you, William, for your explanations.

Have a happy new year!