Assigning random number while using pre-defined observation specific probabilities (Result of an LCA) - using rdiscrete

Jay Rege

Join Date: Sep 2014

Posts: 135
#1

Assigning random number while using pre-defined observation specific probabilities (Result of an LCA) - using rdiscrete

29 Oct 2021, 05:35

Hello Statalist,

As a result of an LCA I have three variables containing the probability of each observation to belong to a specific group (prob_1, prob_2, prob_3). The values of all three variables are by definition between 0 and 1.
To account for uncertainty in class assignment ( when using LCA as an independent variable) I want to run a simulation with a specific amount of iterations. In this simulation, the assignment to the class should be random while taking the probabilities (prob_1, prob_2, prob_3) into account.
The following code should demonstrate what I want to do (regarding the assignment of the class)

Code:

generate wanted = . mata: st_store(., "wanted", rdiscrete(2100, 1, ("0.5", "0.2", "0.3")))

I my case, probabilities vary across observations. Therefore I would like to substitute the general probabilities by individual probabilities, which are saved in the variables prob_1, prob_2 and prob_3

Code:

generate wanted = . mata: st_store(., "wanted", rdiscrete(2100, 1, ("prob_1", "prob_2", "prob_3")))

This however gives me the following error
rdiscrete(): 3253 nonreal found where real required
<istmt>: - function returned error
I guess the error message means that only numbers can be put where I wrote "prob_1" etc.

Therefore I thought of saving individual values in a local and than looping across all observations.
Something like this:

Code:

forvalues i = 1(1)2100 { local prob_1_local = prob_1[`i'] local prob_2_local = prob_2[`i'] local prob_3_local = prob_3[`i'] display `prob_1_local' display `prob_2_local' display `prob_3_local' mata: st_store(., "wanted", rdiscrete(1, 1, (`prob_1_local', `prob_2_local', `prob_3_local'))) }

This code gives me the error
sum of the probabilities must be 1
rdiscrete(): 3300 argument out of range
<istmt>: - function returned error
The error message is the reason why I added the display function to the code. The displayed values in my case are:
.3423453
.56572497
.09192975
Which almost perfectly add up to 1.

Can anyone help me with my approach? I'm also open to totally different approaches in case my whole approach was wrong.

Thanks

Jay
Tags: None
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#2

29 Oct 2021, 09:55

My recollection is that Mata's -rdiscrete()- requires that the probability vector sum to 1 *in double precision.* I've dealt with this by adding whatever trivially tiny amount is necessary to one of the probability values to make sure that its double precision value is 1.0, i.e.

Code:

recast double prob_1 prob_2 prob_3 egen double check = rowtotal(prob1 prob_2 prob_3) replace prob_3 = prob_3 + 1 - check

This could also be done in Mata, after passing it the probability variables. I wonder also if you can generate the predicted class probs. as doubles and avoid this problem.

Stata really should (in my view) have an "rmultinomial()" function, like -rnormal()- etc.

Note, by the way, that you can sample from a multinomial without using Mata:

Code:

// Generic solution, with possibly more than three probs. // Cumulate probs. local k = 3 gen double F1 = prob_1 forval i = 2/`k' { local im1 = `i' -1 gen double F`i' = F`im1' + prob_`i' } replace F`k' = 1.0 if F`k' > 1 // handle any precision issue // gen rand = runiform() gen wanted = . forval i = 1/`k' { quiet replace wanted = `i' if (rand <= F`i') & missing(wanted) }
1 like
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#3

29 Oct 2021, 11:34

Originally posted by Mike Lacy View Post

My recollection is that Mata's -rdiscrete()- requires that the probability vector sum to 1 *in double precision.* I've dealt with this by adding whatever trivially tiny amount is necessary to one of the probability values to make sure that its double precision value is 1.0, i.e.

Code:

recast double prob_1 prob_2 prob_3 egen double check = rowtotal(prob1 prob_2 prob_3) replace prob_3 = prob_3 + 1 - check

This could also be done in Mata, after passing it the probability variables. I wonder also if you can generate the predicted class probs. as doubles and avoid this problem.

Stata really should (in my view) have an "rmultinomial()" function, like -rnormal()- etc.

...

To the first point, I suspect that many other software packages normalize the probabilities so that they all sum to one, i.e. sum the probabilities, then divide each probability by the total.

I think Mike may have responded to a very similar query I asked on the Mata forum by saying that if you are working outside of Mata, you could use the irecode function. That function takes the probabilities in cutpoint style, so the code might look something like

Code:

gen pseudoclass = irecode(runiform(), prob_1, prob_1 + prob_2, prob_1 + prob_2 + prob_3)

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#4

30 Oct 2021, 08:52

Weiwen's idea to use -irecode()- is definitely the way to go here.
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#5

01 Nov 2021, 11:01

Originally posted by Mike Lacy View Post

Weiwen's idea to use -irecode()- is definitely the way to go here.

Just for the record, it was originally Mike’s idea.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Jay Rege

Join Date: Sep 2014

Posts: 135
#6

04 Nov 2021, 04:11

Hi Mike, Hi Weiwen,

thank you very much for the very helpful tips. Both suggested solutions have solved the problem!
Comment

Announcement

Assigning random number while using pre-defined observation specific probabilities (Result of an LCA) - using rdiscrete

Comment

Comment

Comment

Comment

Comment