Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Assigning random number while using pre-defined observation specific probabilities (Result of an LCA) - using rdiscrete

    Hello Statalist,

    As a result of an LCA I have three variables containing the probability of each observation to belong to a specific group (prob_1, prob_2, prob_3). The values of all three variables are by definition between 0 and 1.
    To account for uncertainty in class assignment ( when using LCA as an independent variable) I want to run a simulation with a specific amount of iterations. In this simulation, the assignment to the class should be random while taking the probabilities (prob_1, prob_2, prob_3) into account.
    The following code should demonstrate what I want to do (regarding the assignment of the class)
    Code:
    generate wanted = . 
    mata: st_store(., "wanted", rdiscrete(2100, 1, ("0.5", "0.2", "0.3")))
    I my case, probabilities vary across observations. Therefore I would like to substitute the general probabilities by individual probabilities, which are saved in the variables prob_1, prob_2 and prob_3
    Code:
    generate wanted = . 
    mata: st_store(., "wanted", rdiscrete(2100, 1, ("prob_1", "prob_2", "prob_3")))
    This however gives me the following error
    rdiscrete(): 3253 nonreal found where real required
    <istmt>: - function returned error
    I guess the error message means that only numbers can be put where I wrote "prob_1" etc.

    Therefore I thought of saving individual values in a local and than looping across all observations.
    Something like this:
    Code:
    forvalues i = 1(1)2100 {
    local prob_1_local = prob_1[`i']
    local prob_2_local = prob_2[`i']
    local prob_3_local = prob_3[`i']
    display `prob_1_local'
    display `prob_2_local'
    display `prob_3_local'
    
    mata: st_store(., "wanted", rdiscrete(1, 1, (`prob_1_local', `prob_2_local', `prob_3_local')))
    }
    This code gives me the error
    sum of the probabilities must be 1
    rdiscrete(): 3300 argument out of range
    <istmt>: - function returned error
    The error message is the reason why I added the display function to the code. The displayed values in my case are:
    .3423453
    .56572497
    .09192975
    Which almost perfectly add up to 1.

    Can anyone help me with my approach? I'm also open to totally different approaches in case my whole approach was wrong.

    Thanks

    Jay

  • #2
    My recollection is that Mata's -rdiscrete()- requires that the probability vector sum to 1 *in double precision.* I've dealt with this by adding whatever trivially tiny amount is necessary to one of the probability values to make sure that its double precision value is 1.0, i.e.
    Code:
    recast double prob_1 prob_2 prob_3
    egen double check = rowtotal(prob1 prob_2 prob_3)
    replace prob_3 = prob_3 + 1 - check
    This could also be done in Mata, after passing it the probability variables. I wonder also if you can generate the predicted class probs. as doubles and avoid this problem.

    Stata really should (in my view) have an "rmultinomial()" function, like -rnormal()- etc.

    Note, by the way, that you can sample from a multinomial without using Mata:


    Code:
    // Generic solution, with possibly more than three probs.
    // Cumulate probs.
    local k = 3
    gen double F1 = prob_1
    forval i = 2/`k' {
       local im1 = `i' -1
        gen double F`i' = F`im1' + prob_`i'
    }    
    replace F`k' = 1.0 if F`k' > 1 // handle any precision issue
    //
    gen rand = runiform()
    gen wanted = .
    forval i = 1/`k' {
      quiet replace wanted = `i' if (rand <= F`i')  & missing(wanted)
    }

    Comment


    • #3
      Originally posted by Mike Lacy View Post
      My recollection is that Mata's -rdiscrete()- requires that the probability vector sum to 1 *in double precision.* I've dealt with this by adding whatever trivially tiny amount is necessary to one of the probability values to make sure that its double precision value is 1.0, i.e.
      Code:
      recast double prob_1 prob_2 prob_3
      egen double check = rowtotal(prob1 prob_2 prob_3)
      replace prob_3 = prob_3 + 1 - check
      This could also be done in Mata, after passing it the probability variables. I wonder also if you can generate the predicted class probs. as doubles and avoid this problem.

      Stata really should (in my view) have an "rmultinomial()" function, like -rnormal()- etc.

      ...
      To the first point, I suspect that many other software packages normalize the probabilities so that they all sum to one, i.e. sum the probabilities, then divide each probability by the total.

      I think Mike may have responded to a very similar query I asked on the Mata forum by saying that if you are working outside of Mata, you could use the irecode function. That function takes the probabilities in cutpoint style, so the code might look something like

      Code:
      gen pseudoclass = irecode(runiform(), prob_1, prob_1 + prob_2, prob_1 + prob_2 + prob_3)
      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

      When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

      Comment


      • #4
        Weiwen's idea to use -irecode()- is definitely the way to go here.

        Comment


        • #5
          Originally posted by Mike Lacy View Post
          Weiwen's idea to use -irecode()- is definitely the way to go here.
          Just for the record, it was originally Mike’s idea.
          Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

          When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

          Comment


          • #6
            Hi Mike, Hi Weiwen,

            thank you very much for the very helpful tips. Both suggested solutions have solved the problem!

            Comment

            Working...
            X