Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Coding a discrete random number generator with prior probabilities

    Folks,

    I'm new to Mata programming. I'm trying to create a tool to randomly generate a number from a discrete distribution. In this observation, each respondent may belong to one of 3 predicted classes, with posterior probabilities shown in the variables c1-c3. I'm trying to generate a random draw for the class that the person belongs to.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id c1 c2 c3)
     1          .             .             .
     2  .08422642      .9157736  8.121268e-24
     3   .5383863             0      .4616137
     4   .9936078             0    .006392221
     5          1  6.864455e-36  7.544733e-12
     6  .07823695             0      .9217631
     7   .9886059             0    .011394108
     8   .7088358             0      .2911642
     9   .9999642             0 .000035737274
    10  .02026204             0      .9797379
    11   .0742506             0      .9257494
    12  .43651125             0      .5634887
    13          1  2.952504e-08 4.8909324e-16
    14  .06493116             0      .9350688
    15  .02226799       .977732 2.2346245e-27
    16   .9999998             0 2.4839485e-07
    17 .021456486             0      .9785435
    18   .9999994             0  5.940184e-07
    19   .9999972             0  2.830447e-06
    20  .08522644             0      .9147736
    21   .0318002             0      .9681998
    22   .3983888             0      .6016112
    23          1   9.11082e-37  5.163497e-10
    24   .9969217             0    .003078261
    25  .06136134             0      .9386387
    26  .06014966             0      .9398503
    27          1 2.4505809e-35 2.0005193e-12
    28  .27831626      .7216837   1.99915e-22
    29  .06772947             0      .9322705
    30    .910338             0       .089662
    31  .10535698             0       .894643
    32          1             0  4.133972e-10
    33   .9999999             0 1.0021545e-07
    34   .9999825             0 .000017450597
    35   .9999714             0 .000028616907
    36   .9999644             0    .000035663
    37  .08730433      .9126956  1.004739e-23
    38   .9999963             0  3.703624e-06
    39  .08522644             0      .9147736
    40   .0318002             0      .9681998
    41  .21588437             0      .7841156
    42  .02749338             0      .9725066
    43   .9355406             0     .06445945
    44  .11154781      .8884522  3.181763e-23
    45          1             0  8.764027e-09
    46          1             0 1.9692013e-10
    47   .9998856             0  .00011444603
    48   .9945732             0    .005426852
    49          1  1.401862e-24 1.7184303e-12
    50   .9063449             0     .09365512
    end
    Here's my Mata code:

    Code:
    mata:
    mata clear
    void function draw() {
        gammaM = st_data(., ("c1","c2","c3"))
        gammaM = gammaM :/ rowsum(gammaM)
        n = rows(gammaM)
        guessclass = J(n,1,.)
        for(i = 1; i<= n; i++) {
            guesspr = gammaM[`i',]
            if (guesspr[1,1] = .) {
                guessclass[`i',1] = .
            }
            else {
                guessclass[`i',1] = rdiscrete(1,1,guesspr)
            }
        }
    }
    end
    I'm aware that the code doesn't generalize, or that it doesn't even output the random draws to Stata, but I want to work on that later. This code produces a conformability error (3200) when I try to run it. However, when I enter mata and try to generate the random draw for an observation whose predicted probabilities are not missing, the code works.

    Note: some observations do have missing values for predicted probability (e.g. the first one in this sample). I'm trying to set those observations' random draw to missing. I suspect this is where the code is breaking. Also note, I'm aware that the vector of probabilities supplied to -rdiscrete- must sum to 1, and I've already ensured that by normalizing the probabilities (see the 2nd line of my function).


    If you are interested: the context is trying to develop some validation tools for latent class models. The 3 probabilities are posterior probabilities of class membership from a 3-class model. I would like to randomly generate one random class assignment for repeated use as a start value (option startvalues(randomid guessid)) to verify that I've hit a global maximum likelihood. You can feed in the 3 posterior probabilities as start values, but you will simply replicate your solution many times; I want to do the equivalent of jittering the start values slightly.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

  • #2
    Hi Weiwen,

    I found a couple of errors:
    1) You need to use "==" in if (guesspr[1,1] = .)
    2) You have used Stata's dereferencing notation of `i' in several places in Mata. You weren't intending to dereference Stata locals in Mata, so you don't want `i'. You were just using a variable called "i" in Mata. So, for example, guessclass[`i',1] = . should be guessclass[i,1] = . There are situations in which one might want to pass and dereference a Stata local into Mata, but this is not one of them.

    Suggestions:
    1) You could use the Mata function hasmissing(guesspr) to detect any missing values *anywhere* in a row.
    2) If what you eventually want is a multinomial random variable in Stata, you can do this purely in Stata, and doing that might even be faster than using rdiscrete in Mata. The following (perhaps imperfect) is adapted from some code I used in a somewhat similar situation, and found it was faster than rdiscrete.

    Code:
    cap prog drop rmulti
    prog rmulti
    syntax, x(name) prob(varlist)
    confirm new variable `x'
    //
    unab prob: `prob'
    local k: word count `prob'
    // cumulate probs
    local p1 = word("`prob'", 1)
    gen F1 = `p1'
    forval i= 2/`k' {
       local im1 = `i' -1
       local curr: word `i' of `prob'
       gen F`i' = F`im1' + `curr'
    }
    tempvar u
    qui gen `x' = .
    gen `u' = runiform()
    forval i = 1/`k' {  
       // Note that if any of the prob varlist is missing, F`k' will be missing
       qui replace `x'  = `i' ///
          if !missing(F`k') & (`u' <= F`i' ) & missing(`x')
    }
    end
    //


    Comment


    • #3
      Originally posted by Mike Lacy View Post
      Hi Weiwen,

      I found a couple of errors:
      1) You need to use "==" in if (guesspr[1,1] = .)
      2) You have used Stata's dereferencing notation of `i' in several places in Mata. You weren't intending to dereference Stata locals in Mata, so you don't want `i'. You were just using a variable called "i" in Mata. So, for example, guessclass[`i',1] = . should be guessclass[i,1] = . There are situations in which one might want to pass and dereference a Stata local into Mata, but this is not one of them.

      Suggestions:
      1) You could use the Mata function hasmissing(guesspr) to detect any missing values *anywhere* in a row.
      2) If what you eventually want is a multinomial random variable in Stata, you can do this purely in Stata, and doing that might even be faster than using rdiscrete in Mata. The following (perhaps imperfect) is adapted from some code I used in a somewhat similar situation, and found it was faster than rdiscrete.

      Code:
      cap prog drop rmulti
      prog rmulti
      syntax, x(name) prob(varlist)
      confirm new variable `x'
      //
      unab prob: `prob'
      local k: word count `prob'
      // cumulate probs
      local p1 = word("`prob'", 1)
      gen F1 = `p1'
      forval i= 2/`k' {
      local im1 = `i' -1
      local curr: word `i' of `prob'
      gen F`i' = F`im1' + `curr'
      }
      tempvar u
      qui gen `x' = .
      gen `u' = runiform()
      forval i = 1/`k' {
      // Note that if any of the prob varlist is missing, F`k' will be missing
      qui replace `x' = `i' ///
      if !missing(F`k') & (`u' <= F`i' ) & missing(`x')
      }
      end
      //

      Mike, thanks for the tips. You're absolutely right, all I need for this exercise is a multinomial random number in Stata. Should have thought of that solution!
      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

      When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

      Comment


      • #4
        (My followup continues to be a Stata rather than Mata point, but it's in context here.)

        I just noticed the Stata function irecode(), which gives a one-line Stata approach to generating a multinomial that is much easier to use and faster than what I suggested.
        Code:
        gen X = 1 + irecode(runiform(), c1, c1+c2, 1.0)

        Comment

        Working...
        X