Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Error with function that uses -rdiscrete()-

    I have a small program to generate multinomial random variables in Stata, given an existing Stata variable name and a probability vector. It's a wrapper for a Mata function that uses Mata's -rdiscrete()-. The problem I'm having is that -rdiscrete()- occasionally will not execute with certain probability vectors.

    -rdiscrete()- understandably requires a probability vector that sums *very* closely to 1.0, a problem my function adapts to by checking for this problem and always making a tiny adjustment to the first element of the probability vector. After many years of using this function without incident, I encountered a problem while trying to generate a uniform multinomial variable with 7 categories.

    The surprising thing is that -rdiscrete()- worked with a 7-element probability vector that was a *little* bit off from summing to 1.0, but *failed* when I gave it an input vector for 7 categories that was *closer* to summing to 1.0! This stumps me, though my guess goes to some kind of precision issue. Do any of you have an idea of what the problem might be?

    Code for the Mata function, including some debugging echos and some relevant examples appear below.


    Code:
    cap mata mata drop multi()
    mata:
    void multi(string varname, string okname, real scalar N, real colvector prob)
    // Generate multinomial random variable in Stata.
    {
       "Input prob vector:"  
       prob
       // If sum of prob. vector differs from 1.0 by too much, it's an error.
       diff = sum(prob) - 1.0
       if (abs(diff) > 1e-6) {
         st_local(okname, "0")
         exit
       }
       // User-given prob. vectors often differ a little from 1.0 due to precision issues.
       // Avoid this by *always* trivially adjusting the first element in the vector.
       "diff before adjustment: " + strofreal(diff, "%20.5g")
       prob[1] = prob[1] - diff
       diff = sum(prob) - 1.0
       "diff after adjustment: " + strofreal(diff, "%20.5g")
       x = rdiscrete(N, 1, prob)
       st_store(., varname,  x)      
       st_local(okname, "1")
    }
    end
    //
    // Examples, good and bad.
    clear
    local N = 100
    set obs `N'
    //
    gen byte y3 = .
    // Works fine.
    mata: multi("y3", "OKflag", `N', ///
       (.33333333\.33333333\.33333333) )
    tab y3   
    //
    // Works fine: prob. vector with values reasonably close to 1/7
    gen byte y7 = .
    mata: multi("y7", "OKflag", `N', ///
       (.14285714\.14285714\.14285714\.14285714\ ///
        .14285714\.14285714\.14285714) )
    tab y7    
    //    
    // Values *closer* to 1/7 fail.    
    gen byte y7closer = .
    mata: multi("y7closer", "OKflag", `N', ///
    (.1428571428571\.1428571428571\.1428571428571\.1428571428571\  ///
    .1428571428571\.1428571428571\.1428571428571) )

  • #2
    Hmm....I thought at first that if you were to rewrite each addition, subtraction and sum in terms of quadsum() then that might fix the precision error. However, that failed in a similar way. As a workaround, I reframed the function to use runiform() instead.

    Code:
    clear *
    cls
    
    set seed 523
    
    set matastrict on
    
    cap mata mata drop multi()
    mata:
      void multi(string varname, string okname, real scalar N, real colvector prob) {
        real colvector  cumprob,  u,   x
        real scalar     diff,     i
        
        "Input prob vector:"
        prob
        
        // If sum of prob. vector differs from 1.0 by too much, it's an error.
        diff = sum(prob) - 1.0
        if (abs(diff) > 1e-6) {
          st_local(okname, "0")
          exit()
        }
    
        cumprob = J(length(prob), 1, .)
        for (i = 1; i < length(prob); i++) {
          cumprob[i] = sum(prob[1..i])
        }
        cumprob[length(prob)] = 1
    
        u = runiform(N, 1)
        
        x = J(N, 1, 1)
        for (i = 1; i < length(prob); i++) {
          x = x :+ (u :> cumprob[i])
        }
        
        st_store(., varname, x)
        st_local(okname, "1")
      }
    end
    
    local Nobs 1000
    set obs `Nobs'
    
    gen byte y1 = .
    mata: multi("y1", "okflag", `Nobs', (.1, .2, .3, .4)')
    tab y1
    
    gen byte y2 = .
    mata: multi("y2", "okflag", `Nobs', (.1, .1, .1, .7)')
    tab y2
    
    gen byte y3 = .
    mata: multi("y3", "okflag", `Nobs', J(7, 1, 1/7))
    tab y3
    
    gen byte y4 = .
    mata: multi("y4", "okflag", `Nobs', J(9, 1, 1/9))
    tab y4

    Comment


    • #3
      Thanks to Leonardo Guizzetti for joining in on my question. Yes, this kind of solution, cumulating the prob. vector and using a uniform random number, is a good and standard one, which I've also coded in Stata without recourse to Mata, and in fact the relatively obscure built-in -irecode- command is helpful for this purpose.

      However, what I'm finding here almost looks like a bug in -rdiscrete()-, and I'm curious to try to diagnose it.

      Comment


      • #4
        Thanks for pointing out -irecode-, I hadn't noticed that before. I'm sure you were already aware of the approach, but thought I'd offer a workaround. This might be a bug, but I can't be sure. Adding more or less digits in your final example seem to work, but this middle ground you have found does not. Consider also the repeating fraction (in binary and base-10) of 1/9. Inputting 0.111111111111 (diff before adjustment: -9.9987e-13), but more or less digits works as expected.

        Comment


        • #5
          Hmm, that's interesting and strange.

          I did confirm, I think, that the problem with -rdiscrete()- here is *not* due to the probabilities not summing to 1.0. Here's a simpler minimal example than what I had above that shows this, along with the error messages that occur:

          Code:
          mata:
          prob = (.1428571428571\.1428571428571\.1428571428571\.1428571428571\  ///
                 .1428571428571\.1428571428571\.1428571428571)
          diff = sum(prob)-1.0
          "diff = "; diff
          rdiscrete(100, 1, prob)
          end
          This results in the following:
          Code:
           diff =
            -3.00204e-13
          
          : rdiscrete(100, 1, prob)
          sum of the probabilities must be 1
                       rdiscrete():  3300  argument out of range
                           <istmt>:     -  function returned error
          (0 lines skipped)
          I go in and tweak the probability vector and retry

          Code:
          mata:
          prob[1] = prob[1] - diff
          diff = sum(prob) -1.0
          "diff = "; diff
          rdiscrete(100, 1, prob)
          end
          which gives the following output and error:
          :
          Code:
            diff =
            0
          
          : rdiscrete(100, 1, prob)
                       rdiscrete():  3498  Stata returned error
                           <istmt>:     -  function returned error
          (0 lines skipped)
          ---------------------------------------------------------------------------------------------------------------------------------------------------------
          r(3498);
          The gloss for r(3498) is quite generic: "An error specific to this function arose. The text of the message should describe the problem."

          This becomes murkier and murkier, at least to me <grin>.

          I'm now thinking to report this a possible bug. Does that seem reasonable?

          I should mention that I'm using Stata 15.1. I presume but don't know that this problem persists in version 16.

          Comment


          • #6
            I can confirm the behavior in Stata 16.1. The last error indeed looks very much like a bug. As a minimum, the function should return an informative error message if it does not like the input.
            https://twitter.com/Kripfganz

            Comment


            • #7
              I've sent this issue to Tech Support, and will report back when I hear something.

              Comment


              • #8
                Hi all,

                This is a bug and we are currently looking into a fix for this that we will release in a future update. Thank you Mike Lacy for the find, and we apologize for the inconvenience!

                Best,
                Joerg

                Comment

                Working...
                X