Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Simulating from an empirical distribution

    Dear all,

    I am trying to make simulations of realisations of a random variable associated with probabilities I found empirically.
    To be clear, suppose I have a random variable which can take values {1,2,3,4,5} with probabilities respectively {.2,.3,0,.1,.4}.
    What I am trying to do is to randomly select one entry of the vector (1,2,3,4,5) where each entry has probability to be chose (respectively) taken from the vector (.2,.3,0,.1,.4).
    How can I do?

    Thanks in advance for the help!
    Best
    Luca Gagliardone

  • #2
    Something like the following?

    Code:
    . set obs 10000000
    number of observations (_N) was 0, now 10,000,000
    
    .
    . /*
    > values {1,2,3,4,5} with probabilities respectively {.2,.3,0,.1,.4}.
    > */
    .
    . set seed 123456789
    
    .
    . ge double r = runiform()
    
    . su r
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
               r | 10,000,000     .500094    .2886607   7.23e-08   .9999999
    
    . ge v = .
    (10,000,000 missing values generated)
    
    . replace v = 1 if inrange(r, 0, .2)
    (1,998,376 real changes made)
    
    . replace v = 2 if inrange(r, `=.2 + 1e-9', .5)
    (3,000,913 real changes made)
    
    . replace v = 4 if inrange(r, `=.5 + 1e-9', .6)
    (999,015 real changes made)
    
    . replace v = 5 if inrange(r, `=.6 + 1e-9', 1)
    (4,001,696 real changes made)
    
    . ta v
    
              v |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |  1,998,376       19.98       19.98
              2 |  3,000,913       30.01       49.99
              4 |    999,015        9.99       59.98
              5 |  4,001,696       40.02      100.00
    ------------+-----------------------------------
          Total | 10,000,000      100.00

    Comment


    • #3
      Dear Stephen,
      thanks for the help.

      You are definitely going in the right direction, but I am looking for a method to select one value of the random variable, in order to be able to make a "Monte-carlo"-style simulation using my empirically estimated distribution.

      For example, using the code above, the last step should be choosing one specific observation of the variable "v" selecting randomly: we will get then one value with approximately the right frequencies.
      Also, if I need to estimate the sample mean of my simulations (to approximate the sample mean of the random variable v) in a Monte-carlo fashion, I imagine it is incredibly inefficient to simulate thousands of variables "v"s with the approach above, choose randomly one realisation each and then average over them. There exists another way?

      Comment


      • #4
        To be more clear, the best possible result would be to have a function that works this way:

        1) Takes as inputs 3 things:
        a) a vector of realisations.
        b) a vector of probabilities.
        where the two vectors have the same length.
        c) an integer n.
        2) Gives back a vector of length n where each entry is:
        a) one and only one realisation from the vector of realisations.
        b) selected randomly using as a "weight" the vector of probabilities.

        Comment


        • #5
          Sorry, but your specification is confusing to me. Do you have a dataset full of observations or not, and you want to allocate values of v to each obs randomly? It's confusing (to me) for you to refer to getting "one value with approximately the right frequencies." One value? Per Obs? Right frequencies over what sample -- over all the replications?

          Whatever, the basic "trick" is likely to involve using the same machinery, i.e. based on comparisons of some number with values generated from a random uniform distribution.

          If you only have one obs in each replication, then I suspect that you need, within each iteration, to use the if prefix (not if qualifier) as above: if draw is in a particular range, value is set to something; if in another range, then to another number.

          But then, again, aren't the multiple rows in my artificial dataset just like a set of replications (one per row)?

          All in all, you can see that I am confused. You're going to have to spell things out in a much more detailed and basic fashion for people like me. (And I have to go off and do other stuff now ...)

          Comment


          • #6
            I apologise for being confusing.
            I will try to spell things out better.

            I have a a matrix of the form:
            5 7 9 10 20
            0.1 0 0.3 0.4 0.2
            where the first row are the values that the random variable can take and the second row are the associated probabilities.

            I am looking for a function that gives me ONE of the values among {5,7,9,10,20} randomly selected according to the associated probabilities.
            Hope this is clear now.

            Comment


            • #7
              http://www.maartenbuis.nl/publications/discrete.html
              ---------------------------------
              Maarten L. Buis
              University of Konstanz
              Department of history and sociology
              box 40
              78457 Konstanz
              Germany
              http://www.maartenbuis.nl
              ---------------------------------

              Comment


              • #8
                You can also use Mata function rdiscrete(r, c, p) that returns an r x c matrix containing random variates from the discrete distribution specified by the probabilities in the vector p of length k. The range of the discrete variates is 1, 2, ..., k.

                If you are working with matrices in Stata, you can use Mata function st_matrix() to obtain/put matrices from/into Stata.

                Using your example matrix (say M), here is code that hopefully produces a vector V that you want (n = 10 and k = 5).

                Code:
                mat M = (5,7,9,10,20\.2,.3,0,.1,.4)
                
                mata:
                M = st_matrix("M")
                R = M[1,.] // Realisations  in row 1
                P = M[2,.] // Probabilities in row 2
                n = 10
                D = rdiscrete(1,n,P)  // n discrete random variates from 1 to k
                V = R[1, D]           // V[i] = R[D[i]], 1 <= i <= n
                st_matrix("V", V)
                end
                
                mat list V
                I hope this helps.

                -- Kreshna

                Comment

                Working...
                X