Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Paul: I see the description of R's multinom() function at https://stat.ethz.ch/R-manual/R-patc.../Multinom.html. If you can "see" the underlying R code used to create this function (I've no idea how one does this -- I'm not an R user), then I am very confident that the algorithm could be used to create the same functionality using Mata. I agree that it'd be good to have this function built-in to Stata/Mata.

    Comment


    • #17
      Thanks, Stephen. My understanding is that Mata's rdiscrete() is equivalent to R's rmultinom().

      Comment


      • #18
        It turns out there is a simple solution to the question I originally posed. Just use the command: gsample, gen()

        Comment


        • #19
          The following code defines a program –bsampw– to perform frequency-weighted bootstrapping.

          The context is that I often deal with large samples for which I use –contract– or –gcontract– to create smaller samples in which the frequency weights generated by –contract– allow one to recover the original sample. I would sometimes find it useful to be able to use the "contracted" sample for drawing bootstrap samples and then using these samples in subsequent frequency-weighted commands whose results would be accumulated across the bootstrap replications.

          Unfortunately Stata's built-in –bsample– command does not allow frequency weights, nor does Ben Jann's user-contributed –gsample– (gsample allows aweights and iweights).

          –bsampw– is designed specifically for fweights, with each resampling creating a new frequency weight variable in the sample whose sum is the same as the sum of the original frequency-weight variable (the sample is otherwise unchanged). A few results suggest that it's about 3-4 times slower than –gsample– (where the frequency weight is specified as an iweight) but about 3-4 times faster than using –bsample– on the original ("un-contracted") sample. (These timings are based on an original sample of about 1.7M and a contracted sample of about 5k.)

          I fear I may have reinvented the wheel with –bsampw– but I couldn't find any other sources on doing this in Stata. So it's offered for what it's worth. If any of you might have ideas about how to speed it up they'd be appreciated, keeping in mind that the objective is to define for each resampling a new frequency variable whose sum must equal the sum of the original frequency weight.
          Code:
          /******************************************************************************/
          /*                                                                            */
          /* bsampw: program to draw a weighted bootstrap sample using nonnegative      */
          /*         integer frequency weights                                          */
          /*                                                                            */
          /* Program adds to the data in memory a new variable _fnew that contains the  */
          /*  new nonnegative (bootstrapped) frequency weights. The original frequency  */
          /*  variable is retained. If a variable named _fnew already exists in the     */
          /*  sample it is replaced. The sum of _fnew equals the sum of the original    */
          /*  frequency variable.                                                       */
          /*                                                                            */
          /* Examples:  bsampw frq                                                      */
          /*            bsampw _freq                                                    */
          /*                                                                            */
          /*  where frq is the nonnegative frequency variable in the current sample.    */
          /*                                                                            */
          /* The program requires the Mata functions wboot() and rmultinomial() which   */
          /*  are defined at the top of this do file. wboot(x,w) takes two arguments:   */
          /*  x is an Nxk matrix of data to be bootstrapped; w is an Nx1 vector of      */
          /*  nonnegative frequency weights. wboot(x,w) returns the Nx(k+1) matrix      */
          /*  [x,f] where f is an Nx1 vector of bootstrapped frequencies based on w.    */
          /*  Note that wboot() may be independently useful for conducting bootstrap    */
          /*  exercises within Mata.                                                    */
          /*                                                                            */
          /* Note: Due to internal limits on the Mata function rdiscrete() the number   */
          /*  of observations in the dataset to be bootstrapped must be <=10,000.       */
          /*  Moreover, the program will be slower the larger is the sum of the         */
          /*  frequency weights.                                                        */
          /*                                                                            */
          /******************************************************************************/
          
          cap prog drop bsampw
          cap mata mata drop wboot()
          cap mata mata drop rmultinomial()
          
          /* Define Mata functions rmultinomial() and wboot()                           */
          
          mata
          
          function rmultinomial(p,|n) {
           real ur,psum,pnorm,ctch
           if (args()==1) {
            psum=sum(p)
            pnorm=p:/psum
            ur=uniqrows(rdiscrete(psum,1,pnorm),1)
           }
           if (args()==2) {
            pnorm=p
            ur=uniqrows(rdiscrete(n,1,pnorm),1)
           }
           ctch=J(size(p),1,0)
           ctch[ur[.,1]]=ur[.,2]
           return(ctch)
          }
          
          function wboot(x,w) return((x,rmultinomial(w)))
          
          end
          
          
          /* Define Stata program bsampw                                                */
          
          prog def bsampw
          
          cap drop _fnew
          mata _tboot=wboot(_ft=st_data(.,"`1'"),_ft)
          mata _av=st_addvar(st_vartype("`1'"),"_fnew")
          mata st_store(.,"_fnew",_tboot[.,cols(_tboot)])
          mata mata drop _ft _tboot _av
          
          end
          Last edited by John Mullahy; 25 Apr 2021, 10:54. Reason: typo

          Comment

          Working...
          X