Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • systematic/structured permutations

    a client came to me with a small data set (2 groups, A and B, with 5 people in A and 7 in B) and asked what could be done

    I suggested a permutation test; however, given the small N, I did not want to use -permute- which only does random permutations; with this small an N, going systematically through all possible permutations should be do-able

    I did not know how to do this and reached out to tech-support; they sent me some code (shown below) for a total N of 5 with group sizes of 2 and 3 (note the changed lines in the code below for N=12 and group size=5); while their code ran quickly with an N of 5, it has now been running for more than 15 hours with no result with my actual group and total N's; I did send my revised code to tech support when it had been running more than 3 hours and they did not seem worried (or even surprised); however, I think that 15 hours is ridiculous so - below is my code - can anyone see a problem or way to speed it up; note that completely different code that goes thru the permutations is fine; also, tech-support's logic was to do the full set of permutations first and then do the analysis; while that logic is fine, I recognize the possibility that a different logic (do analysis for each permutation, post the result and then go to next permutation) is also possible; I can't currently supply my actual data but here is some data using the auto data set:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int mpg byte foreign float id
    18 0  1
    18 0  2
    18 0  3
    19 0  4
    19 0  5
    19 0  6
    24 0  7
    17 1  8
    23 1  9
    25 1 10
    23 1 11
    35 1 12
    end
    label values foreign origin
    label def origin 0 "Domestic", modify
    label def origin 1 "Foreign", modify
    and here is the code I used (note that the last part shown on the screen (the do file is called "hpermute.do" and I typed "do hpermute") is the "while block"):
    Code:
    clear
    // an example of selecting 3 out of 5 for group 1
    local num 12 // change this to total N
    local sel_num= 5 // change this to group size for one of the groups
    set obs `num'
    
    // get the index for each permutation
    mata:
    // get the info of selecting # of #
    number =strtoreal(st_local("num"))
    number
    sel_number = strtoreal(st_local("sel_num"))
    ind = J(number, 1, .)
    
    // define a matrix to permute which contains 1, 2, ...
    for(i=1;i<number+1; i++) {
            ind[i,1] = i
    }
    
    // use -cvpermute()- to get all the permutation
    // only select first 3 of 5 which is for group 1
    info = cvpermutesetup(ind)
    V1=cvpermute(info)
    V_all = V1[1..sel_number,1]
    while((V1=cvpermute(info)) != J(0,1,.)) {
        V_all = V_all, sort(V1[1..sel_number,1], 1)
    
    }
    
    // drop duplicated indexes
    V_all=uniqrows(V_all')
    V_all
    // store the Mata matrix to Stata Matrix "index"
    st_matrix("index", V_all')
    end
    
    matlist index
    
    **********************************
    /*
    The matrix "index" stores the observation number for group 1.
    
    If you want to create data set with group assignment for each
    permutation, here is an example:
    */
    *************************
    
    clear
    local num 12 // change to actual number
    set obs `num'
    gen obs_num = _n
    save mydata, replace
    
    // match each permutation index to get each data
    forvalues i=1/`=colsof(index)' {
    
        clear
        matrix this = index[1..., `i']
        svmat this
        rename this1 obs_num
        save mythis, replace
        
        
        use mydata, clear
        merge 1:1 obs_num using mythis
        gen group`i'  = (_merge == 3)
        drop _merge
        save mydata, replace
    }
    in my real data, I am using a linear regression with "foreign" as the main predictor (and baseline measure of the outcome as the other predictor); I have not shown the analysis commands or the commands for posting the results as the program is not getting there (and they worked fine on the trial using N=5 with groups sizes of 2 and 3)

    added: as a check, I just entered the original code and ran it; it ran is 0.25 seconds
    Last edited by Rich Goldstein; 15 Sep 2018, 06:39.

  • #2
    added: forgot to mention that a non-Mata (Stata) solution is fine

    Comment


    • #3
      The program tstrest, which I wrote with Johannes Kaiser (st0158, Stata Journal, 9,1), is a utility for performing two-group permutation tests and might be useful to you. It's essentially an exact version of -permute-. I ran an example with a subset of the auto data: tsrtest foreign e(r2): regress price foreign
      and it took about 10 sec.

      Comment


      • #4
        thank you Mike Lacy

        thanks for this - works fine for t-test; I guess I could write a little program for a regression and return what I want from the regression?

        another possibility would be to "steal" some of your code? but the above sounds easier

        for anyone else following this, note that there is a typo in Mike's message; his program is called -tsrtest-

        added: writing a program to do a randomization test using Mike's program for a regression (I used the t-statistic; here equivalent to the p-value) was very easy; thanks again Mike Lacy

        added2: I note that Mike's program took about 2.5 seconds for a t-test and about 9 seconds for a regression on my machine; there were 792 permutations as you can check with
        Code:
        di comb(12,5)
        Last edited by Rich Goldstein; 15 Sep 2018, 09:49.

        Comment


        • #5
          Ouch, sorry about the typo. Yes, the idea of -tsrtest- is that just like -permute-, you can use as the test statistic any expression that is returned from a program. -tsrtest- is somewhat fussy about what expression it can handle. For example, it won't take as the test statistic an expression like _b[MyVar] or el(r(table),1,3) It wants something simple like r(t) or e(r2). So, the way to use -regress-, if you want to access a particular parameter estimate, is to write a little wrapper:
          Code:
          // Make a tractable sized problem.
          clear
          sysuse auto
          sample 8, by(foreign) count 
          //
          cap prog drop wrapper
          prog wrapper, rclass
          regress price foreign turn
          return scalar bturn = _b[turn]
          end
          //
          tsrtest foreign r(bturn), exact: wrapper
          -tsrtest- will revert to a resampling approach, like -permute-, if it estimates that the problem would take too long ( "too long" by default is 1000 sec). The exact option forces it to generate and analyze all possible assignments of the data to the two groups even if the heat death of the sun would occur before it finished. <grin>








          Comment


          • #6
            again, thank you very much Mike Lacy for bringing this to my attention (and thanks for the sample program in #5)

            Comment


            • #7
              Mike Lacy
              first, I don't see a way to save the distribution of the permuted item(s) using -tsrtest- ; did I miss it?
              second, if there currently is no way to do that, I would like to see it and, if you can give me some guidance about where in your program is the most likely place to add such a posting of results, I would be glad to (attempt to) add it if that is ok with you

              Comment


              • #8
                Rich Goldstein: I'll communicate to you offline about this, as the answer is way down in the weeds. Short story is that I think there is a solution and if you can put something together, that would be great.

                Comment


                • #9
                  @Mike Lacy: actually, it appears that the "using" filename part works, both with and without the "overwrite" option; not sure how I missed it in the help file - thank you; I will continue to test on real data over the next couple of weeks but, to repeat, it seems to work providing a .dta file with number of combinations/replications plus 1 for the actual data where the actual data result is the first observation in the new data file

                  Comment


                  • #10
                    Rich Goldstein, Well, I find it seems to be fussy about filenames containing directories, although this is likely OS dependent (I'm using Windows 7). For me:
                    Code:
                    tsrtest foreign r(t) using junk.dta: ttest price, by(foreign)
                    works fine as intended, but
                    Code:
                    tsrtest foreign r(t) using c:/temp/junk.dta: ttest price, by(foreign)
                    // gives the error message
                    [CODE]could not successfully execute '/temp/junk.dta: ttest price, by(foreign)', aborting

                    Do you get a similar problem?

                    Comment


                    • #11
                      Mike Lacy, I can't check this weekend but I will check next week

                      I admit that this desire had never occurred to me as I have separate directories for each of my projects and I don't cross directories in the way you tried;

                      I use a Mac with up-to-date OS

                      Comment


                      • #12
                        @Mike Lacy: first, apologies for being so long; second, I did NOT get a similar problem; I sysuse the auto file while not in the Downloads directory and then
                        Code:
                        tsrtest foreign r(t) using /Volumes/Mac1/Users/rich/Downloads/junk.dta: ttest price, by(foreign)
                        worked fine and save junk.dta in the Downloads directory

                        I am using a MacPro with a fully version 10.13.6 of the OS

                        Comment


                        • #13
                          Aha! Your example made the light bulb go on. It's the ":" contained in the full Windows filespec that is causing a problem. The program has some code to parse on ":" to find the user's statistical command, but gets fooled by the colon in the filespec. Should not be a problem in the Mac or *nix world. Perhaps I can figure out how -permute- or -bootstrap- handles the same parsing issue and fix -tsrtest- Thanks for posting this.

                          Comment


                          • #14
                            you're welcome - and thanks for writing the software!

                            Comment


                            • #15
                              Mike Lacy, please receive my appreciation and admire for your wonderful -tsrtest-, which seems to be the perfect solution for Rich's issue.

                              In following this discussion, I just contribute a code (with basic Stata commands) which merely focuses on creating the complete enumeration of all possible arrangements (i.e selecting m from k members, m<k). This code might be an alternative (without any problem of time-consuming) for the code mentioned by Rich in #1.
                              Code:
                              clear
                              local k = 20
                              local m = 5
                              
                              set obs 1
                              gen r0 = 0
                              gen ex=.
                              local sofar "r0"
                              
                              forval i = 1/`m' {
                              replace ex = `k' -`m' +`i' - r`=`i'-1'
                              expand ex
                              bys `sofar': gen r`i' = r`=`i'-1' +_n
                              local sofar = "`sofar'" + " r`i'"
                              }
                              drop ex r0

                              Comment

                              Working...
                              X