systematic/structured permutations

Rich Goldstein

Join Date: Mar 2014

Posts: 4458
#1

systematic/structured permutations

15 Sep 2018, 05:42

a client came to me with a small data set (2 groups, A and B, with 5 people in A and 7 in B) and asked what could be done

I suggested a permutation test; however, given the small N, I did not want to use -permute- which only does random permutations; with this small an N, going systematically through all possible permutations should be do-able

I did not know how to do this and reached out to tech-support; they sent me some code (shown below) for a total N of 5 with group sizes of 2 and 3 (note the changed lines in the code below for N=12 and group size=5); while their code ran quickly with an N of 5, it has now been running for more than 15 hours with no result with my actual group and total N's; I did send my revised code to tech support when it had been running more than 3 hours and they did not seem worried (or even surprised); however, I think that 15 hours is ridiculous so - below is my code - can anyone see a problem or way to speed it up; note that completely different code that goes thru the permutations is fine; also, tech-support's logic was to do the full set of permutations first and then do the analysis; while that logic is fine, I recognize the possibility that a different logic (do analysis for each permutation, post the result and then go to next permutation) is also possible; I can't currently supply my actual data but here is some data using the auto data set:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int mpg byte foreign float id 18 0 1 18 0 2 18 0 3 19 0 4 19 0 5 19 0 6 24 0 7 17 1 8 23 1 9 25 1 10 23 1 11 35 1 12 end label values foreign origin label def origin 0 "Domestic", modify label def origin 1 "Foreign", modify

and here is the code I used (note that the last part shown on the screen (the do file is called "hpermute.do" and I typed "do hpermute") is the "while block"):

Code:

clear // an example of selecting 3 out of 5 for group 1 local num 12 // change this to total N local sel_num= 5 // change this to group size for one of the groups set obs `num' // get the index for each permutation mata: // get the info of selecting # of # number =strtoreal(st_local("num")) number sel_number = strtoreal(st_local("sel_num")) ind = J(number, 1, .) // define a matrix to permute which contains 1, 2, ... for(i=1;i<number+1; i++) { ind[i,1] = i } // use -cvpermute()- to get all the permutation // only select first 3 of 5 which is for group 1 info = cvpermutesetup(ind) V1=cvpermute(info) V_all = V1[1..sel_number,1] while((V1=cvpermute(info)) != J(0,1,.)) { V_all = V_all, sort(V1[1..sel_number,1], 1) } // drop duplicated indexes V_all=uniqrows(V_all') V_all // store the Mata matrix to Stata Matrix "index" st_matrix("index", V_all') end matlist index ********************************** /* The matrix "index" stores the observation number for group 1. If you want to create data set with group assignment for each permutation, here is an example: */ ************************* clear local num 12 // change to actual number set obs `num' gen obs_num = _n save mydata, replace // match each permutation index to get each data forvalues i=1/`=colsof(index)' { clear matrix this = index[1..., `i'] svmat this rename this1 obs_num save mythis, replace use mydata, clear merge 1:1 obs_num using mythis gen group`i' = (_merge == 3) drop _merge save mydata, replace }

in my real data, I am using a linear regression with "foreign" as the main predictor (and baseline measure of the outcome as the other predictor); I have not shown the analysis commands or the commands for posting the results as the program is not getting there (and they worked fine on the trial using N=5 with groups sizes of 2 and 3)

added: as a check, I just entered the original code and ran it; it ran is 0.25 seconds

Last edited by Rich Goldstein; 15 Sep 2018, 06:39.
Tags: None
Rich Goldstein

Join Date: Mar 2014

Posts: 4458
#2

15 Sep 2018, 06:52

added: forgot to mention that a non-Mata (Stata) solution is fine
Comment
Mike Lacy

Join Date: Apr 2014

Posts: 2411
#3

15 Sep 2018, 08:16

The program tstrest, which I wrote with Johannes Kaiser (st0158, Stata Journal, 9,1), is a utility for performing two-group permutation tests and might be useful to you. It's essentially an exact version of -permute-. I ran an example with a subset of the auto data: tsrtest foreign e(r2): regress price foreign
and it took about 10 sec.
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4458
#4

15 Sep 2018, 08:58

thank you Mike Lacy

thanks for this - works fine for t-test; I guess I could write a little program for a regression and return what I want from the regression?

another possibility would be to "steal" some of your code? but the above sounds easier

for anyone else following this, note that there is a typo in Mike's message; his program is called -tsrtest-

added: writing a program to do a randomization test using Mike's program for a regression (I used the t-statistic; here equivalent to the p-value) was very easy; thanks again Mike Lacy

added2: I note that Mike's program took about 2.5 seconds for a t-test and about 9 seconds for a regression on my machine; there were 792 permutations as you can check with

Code:

di comb(12,5)

Last edited by Rich Goldstein; 15 Sep 2018, 09:49.
Comment
Mike Lacy

Join Date: Apr 2014

Posts: 2411
#5

15 Sep 2018, 10:38

Ouch, sorry about the typo. Yes, the idea of -tsrtest- is that just like -permute-, you can use as the test statistic any expression that is returned from a program. -tsrtest- is somewhat fussy about what expression it can handle. For example, it won't take as the test statistic an expression like _b[MyVar] or el(r(table),1,3) It wants something simple like r(t) or e(r2). So, the way to use -regress-, if you want to access a particular parameter estimate, is to write a little wrapper:

Code:

// Make a tractable sized problem. clear sysuse auto sample 8, by(foreign) count // cap prog drop wrapper prog wrapper, rclass regress price foreign turn return scalar bturn = _b[turn] end // tsrtest foreign r(bturn), exact: wrapper

-tsrtest- will revert to a resampling approach, like -permute-, if it estimates that the problem would take too long ( "too long" by default is 1000 sec). The exact option forces it to generate and analyze all possible assignments of the data to the two groups even if the heat death of the sun would occur before it finished. <grin>
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4458
#6

15 Sep 2018, 11:05

again, thank you very much Mike Lacy for bringing this to my attention (and thanks for the sample program in #5)
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4458
#7

18 Sep 2018, 05:52

Mike Lacy
first, I don't see a way to save the distribution of the permuted item(s) using -tsrtest- ; did I miss it?
second, if there currently is no way to do that, I would like to see it and, if you can give me some guidance about where in your program is the most likely place to add such a posting of results, I would be glad to (attempt to) add it if that is ok with you
Comment
Mike Lacy

Join Date: Apr 2014

Posts: 2411
#8

18 Sep 2018, 08:12

Rich Goldstein: I'll communicate to you offline about this, as the answer is way down in the weeds. Short story is that I think there is a solution and if you can put something together, that would be great.
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4458
#9

20 Sep 2018, 14:34

@Mike Lacy: actually, it appears that the "using" filename part works, both with and without the "overwrite" option; not sure how I missed it in the help file - thank you; I will continue to test on real data over the next couple of weeks but, to repeat, it seems to work providing a .dta file with number of combinations/replications plus 1 for the actual data where the actual data result is the first observation in the new data file
Comment
Mike Lacy

Join Date: Apr 2014

Posts: 2411
#10

20 Sep 2018, 16:45

Rich Goldstein, Well, I find it seems to be fussy about filenames containing directories, although this is likely OS dependent (I'm using Windows 7). For me:

Code:

tsrtest foreign r(t) using junk.dta: ttest price, by(foreign)

works fine as intended, but

Code:

tsrtest foreign r(t) using c:/temp/junk.dta: ttest price, by(foreign)

// gives the error message
[CODE]could not successfully execute '/temp/junk.dta: ttest price, by(foreign)', aborting

Do you get a similar problem?
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4458
#11

21 Sep 2018, 09:28

Mike Lacy, I can't check this weekend but I will check next week

I admit that this desire had never occurred to me as I have separate directories for each of my projects and I don't cross directories in the way you tried;

I use a Mac with up-to-date OS
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4458
#12

27 Sep 2018, 10:45

@Mike Lacy: first, apologies for being so long; second, I did NOT get a similar problem; I sysuse the auto file while not in the Downloads directory and then

Code:

tsrtest foreign r(t) using /Volumes/Mac1/Users/rich/Downloads/junk.dta: ttest price, by(foreign)

worked fine and save junk.dta in the Downloads directory

I am using a MacPro with a fully version 10.13.6 of the OS
Comment
Mike Lacy

Join Date: Apr 2014

Posts: 2411
#13

27 Sep 2018, 15:58

Aha! Your example made the light bulb go on. It's the ":" contained in the full Windows filespec that is causing a problem. The program has some code to parse on ":" to find the user's statistical command, but gets fooled by the colon in the filespec. Should not be a problem in the Mac or *nix world. Perhaps I can figure out how -permute- or -bootstrap- handles the same parsing issue and fix -tsrtest- Thanks for posting this.
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4458
#14

27 Sep 2018, 19:03

you're welcome - and thanks for writing the software!
Comment
Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#15

28 Sep 2018, 07:44

Mike Lacy, please receive my appreciation and admire for your wonderful -tsrtest-, which seems to be the perfect solution for Rich's issue.

In following this discussion, I just contribute a code (with basic Stata commands) which merely focuses on creating the complete enumeration of all possible arrangements (i.e selecting m from k members, m<k). This code might be an alternative (without any problem of time-consuming) for the code mentioned by Rich in #1.

Code:

clear local k = 20 local m = 5 set obs 1 gen r0 = 0 gen ex=. local sofar "r0" forval i = 1/`m' { replace ex = `k' -`m' +`i' - r`=`i'-1' expand ex bys `sofar': gen r`i' = r`=`i'-1' +_n local sofar = "`sofar'" + " r`i'" } drop ex r0
Comment

Announcement

systematic/structured permutations

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment