A client came to me with a small data set (2 groups, A and B, with 5 people in A and 7 in B) and asked what could be done.
I suggested a permutation test. However, given the small N, I did not want to use -permute-, which only does random permutations; with an N this small, going systematically through all possible permutations should be doable.
I did not know how to do this and reached out to tech-support; they sent me some code (shown below) for a total N of 5 with group sizes of 2 and 3 (note the changed lines in the code below for N=12 and group size=5). While their code ran quickly with an N of 5, with my actual total N and group sizes it has now been running for more than 15 hours with no result. I sent my revised code to tech-support when it had been running for more than 3 hours, and they did not seem worried (or even surprised). However, I think that 15 hours is ridiculous, so my code is below: can anyone see a problem or a way to speed it up?
Note that completely different code that goes through the permutations is fine. Also, tech-support's logic was to build the full set of permutations first and then do the analysis; while that logic is fine, I recognize that a different logic (do the analysis for each permutation, post the result, and then go to the next permutation) is also possible.
I can't currently supply my actual data, but the first code block below has some example data taken from the auto data set.
The second code block below is the code I used. The do-file is called "hpermute.do" and I ran it by typing "do hpermute"; the last part shown on the screen is the "while" block.
In my real data, I am using a linear regression with "foreign" as the main predictor (and the baseline measure of the outcome as the other predictor). I have not shown the analysis commands or the commands for posting the results, as the program is not getting that far (and they worked fine in the trial using N=5 with group sizes of 2 and 3).
Added: as a check, I just entered the original code and ran it; it ran in 0.25 seconds.
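For scale, here is a rough count of what each approach has to visit. This is my own arithmetic, not something tech-support supplied, and it rests on my reading that -cvpermute()- steps through every ordering of the ids rather than every distinct group. Under that reading, 5 ids have 5! = 120 orderings but only 10 distinct groups of 2, whereas 12 ids have 12! = 479,001,600 orderings but only 792 distinct groups of 5.

```stata
* orderings of all ids versus distinct ways to pick one group
display %15.0fc round(exp(lnfactorial(5)))    // orderings of 5 ids
display %15.0fc comb(5, 2)                    // distinct groups of 2 out of 5
display %15.0fc round(exp(lnfactorial(12)))   // orderings of 12 ids
display %15.0fc comb(12, 5)                   // distinct groups of 5 out of 12
```

If that reading is right, the N=12 run has roughly 4 million times as many loop iterations as the N=5 run, and appending one column to the result on every iteration presumably makes the gap wider still.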
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int mpg byte foreign float id
18 0 1
18 0 2
18 0 3
19 0 4
19 0 5
19 0 6
24 0 7
17 1 8
23 1 9
25 1 10
23 1 11
35 1 12
end
label values foreign origin
label def origin 0 "Domestic", modify
label def origin 1 "Foreign", modify
Code:
clear
// an example of selecting 3 out of 5 for group 1
local num 12 // change this to total N
local sel_num = 5 // change this to group size for one of the groups
set obs `num'

// get the index for each permutation
mata:
    // get the info of selecting # of #
    number = strtoreal(st_local("num"))
    sel_number = strtoreal(st_local("sel_num"))
    ind = J(number, 1, .)

    // define a matrix to permute which contains 1, 2, ...
    for (i=1; i<number+1; i++) {
        ind[i,1] = i
    }

    // use -cvpermute()- to get all the permutation
    // only select first 3 of 5 which is for group 1
    info = cvpermutesetup(ind)
    V1 = cvpermute(info)
    V_all = V1[1..sel_number,1]
    while ((V1=cvpermute(info)) != J(0,1,.)) {
        V_all = V_all, sort(V1[1..sel_number,1], 1)
    }

    // drop duplicated indexes
    V_all = uniqrows(V_all')
    V_all

    // store the Mata matrix to Stata Matrix "index"
    st_matrix("index", V_all')
end

matlist index

**********************************
/*
The matrix "index" stores the observation number for group 1.
If you want to create data set with group assignment for each permutation,
here is an example:
*/
*************************

clear
local num 12 // change to actual number
set obs `num'
gen obs_num = _n
save mydata, replace

// match each permutation index to get each data
forvalues i=1/`=colsof(index)' {
    clear
    matrix this = index[1..., `i']
    svmat this
    rename this1 obs_num
    save mythis, replace
    use mydata, clear
    merge 1:1 obs_num using mythis
    gen group`i' = (_merge == 3)
    drop _merge
    save mydata, replace
}
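One possible way around the long run time, offered as a minimal sketch and not as tech-support's method: build the group-1 index sets directly instead of filtering them out of every ordering of the ids. `allcombs()` is a hypothetical helper name of my own, not a built-in Mata function, and the sketch assumes the ids are simply 1 to N as in the example data. It stores the result in the same layout as the matrix "index" above (one column per distinct group), so the -forvalues- loop over `colsof(index)` should work unchanged.

```stata
mata:
// hypothetical helper: all k-subsets of 1..n, one per row, in lexicographic order
real matrix allcombs(real scalar n, real scalar k)
{
    real matrix    C
    real rowvector c
    real scalar    i, j, r

    C = J(comb(n, k), k, .)     // one row per distinct group
    c = 1..k                    // first subset: 1, 2, ..., k
    r = 1
    while (1) {
        C[r, .] = c
        r++
        // find the rightmost position that can still be increased
        i = k
        while (i >= 1) {
            if (c[i] < n - k + i) break
            i--
        }
        if (i < 1) break        // last subset reached
        c[i] = c[i] + 1
        for (j = i + 1; j <= k; j++) c[j] = c[j-1] + 1
    }
    return(C)
}

// N = 12 and group size = 5, as in my data; transpose to match "index"
st_matrix("index", allcombs(12, 5)')
end

display rowsof(index) " x " colsof(index)
```

With N = 12 and a group size of 5 this visits only the 792 distinct groups, so no deduplication step with -uniqrows()- is needed.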