  • Generate different possible combinations

    Hi,

    I would like to generate multiple groups of observations for each different combination of a group of numbers. For example:

    var1 group order
    1 1 1
    2 1 2
    3 1 3
    4 1 4
    5 1 5
    6 1 6
    7 1 7

    and would like to get the following:

    var1 group order
    1 1 1
    2 1 2
    3 1 3
    4 1 4
    5 1 5
    6 1 6
    7 1 7
    1 2 1
    2 2 2
    3 2 3
    4 2 4
    5 2 5
    7 2 6
    6 2 7
    1 3 1
    2 3 2
    3 3 3
    7 3 4
    4 3 5
    5 3 6
    6 3 7

    etc..

    for all potential combinations of the numbers 1 to 7, so a total of 7!=5040 different groups. Any help would be much appreciated.

    Thank you

  • #2
    The number of combinations of three numbers each 1 to 7 is 7^3 = 343, and it is not clear to me from your sample output what order you want your results in, so perhaps I misunderstand what you seek.

    Perhaps this sample code will start you in a useful direction.
    Code:
    clear
    set obs 7
    generate v1 = _n
    expand 7
    bysort v1: generate v2 = _n
    expand 7
    bysort v1 v2: generate v3 = _n
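
    As a small check on the arithmetic, the sketch below reruns the same three-variable example, counts the observations, and then drops triples with a repeated value. The drop step is an assumption about where the question is heading (orderings without repeats), not part of the original code.

    ```stata
    clear
    set obs 7
    generate v1 = _n
    expand 7
    bysort v1: generate v2 = _n
    expand 7
    bysort v1 v2: generate v3 = _n
    count                                   // 7^3 = 343 triples, repeats allowed
    * keep only triples in which no value repeats
    drop if v1 == v2 | v1 == v3 | v2 == v3
    count                                   // 7*6*5 = 210 triples remain
    ```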


    • #3
      Hi William,

      Thank you for your help.

      I would like to generate all the possible order combinations of 7 numbers from 1 to 7. For example:
      1, 2, 3, 4, 5, 6, 7 in that order would be one group
      1, 2, 3, 4, 5, 7, 6 in that order would be a second group
      1, 2, 3, 4, 6, 5, 7 in that order would be a third group
      1, 2, 3, 4, 6, 7, 5 in that order would be a fourth group
      ... etc
      all the way to
      7, 6, 5, 4, 3, 2, 1 for group 5'040

      I was wondering if there is a script that could generate these groups, with an observation per group, as well as a variable indicating the order of the number within a group.

      Thanks again


      • #4
        Here's a possible solution using -cross-.
        You create 7 datasets with values 1/7 then cross each dataset.
        This produces too many values because numbers can repeat using -cross-.
        So you drop cases with repeating numbers.
        Then you create variables with the positions of each number.

        Code:
        *make datasets
        forv i=1/7 {
            clear
            set obs 7
            gen v`i'=_n
            tempfile temp`i'
            save `temp`i'', replace
            }
        *cross datasets
        clear
        use `temp1'
        forv i=2/7 {
            cross using `temp`i''
            }
        *drop observations with repeated numbers
        forv i=1/7 {
            egen how_many_`i'=anycount(v1-v7), values(`i')
            drop if how_many_`i'>1
            drop how_many_`i'
            }
        
        *create vars with position of each number
        forv i=1/7 {
            gen whereis`i'=.
            forv j=1/7 {
                replace whereis`i'=`j' if v`j'==`i'
                }
            }
            
        sort v1-v7
        Stata/MP 14.1 (64-bit x86-64)
        Revision 19 May 2016
        Win 8.1


        • #5
          In U.S. usage, the relevant term for searching is "permutation."
          Mata has a function -cvpermute- to generate permutations, which can be used to create the stacked permutations of 1, ..., 7 in one column of a Stata dataset.
          Code:
          mata mata clear
          mata:
          X = 1\2\3\4\5\6\7
          perms = J(0,1,.)
          info = cvpermutesetup(X)
          while ((p = cvpermute(info)) != J(0,1,.)) {
             perms = perms\p
          }
          end
          //
          //
          clear
          getmata var1 = perms
          gen int group = 1+ floor((_n-1)/7)   // int, not byte: 5040 groups exceed the byte range (see #8)
          gen byte order = 1+ mod(_n-1,7)
          Last edited by Mike Lacy; 17 Sep 2018, 09:07.


          • #6
            Thanks for posting that solution, Mike. I saw that Mata function but my Mata programming is nearly nonexistent. I’m slowly working my way through Bill Gould’s The Mata Book, but the learning curve is steep; thus, I tend toward convoluted Stata solutions. Short examples like yours are very helpful in the learning process!
            Stata/MP 14.1 (64-bit x86-64)
            Revision 19 May 2016
            Win 8.1


            • #7
              Thank you Nicolas for the clarification. I had misinterpreted your use of "combinations" as a technical term from mathematics - as Mike points out, the corresponding term for what you want is "permutations" - all possible permutations of the digits 1 through 7.

              Thank you Mike for your solution. Like Carole, my Mata is sketchy at best, and the existence of getmata escaped my memory and slowed down my programming, while complicating it unduly.

              For what it's worth, here's my code after incorporating getmata - note that for testing, it permutes 1-3 instead of 1-7, making it easy to confirm that it is working as intended.

              Code:
              mata
              mata clear
              list = (1\2\3)
              // list = (1\2\3\4\5\6\7)
              setup = cvpermutesetup(list,1)
              M = J(0,3,.)
              group = 0
              while ((p=cvpermute(setup)) != J(0,1,.)) {
                  M = M \ ( p , J(rows(list),1,++group) , list )
                  }
              end
              clear
              getmata (var1 group order) = M
              list, sepby(group)
              Last edited by William Lisowski; 17 Sep 2018, 09:32.


              • #8
                Mike Lacy There is an error in the creation of the group variable. It needs to be larger than byte size since there will be 5040 groups.
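
                A minimal sketch of the fix, assuming the stacked layout from #5 with 35,280 rows (5040 permutations times 7 numbers). A byte holds integers only up to 100, and as far as I know -generate byte- stores larger results as missing, so an int (which reaches 32,740) is enough for 5040 groups.

                ```stata
                clear
                set obs 35280                        // 5040 permutations x 7 numbers
                * int instead of byte, so group ids up to 5040 fit
                gen int group = 1 + floor((_n-1)/7)
                gen byte order = 1 + mod(_n-1, 7)    // order never exceeds 7, so byte is fine
                assert !missing(group)
                summarize group
                ```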
                Stata/MP 14.1 (64-bit x86-64)
                Revision 19 May 2016
                Win 8.1


                • #9
                  Thanks, Carole J. Wilson; yes, group will overflow a byte. And, William Lisowski, I used -getmata- only because I find the documentation of -st_store()- and -st_addvar()- quite obscure and therefore use -getmata- whenever I can get away with it <grin>.


                  • #10
                    Hi Carole,

                    This is perfect, thank you!


                    • #11
                      Another solution with basic commands.

                      Code:
                      clear
                      set obs 7
                      gen a1=_n
                      local varlist "a1"
                      
                      forval i=2/7 {
                      expand 7
                      bys *: gen a`i' = _n if !inlist(_n,`varlist')
                      local varlist "`varlist', a`i'"
                      }
                      
                      egen group = group(*)
                      drop if missing(group)
                      
                      * You might also want:
                      reshape long a, i(group) j(order)
                      Last edited by Romalpa Akzo; 18 Sep 2018, 01:43.


                      • #12
                        Romalpa Akzo's solution in post #11 is ingenious. It follows the path I had hoped to take, generating seven variables, but I could not see how to make that work.

                        For those seeing this topic at a later date and wanting to generate the k! permutations of the digits 1-k for some k other than 7, it is worth noting that the code in post #11, like that in post #4, generates an intermediate dataset of k^k observations. On my copy of Stata 15.1, that limits the technique to at most k=10. This restriction can be overcome by adding
                        Code:
                        drop if a`i'==.
                        to the forval loop (and thus the later drop command is unnecessary).

                        I'm sure there is some similar limit on the size of the Mata matrix generated by the solutions in posts #5 and #7, but my Mata knowledge doesn't extend that far yet. The workaround would be to modify the Mata code to use st_store() to return each permutation directly to Stata. Having had no luck figuring that out yesterday, I won't attempt it again today.
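
                        A rough sketch of that workaround, offered as one possible approach rather than a definitive one: the observations and variables are created up front with st_addobs() and st_addvar(), and each permutation is written straight into the dataset with st_store(), so no large Mata matrix is accumulated. The value k = 4 and the names v, g, o, first, and last are placeholders chosen for illustration. Stata's own limit on the number of observations still applies.

                        ```stata
                        clear
                        mata:
                        k = 4                                   // small k for testing (assumption)
                        X = (1::k)                              // column vector 1, 2, ..., k
                        st_addobs(k * factorial(k))             // k rows for each of the k! permutations
                        v = st_addvar("byte", "var1")
                        g = st_addvar("long", "group")
                        o = st_addvar("byte", "order")
                        info = cvpermutesetup(X)
                        group = 0
                        while ((p = cvpermute(info)) != J(0,1,.)) {
                            first = group*k + 1                 // first row of this permutation
                            last  = first + k - 1               // last row of this permutation
                            group = group + 1
                            st_store((first, last), v, p)       // (first, last) is an observation range
                            st_store((first, last), g, J(k, 1, group))
                            st_store((first, last), o, X)
                        }
                        end
                        list in 1/8, sepby(group)
                        ```

                        Setting k = 7 should give the 5040 groups asked for in #1, in the same var1, group, order layout.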

                        For any of these solutions, based on Mata or based on basic commands, runtime increases substantially as k increases. In my experience on my system, k=7 was about all I had patience for. So I guess for me the binding constraint is my patience rather than the limits on the number of observations in my copy of Stata.


                        • #13
                          The suggestion (dropping missing a`i') by William Lisowski does make sense, not only for reducing the code's length but also for better dealing with the memory limitation.

                          In the same direction, the modification below is a little more efficient than the solution in #11 and can serve up to k=10. For k=11, the final output of permutations alone requires 11! (39,916,800) observations, so extra effort would clearly be needed.
                          Code:
                          clear
                          local k = 10
                          set obs `k'
                          gen a1=_n
                          local varlist "a1"
                          
                          forval i=2/`k' {
                          expand `k'+1-`i'
                          bys * : gen a`i' = _n
                          ds a`i', not
                          
                          forval j=2/`i'{
                          qui bys `r(varlist)': replace a`i'= a`i' + 1 if inlist(a`i',a`i'[_n-1],`varlist')
                          }
                          local varlist "`varlist', a`i'"
                          }


                          • #14
                            Good day to all y’all!

                            I am working on creating a variable that, conceptually, is simple, but time consuming. I would like to know if there is a quicker way to do it. I’m using a binary sequence to categorize panel respondents based on their participation across waves. There are five waves, which means there are 32 possible categories (i.e. 2^5). Each numeral in the variable name (e.g. WVS10000) represents whether or not a given respondent participated in a panel wave; the numerals are in chronological order from the first wave, to the second wave and so forth. Here is some syntax:

                            gen WVS10000 = 1 if W1Part==1 & W2Part==0 & W3Part==0 & W4Part==0 & W5Part==0
                            gen WVS01000 = 1 if W1Part==0 & W2Part==1 & W3Part==0 & W4Part==0 & W5Part==0
                            gen WVS00100 = 1 if W1Part==0 & W2Part==0 & W3Part==1 & W4Part==0 & W5Part==0
                            gen WVS00010 = 1 if W1Part==0 & W2Part==0 & W3Part==0 & W4Part==1 & W5Part==0
                            gen WVS00001 = 1 if W1Part==0 & W2Part==0 & W3Part==0 & W4Part==0 & W5Part==1
                            gen WVS11000 = 1 if W1Part==1 & W2Part==1 & W3Part==0 & W4Part==0 & W5Part==0
                            gen WVS10100 = 1 if W1Part==1 & W2Part==0 & W3Part==1 & W4Part==0 & W5Part==0
                            gen WVS10010 = 1 if W1Part==1 & W2Part==0 & W3Part==0 & W4Part==1 & W5Part==0
                            gen WVS10001 = 1 if W1Part==1 & W2Part==0 & W3Part==0 & W4Part==0 & W5Part==1


                            I’d be grateful for any help.
                            Thanks in advance

                            Luther


                            • #15
                              Please use dataex to show your data (see FAQ: https://www.statalist.org/forums/help#stata)

                              The solution uses 4 waves because I didn't feel like adding a 5th variable, but you would just add W5Part in the -egen- command.

                              Code:
                              * Example generated by -dataex-. To install: ssc install dataex
                              clear
                              input float(W1Part W2Part W3Part W4Part)
                              1 0 0 0
                              0 1 0 0
                              0 0 1 0
                              0 0 0 1
                              0 0 0 0
                              1 1 0 0
                              1 0 1 0
                              1 0 0 1
                              0 1 0 0
                              end
                              
                              egen allwaves=concat(W1Part W2Part W3Part W4Part)
                              replace allwaves= "WVS"+allwaves
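
                              For the five-wave case, a sketch along the same lines might look like the following. The five input rows are made up for illustration, and nwaves is a hypothetical extra variable counting the waves each respondent took part in.

                              ```stata
                              clear
                              * made-up example data, one row per respondent
                              input byte(W1Part W2Part W3Part W4Part W5Part)
                              1 0 0 0 0
                              0 1 0 0 0
                              1 1 0 0 0
                              1 0 0 0 1
                              1 1 1 1 1
                              end

                              * one string variable holding the whole participation pattern
                              egen allwaves = concat(W1Part W2Part W3Part W4Part W5Part)
                              replace allwaves = "WVS" + allwaves
                              * number of waves each respondent participated in
                              egen nwaves = rowtotal(W1Part W2Part W3Part W4Part W5Part)
                              tabulate allwaves
                              ```

                              A single string variable covers all 32 possible patterns, so there is no need to create one indicator per pattern by hand.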
                              Stata/MP 14.1 (64-bit x86-64)
                              Revision 19 May 2016
                              Win 8.1
