  • Permutations and combinations in Stata

    Hi. I need some help with commands for creating tuples to do the following:
    1) Create all possible combinations of 7 variables, with a minimum of 3 variables in a tuple.
    2) Know/display the frequency of each tuple.
    3) Add up the counts of all the tuples into a new variable called poly. This poly variable should have as its categories all the tuples above, with their frequencies.
    I've looked at all the threads on Statalist about this, but I have not been able to get code that generates and displays the frequencies correctly.
    Please help.

    Maliha Ali

  • #2
    Try starting with -ssc describe tuples-.

    Please read the FAQ on how to post good questions and the guidelines on what is expected of users of the list.
    You should:

    1. Read the FAQ carefully.

    2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

    3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

    4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.



    • #3
      My dataset includes 7 categories of tobacco use, e.g. cigarettes, cigars, hookah, ecigs, pipes, kreteks, and smkless.
      I need to create all possible combinations among 7 variables, with a minimum of 3 variables in a tuple.
      I need to know the frequency of each tuple, and save the tuples in a variable called poly.
      How do I display the frequencies in each tuple?

      *Below is as far as I could get*
      tuples cigarette cigars ecig hookah smkless piperoll krebid, display min(3)

      But this gives me 99 tuples, with no indication of the frequency of each.
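For reference, 99 is exactly the number of tuples that call produces. tuples works only on the list of names, so with 7 names and min(3) it lists every subset of size 3 to 7 without looking at the data, which is why no frequencies appear:

$$\sum_{k=3}^{7}\binom{7}{k} = 35 + 35 + 21 + 7 + 1 = 99 = 2^7 - \binom{7}{0} - \binom{7}{1} - \binom{7}{2}$$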

      I checked the help file for tuples, but it didn't really help me figure this out.

      Please advise.

      Thanks.

      Please help me move forward.



      • #4
        tuples (from SSC, as you are asked to explain) just gives you subsets of a set of names or other tokens with different numbers of tokens in the subset.

        It seems that what you want is not to manipulate the tokens at all, but to count the frequencies of cross-combinations.

        It's not really clear, as you flip between calling things like hookah categories and calling them variables.

        At a guess you want a command more like groups (also SSC).

        Please change your identifier to "Maliha Ali" (FAQ Advice explains how).



        • #5
          Thank you Nick. I'm going to try to be more clear.
          I have 7 variables that represent types of tobacco use: cigarettes, cigars, ecigarettes, hookah, smkless, pipes, kreteks. The frequency of each variable is an absolute count (not mutually exclusive of use of other tobacco products).
          People who use one tobacco product may also use other tobacco products.
          So I want to cross-combine these variables and see how many people use each combination of products.
          The minimum number of variables (products used) in a combination should be 3 and the maximum 7.
          I was under the impression that tuples was the way to go, however perhaps I am mistaken.
          What I do want is to find out how many people use, for instance, cigarettes, cigars and ecigs (this combination ONLY and none of the remaining products, i.e. a mutually exclusive combination); how many people use cigarettes, cigars and smkless (this combination ONLY and none of the remaining products);
          cigarettes, cigars and kreteks; etc.
          Product use can be in combinations of three, four, five, six and finally seven (all variables), that is, a minimum of 3 products used and a maximum of all 7. In the end I should be able to see which combinations of products people in my dataset are using, and with what frequency. This should tell me which combinations of tobacco products were used most and least.
          Finally, all these possible combinations should be saved as categories in a new variable called poly.

          I hope this helps. Please advise.
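A minimal sketch of the mutually exclusive tabulation described above, assuming the seven variables are 0/1 indicators (1 = uses the product) named as in the tuples call in #3; the data rows are invented for illustration. It builds, for each person, a text label listing exactly the products used, tabulates those labels for people using 3 or more products, and stores them as categories of poly with egen's group() function.

```stata
* Hypothetical example data: one row per person, 1 = uses the product, 0 = does not
clear
input cigarette cigars ecig hookah smkless piperoll krebid
1 1 1 0 0 0 0
1 1 1 0 0 0 0
1 1 0 0 1 0 0
1 0 0 0 0 0 0
0 0 0 0 0 0 0
1 1 1 1 1 1 1
end

* Number of products each person uses
egen nprod = rowtotal(cigarette cigars ecig hookah smkless piperoll krebid)

* Text label naming exactly the products used, so each person falls in one combination
generate combo = ""
foreach v of varlist cigarette cigars ecig hookah smkless piperoll krebid {
    replace combo = combo + " `v'" if `v' == 1
}
replace combo = strtrim(combo)

* Frequencies of each exact combination among people using 3 or more products
tabulate combo if nprod >= 3, sort

* Store those combinations as labelled categories of a new variable poly
egen poly = group(combo) if nprod >= 3, label
```

Because each person gets exactly one label, the counts are mutually exclusive; combinations that nobody uses simply do not appear.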



          • #6
            I have already given some advice.

            Please consider whether groups (SSC) does what you want. You don't seem to have tried it or even looked at it.

            In addition, check out contract.
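Since contract replaces the data in memory, one way to look at its results without losing the original data is to wrap it in preserve and restore. A minimal sketch, again assuming 0/1 indicator variables with invented example rows:

```stata
* Hypothetical example data: 1 = uses the product, 0 = does not
clear
input cigarette cigars ecig hookah smkless piperoll krebid
1 1 1 0 0 0 0
1 1 1 0 0 0 0
1 1 0 0 1 0 0
1 0 0 0 0 0 0
end

* Keep a copy so the original person-level data come back afterwards
preserve

* One row per observed combination; the count is stored in _freq
contract cigarette cigars ecig hookah smkless piperoll krebid

* Most common combinations first
gsort -_freq
list, noobs

* Return to the original data
restore
```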



            • #7
              Thank you Nick. Both -contract- and -groups- work in giving me all the combinations and their frequencies. -contract- overwrites the dataset, so -groups- is better.
              Is there some way I can extract particular group combinations into a new variable, poly? For instance, poly would be 0 for "no tobacco use", 1 for "only one tobacco product", 2 for "any two tobacco products", and 3 for "3 or more tobacco products".
              Thanks for your input.



              • #8
                If your variables are indicator variables, then their row sum will give you what you want, and that's given by the egen function rowtotal().

                If they aren't, I think you will need to show us what they are.
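A minimal sketch of the row-sum approach for the poly coding asked about in #7, assuming 0/1 indicator variables named as in #3 (the example rows are invented). Note that egen's rowtotal() treats missing values as 0, so a person with all seven values missing would be coded 0; check for missing data first if that matters.

```stata
* Hypothetical example data: indicator (0/1) variables, one row per person
clear
input cigarette cigars ecig hookah smkless piperoll krebid
0 0 0 0 0 0 0
1 0 0 0 0 0 0
1 1 0 0 0 0 0
1 1 1 0 0 0 0
1 1 1 1 1 1 1
end

* Row sum = number of products each person uses (missing values count as 0)
egen nprod = rowtotal(cigarette cigars ecig hookah smkless piperoll krebid)

* Top-code at 3 so that 3 means "three or more"
generate byte poly = min(nprod, 3)
label define poly 0 "no tobacco use" 1 "only one product" 2 "any two products" 3 "three or more products"
label values poly poly
tabulate poly
```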

                (Thanks for fixing your identifier.)
                Last edited by Nick Cox; 14 Oct 2014, 16:38.
