  • How to generate groups where the frequency is shown in each group

    Subjects receive balls of different colors in Room1, Room2, Room3, etc. I want to get patterns of the following form with the code below, but I cannot get the frequencies, so I cannot see exactly how many subjects received balls of which colors:
    groups Room1 Room2 Room3, missing

    I want output that includes the frequencies, so that I can identify patterns across different sets of ball colors. For example, from groups Room1 Room2 Room3, missing, I would like to see something like:
    blue (frequency) yellow(frequency) orange (frequency)
    Last edited by Kim Vaarts; 22 May 2025, 05:52.

  • #2
    Could you perhaps provide the end product that you were envisioning, maybe using the data from IDs 1 through 4?

    This may be what you wanted, but I am unsure:

    Code:
    clear
    input id str10 (room1 room2 room3 room4 room5) value1 value2 value3 value4 value5
    1 blue yellow orange purple yellow 100 33 150 70 10
    2 red brown black blue blue 500 55 25 10 5
    3 yellow green blue green purple 150 200 20 10 5
    4 purple blue red black green 150 50 20 5 5
    end
    
    egen color_sequence = concat(room*), punct("-")
    tab color_sequence
    Results:

    Code:
                       color_sequence |      Freq.     Percent        Cum.
    ----------------------------------+-----------------------------------
     blue-yellow-orange-purple-yellow |          1       25.00       25.00
          purple-blue-red-black-green |          1       25.00       50.00
            red-brown-black-blue-blue |          1       25.00       75.00
       yellow-green-blue-green-purple |          1       25.00      100.00
    ----------------------------------+-----------------------------------
                                Total |          4      100.00
    Last edited by Ken Chui; 22 May 2025, 10:47.



    • #3
      I think I can guess wildly at what you want.

      There's no data example here worthy of the name. We've explained this repeatedly in other threads.

      What you're showing as desired output is difficult for me to follow. The little tables of counts of colours in each room often are not in order and often repeat colours. I guess the untidy example doesn't hide more subtle rules.

      You're using an image and (it seems) giving a link to an Excel attachment, both contrary to advice that we give. Many people are reluctant, or even totally unwilling, to open Excel files. But no-one, it seems, can access the attachment any way.

      groups is community-contributed from the Stata Journal. It doesn't have options like

      Code:
      blue (frequency) yellow(frequency) orange (frequency)
      if that is what you're hoping or asking.

      All that said, here are three takes on your problem so far as I can follow it.

      Code:
      * you would be better just typing a fake example into Stata and then using -dataex- 
      clear 
      set obs 100
      set seed 314159 
      
      gen rnd =  runiform()
      
      gen room1 = cond(rnd < 0.5, "red", cond(rnd < 0.8, "blue", "black"))
      gen room2 = cond(rnd < 0.2, "red", cond(rnd < 0.5, "blue", "black"))
      gen room3 = cond(rnd < 1/3, "red", cond(rnd < 2/3, "blue", cond(rnd < 5/6, "black", "purple"))) 
      
      * start here 
      
      * TAKE 1  easy: just run several one-way tables 
      
      tab1 room*, sort 
      
      * TAKE 2 quite easy: one composite table 
      
      * download: ssc install tab_chi 
      tabm room* 
      
      * TAKE 3: closer to what you ask 
      
      unab vars : room* 
      local nv : word count `vars'
      
      forval j = 1/`nv' { 
          preserve 
          rename room`j' room
          contract room
          gen which = `j'
          save freq`j', replace 
          restore 
      }
      
      clear 
      
      use freq1 
      
      forval j = 2/`nv' {
          local files `files' freq`j'
      }
      
      append using `files'
      
      egen rank = rank(-_freq), by(which) unique 
      
      reshape wide  room _freq, i(rank) j(which)
      
      list
      Code:
      . tab1 room*, sort 
      
      -> tabulation of room1  
      
            room1 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
              red |         56       56.00       56.00
             blue |         27       27.00       83.00
            black |         17       17.00      100.00
      ------------+-----------------------------------
            Total |        100      100.00
      
      -> tabulation of room2  
      
            room2 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
            black |         44       44.00       44.00
             blue |         36       36.00       80.00
              red |         20       20.00      100.00
      ------------+-----------------------------------
            Total |        100      100.00
      
      -> tabulation of room3  
      
            room3 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
              red |         38       38.00       38.00
             blue |         32       32.00       70.00
            black |         15       15.00       85.00
           purple |         15       15.00      100.00
      ------------+-----------------------------------
            Total |        100      100.00
      .
      .
      Code:
        
      . tabm room* 
      
                 |                   values
        variable |     black       blue     purple        red |     Total
      -----------+--------------------------------------------+----------
           room1 |        17         27          0         56 |       100 
           room2 |        44         36          0         20 |       100 
           room3 |        15         32         15         38 |       100 
      -----------+--------------------------------------------+----------
           Total |        76         95         15        114 |       300
      .
      .
      Code:
        
      . list 
      
           +----------------------------------------------------------+
           | rank   room1   _freq1   room2   _freq2    room3   _freq3 |
           |----------------------------------------------------------|
        1. |    1     red       56   black       44      red       38 |
        2. |    2    blue       27    blue       36     blue       32 |
        3. |    3   black       17     red       20    black       15 |
        4. |    4                .                .   purple       15 |
           +----------------------------------------------------------+



      • #4
        Nick Cox, thank you for your response. What I actually want to know is how many people switch from, for example, red in room1 to black in room2 to purple in room3. I need to identify sequences, and I am currently reading Stata articles on sequence patterns. I am looking for code that identifies which sequence patterns occur most frequently, both in the order as listed and with the order changed. People can switch back to a color they had before. And yes, it is the same topic; I just don't know how to explain it better, and I am sorry. I tried putting in an Excel sheet as an example, but I did not ask the question correctly. Which Stata code can I use to identify sequence patterns?
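A minimal sketch of counting switches, assuming a made-up data layout like the one in #2: one observation per subject, one string variable per room, and "" where a subject got no ball in that room. The ids and colors are hypothetical, not the poster's data. A cross-tabulation of two rooms counts how many subjects went from which color to which; reshaping to long pools every adjacent pair of rooms. Both `tabulate` and `reshape` are official Stata commands.

```stata
* Hypothetical toy data: one observation per subject, one variable per room;
* "" means the subject received no ball in that room
clear
input id str10(room1 room2 room3)
1 red  black purple
2 red  black purple
3 red  blue  ""
4 blue red   blue
end

* Switches from room1 to room2: each cell counts subjects, e.g. red -> black
tab room1 room2, missing

* Pool all adjacent steps (room1 -> room2, room2 -> room3, ...) by going long
reshape long room, i(id) j(step)
bysort id (step): gen from = room[_n-1] if _n > 1

* Drop each subject's first step (no predecessor) and steps with no ball
drop if missing(from) | missing(room)
tab from room
```

Subject 4 switches back to blue after red; in the long layout that is just two ordinary transitions, so no special handling is needed.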



        • #5
          It seems to follow that Ken Chui has already answered your question.

          Indeed groups from the Stata Journal is relevant if sequences are represented by different variables in the same observation.

          However, I am still struggling to see that question in #1 or to know what the numbers in the value* variables mean.
          Last edited by Nick Cox; 24 May 2025, 17:07.
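As a sketch of that point, assuming `groups` has already been installed from the Stata Journal (for example by typing `search groups, sj` and following the link to the latest update; the exact package name is not stated here), each distinct combination of room colors gets one line with its frequency. The data below are hypothetical.

```stata
* Hypothetical toy data in wide layout: one subject per observation
clear
input id str10(room1 room2 room3)
1 red  black purple
2 red  black purple
3 red  blue  ""
4 blue red   blue
end

* One line per distinct sequence across the rooms; the missing option
* keeps subjects who had no ball in some room (as used in #1)
groups room1 room2 room3, missing

* The same idea restricted to the first two rooms
groups room1 room2, missing
```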



          • #6
            Nick Cox: Suppose there were a total of 3 subjects with a red ball in room1. One subject did not switch to room2; the other 2 subjects switched to room2, one to a blue ball and the other to a green ball. Then I would have the following sequence patterns:
            red 1
            red 1 blue 1
            red 1 green 1

            I also need to know how many subjects there were in total with a red ball, how many of this total switched to a green ball and how many switched to a blue ball. This is a simplified example.

            Ken's code does not take all the subjects in the sequence patterns into account when I apply it to my data.
            Last edited by Kim Vaarts; 24 May 2025, 21:13.
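One reading of this example, as a sketch on made-up data that mirror it (3 subjects with red in room1; "" marks a room where the subject got no ball): tabulating room2 among subjects with red in room1 gives the total and where they went next, and building each subject's sequence while skipping empty rooms makes a subject seen only in room1 show up as `red` rather than with trailing separators, which `egen`'s `concat()` with `punct()` may produce when later rooms are empty.

```stata
* Hypothetical data mirroring #6: 3 subjects start with red in room1;
* "" means the subject did not move on to that room
clear
input id str10(room1 room2 room3)
1 red ""    ""
2 red blue  ""
3 red green ""
end

* How many had red in room1, and where they went next (blank row = no switch)
tab room2 if room1 == "red", missing

* Each subject's sequence, appending only rooms that are not empty
gen sequence = room1
forval j = 2/3 {
    replace sequence = sequence + "-" + room`j' if room`j' != ""
}
tab sequence
```

With more rooms, change `2/3` to run up to the last room number.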



            • #7
              Kim Vaarts wrote in #6: "Ken's code does not take all the subjects in the sequence patterns into account when I apply it to my data."
              Not so, if I understand that claim correctly.

              Ken Chui's example happens to include 4 distinct sequences, each occurring only once. If any sequence were repeated, that would show up in the tabulation, as in this version, where observation 5 repeats observation 4.


              Code:
              clear
              input id str10 (room1 room2 room3 room4 room5) value1 value2 value3 value4 value5
              1 blue yellow orange purple yellow 100 33 150 70 10
              2 red brown black blue blue 500 55 25 10 5
              3 yellow green blue green purple 150 200 20 10 5
              4 purple blue red black green 150 50 20 5 5
              5 purple blue red black green 150 50 20 5 5
              end
              
              egen color_sequence = concat(room*), punct("-")
              tab color_sequence

              Results:
                                color_sequence |      Freq.     Percent        Cum.
              ----------------------------------+-----------------------------------
               blue-yellow-orange-purple-yellow |          1       20.00       20.00
                    purple-blue-red-black-green |          2       40.00       60.00
                      red-brown-black-blue-blue |          1       20.00       80.00
                 yellow-green-blue-green-purple |          1       20.00      100.00
              ----------------------------------+-----------------------------------
                                          Total |          5      100.00
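To list the most frequent sequences first, which #4 asks about, one option is to reduce the data to one row per distinct sequence with `contract` and sort on the count. This is a sketch using the five observations above; the value* variables are left out because they play no part here.

```stata
clear
input id str10(room1 room2 room3 room4 room5)
1 blue yellow orange purple yellow
2 red brown black blue blue
3 yellow green blue green purple
4 purple blue red black green
5 purple blue red black green
end

egen color_sequence = concat(room*), punct("-")

* One row per distinct sequence with its count in _freq,
* most frequent first, ties in alphabetical order
contract color_sequence
gsort -_freq color_sequence
list, noobs
```

`contract` replaces the data in memory, so run it on a copy (for example between `preserve` and `restore`) if the subject-level data are still needed.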
              Naturally we can't see your real data, but you have a choice of data examples from #2 and #3 to use or to modify, and complete freedom to invent your own data example.

              The essentials are as many as possible of

              A. A data example presented through Stata code that people can discuss.

              B. A clear example statement of what you want, whether it is a table, a data reduction, or something else.

              C. People must be able to see that A implies B, this data yielding this output.

              Thanks to Ken Chui on your behalf for trying to help out.

              I think you may need to find someone where you work with a good grasp of Stata and show them your data and talk about what you want.

