How to generate groups where the frequency is shown in each group

Kim Vaarts

Join Date: May 2025

Posts: 21
#1


22 May 2025, 05:47

Subjects receive balls of different colors in Room1, Room2, Room3, etc. I want to get patterns of the following form with the code below, but I cannot get the frequencies, so I cannot see exactly how many subjects received which colors of balls:
groups Room1 Room2 Room3, missing

I want output that includes the frequencies, so that I can identify patterns between different sets of ball colors. For example, for groups Room1 Room2 Room3, missing:
blue (frequency) yellow(frequency) orange (frequency)
Attached Files

stata forum.xlsx (0, 0 views)

Last edited by Kim Vaarts; 22 May 2025, 05:52.

Ken Chui

Join Date: Aug 2014
Posts: 1058

22 May 2025, 10:42

Could you perhaps provide the end product that you were envisioning, maybe using the data from ID 1 through 4?

This may be what you wanted, but I am unsure:

Code:

clear
input id str10 (room1 room2 room3 room4 room5) value1 value2 value3 value4 value5
1 blue yellow orange purple yellow 100 33 150 70 10
2 red brown black blue blue 500 55 25 10 5
3 yellow green blue green purple 150 200 20 10 5
4 purple blue red black green 150 50 20 5 5
end

egen color_sequence = concat(room*), punct("-")
tab color_sequence

Results:

Code:

                   color_sequence |      Freq.     Percent        Cum.
----------------------------------+-----------------------------------
 blue-yellow-orange-purple-yellow |          1       25.00       25.00
      purple-blue-red-black-green |          1       25.00       50.00
        red-brown-black-blue-blue |          1       25.00       75.00
   yellow-green-blue-green-purple |          1       25.00      100.00
----------------------------------+-----------------------------------
                            Total |          4      100.00
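
If the aim is to see the most frequent sequences first, the same idea can be sketched with the sort option of tabulate, or with contract, which reduces the data to one observation per distinct sequence. This is only a minimal sketch on made-up data: ids 5 to 7 are invented duplicates so that some sequences repeat, only three rooms are used, and the count variable name n is arbitrary.

```stata
clear
input id str10 (room1 room2 room3)
1 blue yellow orange
2 red brown black
3 yellow green blue
4 purple blue red
5 purple blue red
6 blue yellow orange
7 purple blue red
end

* one string per subject holding the whole sequence
egen color_sequence = concat(room*), punct("-")

* most frequent sequences listed first
tab color_sequence, sort

* or: one observation per distinct sequence, with its count in n
contract color_sequence, freq(n)
gsort -n color_sequence
list, noobs
```

After contract the dataset itself holds the counts, so it can be sorted, filtered, or exported like any other data.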

Last edited by Ken Chui; 22 May 2025, 10:47.


Nick Cox

Join Date: Mar 2014
Posts: 35683

22 May 2025, 11:29

I think I can guess wildly at what you want.

There's no data example here worthy of the name. We've explained this repeatedly in other threads.

What you're showing as desired output is difficult for me to follow. The little tables of counts of colours in each room often are not in order and often repeat colours. I guess the untidy example doesn't hide more subtle rules.

You're using an image and (it seems) giving a link to an Excel attachment, both contrary to advice that we give. Many people are reluctant, or even totally unwilling, to open Excel files. But no-one, it seems, can access the attachment anyway.

groups is community-contributed from the Stata Journal. It doesn't have options like

Code:

blue (frequency) yellow(frequency) orange (frequency)

if that is what you're hoping or asking.

All that said, here are three takes on your problem so far as I can follow it.

Code:

* you would be better just typing a fake example into Stata and then using -dataex- 
clear 
set obs 100
set seed 314159 

gen rnd =  runiform()

gen room1 = cond(rnd < 0.5, "red", cond(rnd < 0.8, "blue", "black"))
gen room2 = cond(rnd < 0.2, "red", cond(rnd < 0.5, "blue", "black"))
gen room3 = cond(rnd < 1/3, "red", cond(rnd < 2/3, "blue", cond(rnd < 5/6, "black", "purple"))) 

* start here 

* TAKE 1  easy: just run several one-way tables 

tab1 room*, sort 

* TAKE 2 quite easy: one composite table 

* download: ssc install tab_chi 
tabm room* 

* TAKE 3: closer to what you ask 

unab vars : room* 
local nv : word count `vars'

forval j = 1/`nv' { 
    preserve 
    rename room`j' room
    contract room
    gen which = `j'
    save freq`j', replace 
    restore 
}

clear 

use freq1 

forval j = 2/`nv' {
    local files `files' freq`j'
}

append using `files'

egen rank = rank(-_freq), by(which) unique 

reshape wide  room _freq, i(rank) j(which)

list

Code:

. tab1 room*, sort 

-> tabulation of room1  

      room1 |      Freq.     Percent        Cum.
------------+-----------------------------------
        red |         56       56.00       56.00
       blue |         27       27.00       83.00
      black |         17       17.00      100.00
------------+-----------------------------------
      Total |        100      100.00

-> tabulation of room2  

      room2 |      Freq.     Percent        Cum.
------------+-----------------------------------
      black |         44       44.00       44.00
       blue |         36       36.00       80.00
        red |         20       20.00      100.00
------------+-----------------------------------
      Total |        100      100.00

-> tabulation of room3  

      room3 |      Freq.     Percent        Cum.
------------+-----------------------------------
        red |         38       38.00       38.00
       blue |         32       32.00       70.00
      black |         15       15.00       85.00
     purple |         15       15.00      100.00
------------+-----------------------------------
      Total |        100      100.00

.
.

Code:

  
. tabm room* 

           |                   values
  variable |     black       blue     purple        red |     Total
-----------+--------------------------------------------+----------
     room1 |        17         27          0         56 |       100 
     room2 |        44         36          0         20 |       100 
     room3 |        15         32         15         38 |       100 
-----------+--------------------------------------------+----------
     Total |        76         95         15        114 |       300

.
.

Code:

  
. list 

     +----------------------------------------------------------+
     | rank   room1   _freq1   room2   _freq2    room3   _freq3 |
     |----------------------------------------------------------|
  1. |    1     red       56   black       44      red       38 |
  2. |    2    blue       27    blue       36     blue       32 |
  3. |    3   black       17     red       20    black       15 |
  4. |    4                .                .   purple       15 |
     +----------------------------------------------------------+
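
A shorter route to the same counts may be to reshape to long layout and then contract. This is a sketch, assuming that one row per room and colour is acceptable in place of the side-by-side layout above; the variable id is introduced only because reshape needs an identifier.

```stata
* same fake data as above
clear
set obs 100
set seed 314159
gen rnd = runiform()
gen room1 = cond(rnd < 0.5, "red", cond(rnd < 0.8, "blue", "black"))
gen room2 = cond(rnd < 0.2, "red", cond(rnd < 0.5, "blue", "black"))
gen room3 = cond(rnd < 1/3, "red", cond(rnd < 2/3, "blue", cond(rnd < 5/6, "black", "purple")))

* identifier needed by reshape
gen long id = _n

* one observation per subject and room
reshape long room, i(id) j(which)

* one observation per room and colour, with its frequency in _freq
contract which room

* within each room, most frequent colour first
gsort which -_freq
list, sepby(which) noobs
```

This avoids the temporary files, at the cost of a long rather than wide listing.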


Kim Vaarts

Join Date: May 2025

Posts: 21
#4

24 May 2025, 16:43

Nick Cox thank you for your response. I actually want to know how many people switch from room1 red to room2 black to room3 purple. I need to identify sequences. I am currently reading Stata articles on sequence patterns. I am looking for code that identifies the sequence patterns that occur most frequently, in the order as listed, and to change the order. People can switch back to a color they had before. And yes, it is the same topic. I just don't know how to explain it better. I am sorry. I tried putting in an Excel sheet as an example. But I did not ask the question correctly. Which Stata code can I use to identify sequence patterns?
Nick Cox

Join Date: Mar 2014

Posts: 35683
#5

24 May 2025, 16:59

It seems to follow that Ken Chui has already answered your question.

Indeed groups from the Stata Journal is relevant if sequences are represented by different variables in the same observation.

However, I am still struggling to see that question in #1 or to know what the numbers in the value* variables mean.

Last edited by Nick Cox; 24 May 2025, 17:07.
Kim Vaarts

Join Date: May 2025

Posts: 21
#6

24 May 2025, 21:10

Nick Cox Suppose there were a total of 3 subjects in room1 with a red ball. One subject did not switch to room2 and the other 2 subjects did; of those 2, one switched to a blue ball and the other to a green ball. I would then have the following sequence patterns:
red 1
red 1 blue 1
red 1 green 1

I also need to know how many subjects there were in total with a red ball, how many of this total switched to a green ball and how many switched to a blue ball. This is a simplified example.

The code of Ken does not take the full number of subjects in the sequence patterns into account when I apply it to my data.

Last edited by Kim Vaarts; 24 May 2025, 21:13.
Nick Cox

Join Date: Mar 2014

Posts: 35683
#7

25 May 2025, 01:48

"The code of Ken does not take the full number of subjects in the sequence patterns into account when I apply it to my data."

Not so, if I understand that claim correctly.

Ken Chui's example happens to include 4 distinct sequences, each occurring once only. If any sequence were repeated that would show up in the tabulation.

Code:

clear
input id str10 (room1 room2 room3 room4 room5) value1 value2 value3 value4 value5
1 blue yellow orange purple yellow 100 33 150 70 10
2 red brown black blue blue 500 55 25 10 5
3 yellow green blue green purple 150 200 20 10 5
4 purple blue red black green 150 50 20 5 5
5 purple blue red black green 150 50 20 5 5
end

egen color_sequence = concat(room*), punct("-")
tab color_sequence

                   color_sequence |      Freq.     Percent        Cum.
----------------------------------+-----------------------------------
 blue-yellow-orange-purple-yellow |          1       20.00       20.00
      purple-blue-red-black-green |          2       40.00       60.00
        red-brown-black-blue-blue |          1       20.00       80.00
   yellow-green-blue-green-purple |          1       20.00      100.00
----------------------------------+-----------------------------------
                            Total |          5      100.00
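
The simplified example in #6 can be typed in the same way. What follows is a minimal sketch, assuming that a subject who did not move on to room2 is recorded with an empty string there; the data are invented to match the description in #6, and the count variable name n is arbitrary.

```stata
clear
input id str10 (room1 room2)
1 red ""
2 red blue
3 red green
end

* how many subjects started with each colour, and where they went next;
* the missing option keeps the subject with no second colour
tab room1 room2, missing

* one observation per observed sequence, with its frequency in n;
* contract keeps missing values unless nomiss is specified
contract room1 room2, freq(n)
list, noobs
```

The row total of the two-way table gives the number of subjects starting with red, and the cells give how many of them went on to blue, to green, or to nothing.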

Naturally we can't see your real data, but you have a choice of data examples from #2 and #3 to use or to modify, and complete freedom to invent your own data example.

The essentials are as many as possible of

A. A data example presented through Stata code that people can discuss.

B. A clear example statement of what you want, whether it is a table, a data reduction, or something else.

C. People must be able to see that A implies B, this data yielding this output.

Thanks to Ken Chui on your behalf for trying to help out.

I think you may need to find someone where you work with a good grasp of Stata and show them your data and talk about what you want.