Summarize and find the most frequently occurring values of a variable based on another variable

Emory Davis

Join Date: Feb 2023

Posts: 28
#1

Summarize and find the most frequently occurring values of a variable based on another variable

23 Mar 2023, 13:23

Given the data below, I am trying to figure out how I can get a table/summarized data that tabulates the number of times the variable value occurs as 1/"Observed" and the number of times value occurs as 0/"Not Observed" for each level of the variable metric, and then can sort the result in descending order. Essentially, I want to find out which metrics are "Not Observed" most frequently (top 3 or 5) and which metrics are "Observed" most frequently. There are 21 levels of metric in my full data set. I can use -table- or -tabstat- to get the frequencies, but I can't figure out how I can sort any of that output by frequency for "Observed" or "Not Observed" (I realize I would have to sort for each separately). What can I use to achieve the output I want?

Code:

* Example generated by -dataex-. For more info, type help dataex clear input float observe_num str23 metric byte value 1 "classmanage1" 1 1 "classmanage2" 1 1 "classmanage3" 1 1 "classmanage4" 0 1 "general1" 1 1 "general2" 1 1 "general3" 1 1 "general4" 1 1 "general5" 1 1 "letterwordworkoptional1" 1 1 "letterwordworkoptional2" 1 1 "phonicwordswork" 1 1 "readingnewbook1" 0 1 "readingnewbook2" 0 1 "readingnewbook3" 0 end label values value observe label def observe 0 "Not Observed", modify label def observe 1 "Observed", modify
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35724
#2

23 Mar 2023, 13:50

myaxis from the Stata Journal can help.

https://www.statalist.org/forums/for...e-or-graph-use

I will expand on that later if requested. Currently feeding myself.
1 like
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35724

23 Mar 2023, 16:29

Here is an example any way.

Code:

. webuse nlswork, clear
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. myaxis ind_code1=ind_code, sort(count union) descending subset(union == 1)

. tab ind_code1 union if ind_code1 <= 5

  Industry |
        of |      1 if union
employment |         0          1 |     Total
-----------+----------------------+----------
        11 |     4,634      1,708 |     6,342 
         4 |     2,618      1,133 |     3,751 
         5 |       475        594 |     1,069 
        12 |       912        417 |     1,329 
         6 |     2,598        320 |     2,918 
-----------+----------------------+----------
     Total |    11,237      4,172 |    15,409 

. myaxis ind_code0=ind_code, sort(count union) descending subset(union == 0)

. tab ind_code0 union if ind_code0 <= 5

  Industry |
        of |      1 if union
employment |         0          1 |     Total
-----------+----------------------+----------
        11 |     4,634      1,708 |     6,342 
         4 |     2,618      1,133 |     3,751 
         6 |     2,598        320 |     2,918 
         7 |     1,628        133 |     1,761 
        12 |       912        417 |     1,329 
-----------+----------------------+----------
     Total |    12,390      3,711 |    16,101

To recycle a mild joke associated with the probabilist William Feller, feel free to use your own value of 5.

See also

https://journals.sagepub.com/doi/pdf...6867X211045582

https://journals.sagepub.com/doi/pdf...6867X221106436

Comment

Emory Davis

Join Date: Feb 2023

Posts: 28
#4

23 Mar 2023, 17:34

Nick Cox This is perfect, thank you! I adapted your example to my data and it gave me exactly what I need. Thanks also for sharing the additional resources.
Comment

Announcement

Summarize and find the most frequently occurring values of a variable based on another variable

Comment

Comment

Comment