Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Summarize and find the most frequently occurring values of a variable based on another variable

    Given the data below, I am trying to figure out how I can get a table/summarized data that tabulates the number of times the variable value occurs as 1/"Observed" and the number of times value occurs as 0/"Not Observed" for each level of the variable metric, and then can sort the result in descending order. Essentially, I want to find out which metrics are "Not Observed" most frequently (top 3 or 5) and which metrics are "Observed" most frequently. There are 21 levels of metric in my full data set. I can use -table- or -tabstat- to get the frequencies, but I can't figure out how I can sort any of that output by frequency for "Observed" or "Not Observed" (I realize I would have to sort for each separately). What can I use to achieve the output I want?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float observe_num str23 metric byte value
    1 "classmanage1"            1
    1 "classmanage2"            1
    1 "classmanage3"            1
    1 "classmanage4"            0
    1 "general1"                1
    1 "general2"                1
    1 "general3"                1
    1 "general4"                1
    1 "general5"                1
    1 "letterwordworkoptional1" 1
    1 "letterwordworkoptional2" 1
    1 "phonicwordswork"         1
    1 "readingnewbook1"         0
    1 "readingnewbook2"         0
    1 "readingnewbook3"         0
    end
    label values value observe
    label def observe 0 "Not Observed", modify
    label def observe 1 "Observed", modify

  • #2
    myaxis from the Stata Journal can help.

    https://www.statalist.org/forums/for...e-or-graph-use

    I will expand on that later if requested. Currently feeding myself.

    Comment


    • #3
      Here is an example any way.

      Code:
      . webuse nlswork, clear
      (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
      
      . myaxis ind_code1=ind_code, sort(count union) descending subset(union == 1)
      
      . tab ind_code1 union if ind_code1 <= 5
      
        Industry |
              of |      1 if union
      employment |         0          1 |     Total
      -----------+----------------------+----------
              11 |     4,634      1,708 |     6,342 
               4 |     2,618      1,133 |     3,751 
               5 |       475        594 |     1,069 
              12 |       912        417 |     1,329 
               6 |     2,598        320 |     2,918 
      -----------+----------------------+----------
           Total |    11,237      4,172 |    15,409 
      
      . myaxis ind_code0=ind_code, sort(count union) descending subset(union == 0)
      
      . tab ind_code0 union if ind_code0 <= 5
      
        Industry |
              of |      1 if union
      employment |         0          1 |     Total
      -----------+----------------------+----------
              11 |     4,634      1,708 |     6,342 
               4 |     2,618      1,133 |     3,751 
               6 |     2,598        320 |     2,918 
               7 |     1,628        133 |     1,761 
              12 |       912        417 |     1,329 
      -----------+----------------------+----------
           Total |    12,390      3,711 |    16,101
      To recycle a mild joke associated with the probabilist William Feller, feel free to use your own value of 5.

      See also

      https://journals.sagepub.com/doi/pdf...6867X211045582

      https://journals.sagepub.com/doi/pdf...6867X221106436

      Comment


      • #4
        Nick Cox This is perfect, thank you! I adapted your example to my data and it gave me exactly what I need. Thanks also for sharing the additional resources.

        Comment

        Working...
        X