  • Finding Most Common String Values Across Variables

    Hi all:

    I am trying to find the most common string values across variables. While I work in criminal justice data, I can't share that data. So I made a test set with color. Assume that each respondent might have multiple colors occur. Each time a color occurs, they get the color in a new variable. (In real life, these are charges.) There is no rhyme or reason why something is entered as the first color or second. I need to know the five most common colors that occur across the data set (the five most common charges). I know how to do this for one variable with the group command, but can't figure out how to do so across variables.

    I searched the forums, but did not find a solution.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str2 id str6 Color1 str5(Color2 Color3) str6 Color4
    "1"  "Blue"   ""      ""      ""      
    "2"  "Red"    "Black" "White" ""      
    "3"  "Orange" "Blue"  ""      ""      
    "4"  "Black"  "Red"   "Blue"  "Orange"
    "5"  "Blue"   ""      ""      ""      
    "6"  "Blue"   "Green" "Tan"   ""      
    "7"  "Green"  "Blue"  ""      ""      
    "8"  "Red"    "Blue"  "Green" ""      
    "9"  "Purple" ""      ""      ""      
    "10" "Black"  "Red"   ""      ""      
    end
    This is my first real post here, so please let me know if I did not enter the needed information.

    Thank you!

  • #2
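    Try the following:
    Code:
    * stack Color1-Color4 into one variable, one row per id and slot
    reshape long Color, i(id) j(count)
    * tabulate in descending order of frequency (empty strings are excluded)
    ta Color, sort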
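
    If only the five most common values are wanted as a listing rather than a full table, one possible sketch uses contract. It assumes the data are already in long layout after the reshape above; the frequency variable name n is my own choice, not something from the original post. Ties at fifth place are cut off arbitrarily by the list range, so check the frequencies around the cutoff.
    Code:
    * assumes the data are already long (after the reshape above)
    preserve
    * empty strings are leftover slots, not real values
    drop if missing(Color)
    * collapse to one row per distinct value, with its frequency in n
    contract Color, freq(n)
    * most frequent first; ties broken alphabetically
    gsort -n Color
    list in 1/5, noobs
    restore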



    • #3
      Note also that tabm from tab_chi (SSC) does this directly, without a reshape:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str2 id str6 Color1 str5(Color2 Color3) str6 Color4
      "1"  "Blue"   ""      ""      ""      
      "2"  "Red"    "Black" "White" ""      
      "3"  "Orange" "Blue"  ""      ""      
      "4"  "Black"  "Red"   "Blue"  "Orange"
      "5"  "Blue"   ""      ""      ""      
      "6"  "Blue"   "Green" "Tan"   ""      
      "7"  "Green"  "Blue"  ""      ""      
      "8"  "Red"    "Blue"  "Green" ""      
      "9"  "Purple" ""      ""      ""      
      "10" "Black"  "Red"   ""      ""      
      end
      
      tabm Color*, rowsort transpose 
      
                 |                  variable
          values |    Color1     Color2     Color3     Color4 |     Total
      -----------+--------------------------------------------+----------
            Blue |         3          3          1          0 |         7 
             Red |         2          2          0          0 |         4 
           Black |         2          1          0          0 |         3 
           Green |         1          1          1          0 |         3 
          Orange |         1          0          0          1 |         2 
          Purple |         1          0          0          0 |         1 
             Tan |         0          0          1          0 |         1 
           White |         0          0          1          0 |         1 
      -----------+--------------------------------------------+----------
           Total |        10          7          4          1 |        22


      • #4
        Thank you both! I was able to successfully use tabm to find what I needed. However, there is a new problem: the different "colors" are often spelled differently (e.g. orange, ornge, and orane). This will be fun!
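
        For the spelling variants, one possible starting point is to standardize case and surrounding spaces, then map known misspellings to a single value before tabulating. This is only a sketch on the example data: the variant list holds just the two misspellings mentioned above, the real list would have to be built by inspecting the table, and strtrim() and strproper() assume Stata 14 or later.
        Code:
        * sketch only: assumes the wide example data (Color1-Color4) are in memory
        foreach v of varlist Color1-Color4 {
            * remove leading/trailing blanks and unify capitalization
            replace `v' = strproper(strtrim(`v'))
            * map known misspellings to one spelling (illustrative list)
            replace `v' = "Orange" if inlist(`v', "Ornge", "Orane")
        }
        tabm Color*, rowsort transpose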
