
  • categorizing data

    Hi all, I have a dataset that includes up to 30 string variables. Some of them are dummy variables, and the others are categorical with a limited number of categories. I'm trying to group the observations according to their common features. One possible approach is the tabulate command. However, tabulating 30 variables makes no sense and is difficult even with a prefix command like by.

  • #2
    Is there an efficient way to check whether the data can be divided into a small number of categories?



    • #3
      There is not going to be a single command you can run to make sense of your data; certainly not a tabulation of 30 variables. You are going to be running multiple tabulations of groups of variables looking to make sense of the patterns that emerge.

      With that said, for purposes of exploring your data, the collapse command can be helpful.
      Code:
      . sysuse auto, clear
      (1978 Automobile Data)
      
      . generate N = 1
      
      . collapse (sum) N, by(foreign rep78)
      
      . list, clean
      
             rep78    foreign    N  
        1.       1   Domestic    2  
        2.       2   Domestic    8  
        3.       3   Domestic   27  
        4.       4   Domestic    9  
        5.       5   Domestic    2  
        6.       .   Domestic    4  
        7.       3    Foreign    3  
        8.       4    Foreign    9  
        9.       5    Foreign    9  
       10.       .    Foreign    1
      While this example uses numeric by() variables, string variables are also allowed in the by() option.
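      To illustrate the string case, here is a minimal sketch using the same auto data. It assumes a string copy of foreign, here called origin_str (a hypothetical name), created with decode from the variable's value labels. It then sorts the collapsed result so that the most common combinations appear first. With your own data, you would list your string variables in by(). Grouping on many variables at once tends to produce many tiny groups, so it is usually more informative to start with a few variables at a time.

```stata
* Sketch: count observations in each combination of a string and a numeric variable
sysuse auto, clear

* Hypothetical string copy of foreign, built from its value labels ("Domestic"/"Foreign")
decode foreign, generate(origin_str)

* One per observation, so summing N counts observations in each group
generate N = 1

* String and numeric variables can be mixed in by(); missing rep78 forms its own group
collapse (sum) N, by(origin_str rep78)

* Put the most frequent combinations at the top
gsort -N
list, clean
```

      The collapsed data contain one row per observed combination, so scanning the top of the list shows which profiles are common and which are rare.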
