categorizing data

Ali Niazi

Join Date: Feb 2018

Posts: 28
#1

30 May 2019, 23:26

Hi all, I have a dataset that includes up to 30 string variables. Some of them are dummy variables and the others are categorical with a limited number of categories. I'm trying to group the observations according to their common features. One potential approach is to use the "tabulate" command. However, tabulating 30 variables at once makes no sense and is difficult even with a prefix such as "by".
Ali Niazi

Join Date: Feb 2018

Posts: 28
#2

31 May 2019, 00:57

Is there an efficient way to check whether the data can be divided into categories?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

31 May 2019, 04:42

There is not going to be a single command you can run to make sense of your data; certainly not a tabulation of 30 variables. You are going to be running multiple tabulations of groups of variables looking to make sense of the patterns that emerge.
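As a rough starting point for those tabulations, a loop over the string variables can at least show how many categories each one has. This is only a sketch: it assumes the variables of interest are the string variables currently in memory, and the local macro name strvars is made up for illustration.

```stata
* Collect the names of all string variables; ds leaves them in r(varlist)
ds, has(type string)
local strvars `r(varlist)'

* One-way frequency table for each string variable,
* counting empty strings as their own category
foreach v of local strvars {
    tabulate `v', missing
}
```

Variables that turn out to have only a few categories are then natural candidates for two-way tabulations against each other.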

With that said, for purposes of exploring your data, the collapse command can be helpful.

Code:

. sysuse auto, clear
(1978 Automobile Data)

. generate N = 1

. collapse (sum) N, by(foreign rep78)

. list, clean

       rep78    foreign    N
  1.       1   Domestic    2
  2.       2   Domestic    8
  3.       3   Domestic   27
  4.       4   Domestic    9
  5.       5   Domestic    2
  6.       .   Domestic    4
  7.       3    Foreign    3
  8.       4    Foreign    9
  9.       5    Foreign    9
 10.       .    Foreign    1

While this example uses numeric by() variables, string variables are also allowed in the by() option.
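To illustrate the string case, here is a minimal sketch that first makes string copies of the two variables from the example above and then collapses by them. The names origin and repair are invented for this illustration; in a real dataset the existing string variables would go straight into by().

```stata
sysuse auto, clear

* Hypothetical string copies of the numeric variables used above
decode foreign, generate(origin)     // value labels become strings
tostring rep78, generate(repair)     // numeric values become strings

* Count observations for each combination of the two string variables
generate N = 1
collapse (sum) N, by(origin repair)
list, clean
```

If only the counts are needed, contract origin repair, freq(N) should give the same frequencies without generating N first.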