Contract multiple categorical variables one-way with same levels?

Manuel PenaH

Join Date: Aug 2021
Posts: 12

Contract multiple categorical variables one-way with same levels?

12 Aug 2023, 08:15

I must be missing something really basic here, but how can I contract the frequencies of multiple categorical variables that have the same levels in a single line of code?

For example, I want to transform this dataset:

Code:

clear

set obs 20

gen id = _n

set seed 1234
gen year1 = round(runiform(1, 3))

set seed 19283
gen year2 = round(runiform(1, 3))

set seed 91827
gen year3 = round(runiform(1, 3))

label define categories 1 "Low" 2 "Medium" 3 "High"
label values year* categories

Into a dataset with the simple counts, one row per level:

Code:

Category      year1     year2     year3
Low           8         1         5
Medium        2         10        9
High          10        9         6

Something really inefficient would be the following, but what very basic command am I missing to do this quickly?

Code:

foreach var of varlist year* {
    preserve
    contract `var', freq(`var'_count)
    rename `var' category
    save `var', replace
    restore
}

use year1, clear
merge 1:1 category using year2, nogen
merge 1:1 category using year3, nogen

Tags: None

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 707

12 Aug 2023, 10:59

Nothing downloadable from ssc springs to mind.

Here is another way that does not require looping or merge.

Code:

reshape long year, i(id) j(j)
rename year category
collapse (count) year=id, by(category j)
reshape wide year, i(category) j(j)
list

Here is the resulting dataset.

Code:

. list

     +----------------------------------+
     | category   year1   year2   year3 |
     |----------------------------------|
  1. |      Low       8       1       5 |
  2. |   Medium       2      10       9 |
  3. |     High      10       9       6 |
     +----------------------------------+

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#3

12 Aug 2023, 11:04

I don't think you are missing any simple command to do this in one or two steps. The reason is that you are asking Stata to transform the data in a way that discards all of the information about the correspondence between values of year1, year2, and year3. You are just interested in their totals as if they were unrelated variables. The most inefficient part of your approach is that it thrashes the disk. You can do this all in memory by passing the data through a matrix instead, although unless your real data set is very large the difference in performance probably will not be noticeable:

Code:

clear set obs 20 gen id = _n set seed 1234 gen year1 = round(runiform(1, 3)) set seed 19283 gen year2 = round(runiform(1, 3)) set seed 91827 gen year3 = round(runiform(1, 3)) label define categories 1 "Low" 2 "Medium" 3 "High" label values year* categories forvalues i = 1/3 { tab year`i', matcell(count`i') matrow(values) } matrix M = values, count1, count2, count3 matrix colnames M = category year1 year2 year3 drop _all svmat M, names(col) label values category categories

Added: Crossed with #2.

Last edited by Clyde Schechter; 12 Aug 2023, 11:09.
1 like
Comment
Manuel PenaH

Join Date: Aug 2021

Posts: 12
#4

12 Aug 2023, 12:15

Many thanks to both of you, Jeff and Clyde, for these clever approaches. Much appreciated!
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35778

12 Aug 2023, 15:33

tabm from tab_chi on SSC can help here too.

Code:

. clear

. 
. set obs 20
Number of observations (_N) was 0, now 20.

. 
. gen id = _n

. 
. set seed 1234

. gen year1 = round(runiform(1, 3))

. 
. set seed 19283

. gen year2 = round(runiform(1, 3))

. 
. set seed 91827

. gen year3 = round(runiform(1, 3))

. 
. label define categories 1 "Low" 2 "Medium" 3 "High"

. label values year* categories

. 
. tabm year*, transpose replace 

           |             variable
    values |     year1      year2      year3 |     Total
-----------+---------------------------------+----------
       Low |         8          1          5 |        14 
    Medium |         2         10          9 |        21 
      High |        10          9          6 |        25 
-----------+---------------------------------+----------
     Total |        20         20         20 |        60 

. 
. list 

     +------------------+
     | _stack   _values |
     |------------------|
  1. |  year1      High |
  2. |  year1       Low |
  3. |  year1      High |
  4. |  year1      High |
  5. |  year1       Low |
     |------------------|
  6. |  year1      High |
  7. |  year1      High |
  8. |  year1      High |
  9. |  year1       Low |
 10. |  year1      High |
     |------------------|
 11. |  year1      High |
 12. |  year1       Low |
 13. |  year1    Medium |
 14. |  year1      High |
 15. |  year1       Low |
     |------------------|
 16. |  year1      High |
 17. |  year1       Low |
 18. |  year1       Low |
 19. |  year1       Low |
 20. |  year1    Medium |
     |------------------|
 21. |  year2       Low |
 22. |  year2    Medium |
 23. |  year2    Medium |
 24. |  year2      High |
 25. |  year2      High |
     |------------------|
 26. |  year2    Medium |
 27. |  year2    Medium |
 28. |  year2      High |
 29. |  year2      High |
 30. |  year2      High |
     |------------------|
 31. |  year2      High |
 32. |  year2    Medium |
 33. |  year2    Medium |
 34. |  year2      High |
 35. |  year2    Medium |
     |------------------|
 36. |  year2    Medium |
 37. |  year2      High |
 38. |  year2    Medium |
 39. |  year2    Medium |
 40. |  year2      High |
     |------------------|
 41. |  year3    Medium |
 42. |  year3      High |
 43. |  year3       Low |
 44. |  year3      High |
 45. |  year3      High |
     |------------------|
 46. |  year3      High |
 47. |  year3    Medium |
 48. |  year3      High |
 49. |  year3      High |
 50. |  year3    Medium |
     |------------------|
 51. |  year3    Medium |
 52. |  year3    Medium |
 53. |  year3    Medium |
 54. |  year3       Low |
 55. |  year3       Low |
     |------------------|
 56. |  year3       Low |
 57. |  year3    Medium |
 58. |  year3    Medium |
 59. |  year3       Low |
 60. |  year3    Medium |
     +------------------+

tabm goes back to 1998. although the replace option was perhaps not in the original.

Comment

Manuel PenaH

Join Date: Aug 2021

Posts: 12
#6

01 Sep 2023, 15:25

Thank you so much, Nick! I didn't know your tabm, thanks for sharing! I guess what this does to the data is similar to a reshape, like in Jeff's first line,

Code:

reshape long year, i(id) j(_stack)

But what I would like is the actual data to look like the output that appears in the results window when you run this:

Code:

tabm year*, transpose replace

So I guess I would still need a few extra steps to get to that. In any case, very helpful to learn about tabm, thank you!
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35778
#7

02 Sep 2023, 00:52

You can get a row total variable by adding, e.g.

Code:

gen Total = year1 + year2 + year3

Having an extra observation with column totals is un-Stataish, but most people who want that know how to do it in a spreadsheet. Do you need it for any Stata purpose?
Comment

Announcement

Contract multiple categorical variables one-way with same levels?

Comment

Comment

Comment

Comment

Comment

Comment