grouping variables/dims using collect

Daniel Shin

Join Date: Mar 2020
Posts: 146

01 Jul 2021, 11:00

I'm playing with Stata 17's collect and I'm stumped on how to group results from the same variable together. I took a basic example from the manual that uses an NHANES dataset to create a summary table, and tried adding the frequency of non-missing values for each variable in addition to the overall frequency.

Code:

use https://www.stata-press.com/data/r17/nhanes2l, clear
collect clear
* Overall frequency, per-variable statistics, and non-missing counts, by sex
table (var) (sex), ///
    statistic(frequency) ///
    statistic(fvfrequency diabetes) statistic(fvpercent diabetes) ///
    statistic(mean age bmi) statistic(sd age bmi) ///
    statistic(fvfrequency hlthstat) statistic(fvpercent hlthstat) ///
    statistic(mean bpsystol) statistic(sd bpsystol) ///
    statistic(count diabetes age bmi hlthstat bpsystol) ///
    nformat(%6.2f mean sd) miss
collect style header result, level(hide)
collect style row stack, nobinder spacer
collect style cell border_block, border(right, pattern(nil))
* Put factor-level frequencies and percents in the same columns as mean and sd
collect recode result fvfrequency=mean fvpercent=sd
* Put the non-missing counts in the same column as the overall frequency
collect recode result count=frequency
collect layout (var) (sex[1 2]#result)
* Show sd in parentheses, percents with a % sign, and factor-level frequencies as integers
collect style cell result[sd]#var[age bmi bpsystol], sformat("(%s)")
collect style cell result[sd]#var[diabetes hlthstat], sformat("%s%%")
collect style cell result[mean]#var[diabetes hlthstat], nformat(%4.0f)
collect preview

The result is the following table (copied from Tables Builder):

                          Sex
                          Male                        Female
                          4,915                       5,436

Diabetes status
Not diabetic                      4698     95.58%             5152     94.81%
Diabetic                          217      4.42%              282      5.19%

Age (years)               4,915   47.42    (17.17)    5,436   47.72    (17.26)
Body mass index (BMI)     4,915   25.51    (4.02)     5,436   25.56    (5.60)

Health status
Excellent                         1252     25.50%             1155     21.29%
Very good                         1213     24.71%             1378     25.40%
Good                              1340     27.30%             1598     29.45%
Fair                              722      14.71%             948      17.47%
Poor                              382      7.78%              347      6.40%

Systolic blood pressure   4,915   132.89   (20.99)    5,436   129.07   (25.13)
Diabetes status           4,915                       5,434
Health status             4,909                       5,426

The count statistics for the continuous variables are grouped on the same row as their other statistics, but those for the categorical variables (diabetes and health status) are treated as separate entities and end up as their own rows at the bottom of the table. Does anyone know of an easy solution to this? Also, the statistic(frequency) option creates a _hide that I can't seem to label. I'm still confused about how to handle the various dims in collect.


Daniel Shin

Join Date: Mar 2020

Posts: 146
#2

02 Sep 2021, 13:08

Haven't found a solution to this, and wondering if anyone from StataCorp can chime in.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#3

02 Sep 2021, 16:07

I'm not at my computer right now, but I think the following lines are the source of your problem. I'm not clear on exactly what layout you want but I assume you want all counts to be in the same column. Unfortunately, when you recoded the tags, you forced them into two distinct levels.

Code:

collect recode result fvfrequency=mean fvpercent=sd
collect recode result count=frequency

Here the counts from continuous variables are tagged with -frequency- while those from factor variables are tagged -mean-. Aligning them requires them to be tagged the same for layout purposes.

Also _hide is a directive about whether to show level labels or not, and not something to label per se.
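
To check how each result is actually tagged, something like the following may help. This is only a sketch that I have not run, and it assumes the collection built by the code in #1 is still in memory. It only lists dimensions, levels, and labels; it does not change the collection.

Code:

* list the dimensions in the current collection
collect dims
* list the levels of the result dimension after the two recodes
collect levelsof result
* list the levels of the var dimension (factor levels and plain variable names may show up as separate levels)
collect levelsof var
* show the labels attached to the levels of result
collect label list result, all

Comparing the levels of var with the rows of the table should show which tags the categorical counts carry, and so why they are not lining up with the other statistics.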
Daniel Shin

Join Date: Mar 2020

Posts: 146
#4

03 Sep 2021, 13:15

Leonardo, thank you for the reply. I believe the first recode line takes the frequency counts of the categories and places them under mean, so that they are grouped in the same column. So the third and fourth columns in the table above show the mean and sd for continuous variables, and the frequency and % for categorical variables. So at least presentation-wise, it is doing what I want. The second recode takes the frequency of non-missing values of each variable and groups it under the count column, which is also working.

What's weird is that for the continuous variables (age, BMI), the count (recoded to frequency), mean, and sd are all grouped together on the same row. For the categorical variables (diabetes, health status), the count is not grouped with the mean and sd.