Tabulation results

Richard Hassan

Join Date: Jan 2025
Posts: 4

Tabulation results

12 Feb 2025, 11:10

Hi folks,

I am looping over table command to generate tabulation tables in excel.

The key line in my code is:

table `row' `col', stat(percent, across(`row')) notot nformat(%4.1f) missing

Below is my full code:

* Tabulation 1: Demographic Characteristics
// Define all variables
local all_vars "province gender edu age"

// Define output file
local summary_file "Demographics Characteristics.xlsx"

// Loop through column variables
foreach col of local all_vars {

// Define row variables excluding the column variable
local row_vars: list all_vars - col

// Start creating the summary table
local first = 1
foreach row of local row_vars {
if `first' == 1 {
table `row' `col', stat(percent, across(`row')) notot nformat(%4.1f) missing
local first = 0
}
else {
table `row' `col', stat(percent, across(`row')) notot nformat(%4.1f) missing append
}
}

// Set font to "Calibri Light" and decimal format
collect style cell, font("Calibri Light") nformat(%4.1f)

// Export the summary table to "Demographics Characteristics.xlsx"
collect export "`summary_file'", sheet("`sheet_name'") modify
}

Below is my output for the province sheet.

	Province
	1	2	3	4
Gender
Male	100.0	42.9	100.0
Female		57.1		100.0
Educational Level
Primary School		28.6		100.0
High School	100.0	57.1	100.0
University Level		14.3
Age Group
0-24				100.0
25-34		28.6	100.0
35-44	100.0	28.6
45-59		28.6
60+		14.3

I would appreciate your supporting in formatting my output in the following way:

1. Amend my output to have the values of the row variables in column B, and to shift the rest of columns (tabulation output) one column to the right.
2. Add a Title in BOLD in the first row, and shift the whole table a row below.
3. Make the variable names in column A BOLD.

The output should look like:

Demographics Char. By Province

Province
		1	2	3	4
Gender
	Male	100.0	42.9	100.0
	Female		57.1		100.0
Educational Level
	Primary School		28.6		100.0
	High School	100.0	57.1	100.0
	University Level		14.3
Age Group
	0-24				100.0
	25-34		28.6	100.0
	35-44	100.0	28.6
	45-59		28.6
	60+		14.3

I am using Stata 18.

Thanks for your support.

Tags: None

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 700

13 Feb 2025, 14:12

Your code, even if it were properly enclosed in a [code]...[/code] block, is difficult to comment on without data to go with it. In addition, it is missing some code you used to stack the row variables for each sheet and is missing the definition for macro sheet_name. I assume you used the column variable's name for the sheet name.

Using your example table, which also could have been enclosed in a code block, I was able to simulate a dataset. Here is the code I came up with, I highlight the changes/additions in blue.

Code:

clear all

* simulate some data
set seed 18
set obs 1000

generate province = runiformint(1,4)
label variable province "Province"

generate gender = runiformint(1,2)
label variable gender "Gender"
label define gender 1 "Male" 2 "Female"
label values gender gender

generate edu = runiformint(1,3)
label variable edu "Eductation Level"
label define edu ///
    1 "Primary School" ///
    2 "High School" ///
    3 "University Level"
label values edu edu

generate age = runiformint(1,5)
label variable age "Age Group"
label define age ///
    1 "0-24" ///
    2 "25-34" ///
    3 "35-44" ///
    4 "45-59" ///
    5 "60+"
label values age age

// Define all variables
// note: I like to use -unab- to ensure the variable names are not abbreviated
unab all_vars : province gender edu age

// Define output file
local summary_file "Demographics Characteristics.xlsx"

// Loop through column variables
local modify replace
foreach col of local all_vars {

    // Define row variables excluding the column variable
    local row_vars : list all_vars - col

    // get the minimum level of the current column variable
    summarize `col', meanonly
    local min = r(min)

    // Start creating the summary table
    local first = 1
    foreach row of local row_vars {
        if `first' == 1 {
            table `row' `col', ///
                stat(percent, across(`row')) ///
                nototals ///
                nformat(%4.1f) ///
                missing
            local first = 0
        }
        else {
            table `row' `col', ///
                append ///
                stat(percent, across(`row')) ///
                nototals ///
                nformat(%4.1f) ///
                missing
        }

        // add a transparent result to help us shift the first
        // real row level to be placed in the next row below the
        // dimension title
        collect get percent=" ", tags(`row'[_hide] `col'[`min'])

        // put this new row level first 
        quietly collect levels `row'
        collect style autolevels `row' _hide `s(levels)', clear

        // bold the row dimension's title
        collect style cell `row'[_hide], font(, bold)

    }

    // Set font to "Calibri Light" and decimal format
    collect style cell, font("Calibri Light") nformat(%4.1f)

    // Split the row headers so the dimension levels are placed in
    // their own column -- you need option -nospan- to prevent the row
    // titles from spanning into column of the row-header level
    collect style row split, dups(first) nospan

    // Add table title
    local vlab : variable label `col'
    if "`vlab'" == "" {
        local vlab "`col'"
    }
    collect title "Demographics Char. By `vlab'"
    collect style title, font(, bold)

    // Hide the column variable's title since we just put it in the
    // table title
    collect style header `col', title(hide)

    // Update layout to stack row variable results into a single
    // table
    collect layout (`row_vars') (`col') (result)

    // Export the summary table to "Demographics Characteristics.xlsx"
    local sheet_name "`col'"
    collect export `"`summary_file'"', sheet("`sheet_name'") `modify'
    local modify modify

}

Here is a screenshot of the province sheet, taken from Open Office on my Mac.

Click image for larger version

Name: Screenshot 2025-02-13 at 3.09.15 PM.png
Views: 1
Size: 33.4 KB
ID: 1772690

Comment

Hemanshu Kumar

Join Date: Mar 2015

Posts: 1411
#3

13 Feb 2025, 20:52

Jeff Pitblado (StataCorp) I didn't respond to this thread because I couldn't crack two aspects of OP's request:
leave the cell to the right of the variable name empty. You've solved this very cleverly by adding a hidden level to each factor variable. But perhaps there should be an easy way to do this without jumping through such creative hoops? Maybe a new suboption for collect style row split?

I don't know how to "target" (to use the manual's language) just the variable names in this table for formatting. For the above case, you've again solved this cleverly by making the entire row bold, which works because of the first trick, of introducing a hidden level. But suppose OP had not made that request, and there were to be visible data in the other columns of that row. How would you make just the variable name bold and not the rest of that row? collect style cell allows you to modify the format of cell_type[row_header] , but this affects both the row variable names and their levels in the table.

Last edited by Hemanshu Kumar; 13 Feb 2025, 21:02.
1 like
Comment
Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 700
#4

14 Feb 2025, 11:18

Hemanshu Kumar thank you for your suggestions. I can't make any promises, but we plan to continue adding features to collect to accommodate more (and easier to specify) customizations such as these.
1 like
Comment
Richard Hassan

Join Date: Jan 2025

Posts: 4
#5

16 Feb 2025, 13:02

Jeff Pitblado (StataCorp) Thanks a lot for your feedback! Your code addressed all of my three requests brilliantly.

Yes, I used the column names to name the sheets and I mistakenly trimmed the row stacking part of the code. My bad.

In the future, do you suggest I copy-paste part of the dataset or is it possible to attach a file with few obs.? Some datasets include a lot of columns and it might not be possible to copy-paste.

Thanks again!
Comment
Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 700
#6

17 Feb 2025, 16:05

You are welcome.

Regarding datasets, look at command dataex. You can use it to produce Stata code that subsets your dataset. It helps us when you provide a working example, it does not have to be your entire dataset. We need enough data for model fits and any special cases that are pertinent to your query.
Comment

Announcement

Tabulation results

Comment

Comment

Comment

Comment

Comment