Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Tabulation results

    Hi folks,

    I am looping over table command to generate tabulation tables in excel.

    The key line in my code is:

    table `row' `col', stat(percent, across(`row')) notot nformat(%4.1f) missing

    Below is my full code:

    * Tabulation 1: Demographic Characteristics
    // Define all variables
    local all_vars "province gender edu age"

    // Define output file
    local summary_file "Demographics Characteristics.xlsx"

    // Loop through column variables
    foreach col of local all_vars {

    // Define row variables excluding the column variable
    local row_vars: list all_vars - col

    // Start creating the summary table
    local first = 1
    foreach row of local row_vars {
    if `first' == 1 {
    table `row' `col', stat(percent, across(`row')) notot nformat(%4.1f) missing
    local first = 0
    }
    else {
    table `row' `col', stat(percent, across(`row')) notot nformat(%4.1f) missing append
    }
    }


    // Set font to "Calibri Light" and decimal format
    collect style cell, font("Calibri Light") nformat(%4.1f)

    // Export the summary table to "Demographics Characteristics.xlsx"
    collect export "`summary_file'", sheet("`sheet_name'") modify
    }

    Below is my output for the province sheet.
    Province
    1 2 3 4
    Gender
    Male 100.0 42.9 100.0
    Female 57.1 100.0
    Educational Level
    Primary School 28.6 100.0
    High School 100.0 57.1 100.0
    University Level 14.3
    Age Group
    0-24 100.0
    25-34 28.6 100.0
    35-44 100.0 28.6
    45-59 28.6
    60+ 14.3
    I would appreciate your supporting in formatting my output in the following way:

    1. Amend my output to have the values of the row variables in column B, and to shift the rest of columns (tabulation output) one column to the right.
    2. Add a Title in BOLD in the first row, and shift the whole table a row below.
    3. Make the variable names in column A BOLD.

    The output should look like:

    Demographics Char. By Province
    Province
    1 2 3 4
    Gender
    Male 100.0 42.9 100.0
    Female 57.1 100.0
    Educational Level
    Primary School 28.6 100.0
    High School 100.0 57.1 100.0
    University Level 14.3
    Age Group
    0-24 100.0
    25-34 28.6 100.0
    35-44 100.0 28.6
    45-59 28.6
    60+ 14.3
    I am using Stata 18.

    Thanks for your support.

  • #2
    Your code, even if it were properly enclosed in a [code]...[/code] block, is difficult to comment on without data to go with it. In addition, it is missing some code you used to stack the row variables for each sheet and is missing the definition for macro sheet_name. I assume you used the column variable's name for the sheet name.

    Using your example table, which also could have been enclosed in a code block, I was able to simulate a dataset. Here is the code I came up with, I highlight the changes/additions in blue.
    Code:
    clear all
    
    * simulate some data
    set seed 18
    set obs 1000
    
    generate province = runiformint(1,4)
    label variable province "Province"
    
    generate gender = runiformint(1,2)
    label variable gender "Gender"
    label define gender 1 "Male" 2 "Female"
    label values gender gender
    
    generate edu = runiformint(1,3)
    label variable edu "Eductation Level"
    label define edu ///
        1 "Primary School" ///
        2 "High School" ///
        3 "University Level"
    label values edu edu
    
    generate age = runiformint(1,5)
    label variable age "Age Group"
    label define age ///
        1 "0-24" ///
        2 "25-34" ///
        3 "35-44" ///
        4 "45-59" ///
        5 "60+"
    label values age age
    
    // Define all variables
    // note: I like to use -unab- to ensure the variable names are not abbreviated
    unab all_vars : province gender edu age
    
    // Define output file
    local summary_file "Demographics Characteristics.xlsx"
    
    // Loop through column variables
    local modify replace
    foreach col of local all_vars {
    
        // Define row variables excluding the column variable
        local row_vars : list all_vars - col
    
        // get the minimum level of the current column variable
        summarize `col', meanonly
        local min = r(min)
    
        // Start creating the summary table
        local first = 1
        foreach row of local row_vars {
            if `first' == 1 {
                table `row' `col', ///
                    stat(percent, across(`row')) ///
                    nototals ///
                    nformat(%4.1f) ///
                    missing
                local first = 0
            }
            else {
                table `row' `col', ///
                    append ///
                    stat(percent, across(`row')) ///
                    nototals ///
                    nformat(%4.1f) ///
                    missing
            }
    
            // add a transparent result to help us shift the first
            // real row level to be placed in the next row below the
            // dimension title
            collect get percent=" ", tags(`row'[_hide] `col'[`min'])
    
            // put this new row level first 
            quietly collect levels `row'
            collect style autolevels `row' _hide `s(levels)', clear
    
            // bold the row dimension's title
            collect style cell `row'[_hide], font(, bold)
    
        }
    
        // Set font to "Calibri Light" and decimal format
        collect style cell, font("Calibri Light") nformat(%4.1f)
    
        // Split the row headers so the dimension levels are placed in
        // their own column -- you need option -nospan- to prevent the row
        // titles from spanning into column of the row-header level
        collect style row split, dups(first) nospan
    
        // Add table title
        local vlab : variable label `col'
        if "`vlab'" == "" {
            local vlab "`col'"
        }
        collect title "Demographics Char. By `vlab'"
        collect style title, font(, bold)
    
        // Hide the column variable's title since we just put it in the
        // table title
        collect style header `col', title(hide)
    
        // Update layout to stack row variable results into a single
        // table
        collect layout (`row_vars') (`col') (result)
    
        // Export the summary table to "Demographics Characteristics.xlsx"
        local sheet_name "`col'"
        collect export `"`summary_file'"', sheet("`sheet_name'") `modify'
        local modify modify
    
    }
    Here is a screenshot of the province sheet, taken from Open Office on my Mac.


    Click image for larger version

Name:	Screenshot 2025-02-13 at 3.09.15 PM.png
Views:	1
Size:	33.4 KB
ID:	1772690

    Comment


    • #3
      Jeff Pitblado (StataCorp) I didn't respond to this thread because I couldn't crack two aspects of OP's request:
      • leave the cell to the right of the variable name empty. You've solved this very cleverly by adding a hidden level to each factor variable. But perhaps there should be an easy way to do this without jumping through such creative hoops? Maybe a new suboption for collect style row split?
      • I don't know how to "target" (to use the manual's language) just the variable names in this table for formatting. For the above case, you've again solved this cleverly by making the entire row bold, which works because of the first trick, of introducing a hidden level. But suppose OP had not made that request, and there were to be visible data in the other columns of that row. How would you make just the variable name bold and not the rest of that row? collect style cell allows you to modify the format of cell_type[row_header] , but this affects both the row variable names and their levels in the table.
      Last edited by Hemanshu Kumar; 13 Feb 2025, 21:02.

      Comment


      • #4
        Hemanshu Kumar thank you for your suggestions. I can't make any promises, but we plan to continue adding features to collect to accommodate more (and easier to specify) customizations such as these.

        Comment


        • #5
          Jeff Pitblado (StataCorp) Thanks a lot for your feedback! Your code addressed all of my three requests brilliantly.

          Yes, I used the column names to name the sheets and I mistakenly trimmed the row stacking part of the code. My bad.

          In the future, do you suggest I copy-paste part of the dataset or is it possible to attach a file with few obs.? Some datasets include a lot of columns and it might not be possible to copy-paste.

          Thanks again!

          Comment


          • #6
            You are welcome.

            Regarding datasets, look at command dataex. You can use it to produce Stata code that subsets your dataset. It helps us when you provide a working example, it does not have to be your entire dataset. We need enough data for model fits and any special cases that are pertinent to your query.

            Comment

            Working...
            X