Remapping dtable's test with by() using collect layout

Jason Burns

Join Date: Apr 2023
Posts: 2


13 Aug 2024, 07:54

Hello,
I have been setting up a custom descriptive statistics table for my own data. Reading through the forums, I have gotten the formatting close to what I want. However, I am having trouble remapping (?) the p-value of dichotomous variables so that it is still shown when the 0.var level is excluded.

The layout is set using (roweq#var#`depvar') to provide a vertical format. This is necessary, as a horizontal format becomes difficult to read and compare when the depvar has more than two levels.

Example code:

Code:

clear all
webuse nhanes2
svyset 

gen index = inrange(hlthstat,1,3)
    label var index "Index Subpopulation"

*----------------------
* macro for continuous variables 
local convars age weight

* factor variables, all
local factorvars  diabetes rural race region

* Dependent variable
local depvar sex

* variables specified in options -continuous()- and -factor()- do not
* need to be specified in the varlist unless you want a special variable
* order that is otherwise too difficult to get using the options
dtable, by(`depvar', tests nomissing) svy subpop(index) column(by(hide))   ///
    continuous(`convars', statistics(total mean sd semean p50 p25 p75))    ///
    factor(`factorvars',                ///
        statistics(fvfrequency fvpercent)        ///
        test(svywald)                ///
    )                        

*------------------------------------------------------------
* Add -roweq- tag to nest vars into groups; -roweq- is a special
* dimension that will grab variable labels for its levels that match
* variable names in the current frame
foreach c of local convars {
    collect addtags roweq[`c'], fortags(var[`c'])
    * hide this continuous variable's title since we plan to use this
    * -roweq- level to title this variable in the header
    collect style header var[`c'], title(hide) level(hide) // hides the unbolded title of the variable 
    local ccall `ccall' `c'
}

foreach var of varlist `factorvars' {
    levelsof `var'    // get levels of each variable

    * Dichotomous variables 
    if r(r) == 2 {
        collect addtags roweq[`var'], fortags(var[i.`var'])
        * hide this factor variable's title since we plan to use this
        * -roweq- level to title this variable in the header
        collect style header `var', title(hide) level(hide)
        local fcall `fcall' 1.`var'
    }
    *  All other factor variables
    if r(r) > 2 {
        collect addtags roweq[`var'], fortags(var[i.`var'])
        collect style header `var', title(hide) level(label)
        local fcall `fcall' i.`var'
    }
}

*-------------------------------
* Shows the bolded title
collect style header roweq, title(hide) level(label)
* bold all the column headers
collect style cell cell_type[column-header], font(arial, bold)
* bold the levels of -roweq-
collect style cell roweq#cell_type[row-header], font(arial, bold)
* unbold the variable names/labels
collect style cell var#cell_type[row-header], font(arial, nobold)

collect style row stack, truncate(head)

// Composite results
*------------------------------- 
* stack non-missing counts and factor level frequencies
collect composite define col1 = total fvfrequency, trim
collect label levels result col1 "Total"
* stack means and factor level percentages
collect composite define col2 = mean fvpercent, trim
collect label levels result col2 "Mean/%"
* stack test results: regress (continuous) and svywald (factor variables)
collect composite define col3 = regress svywald, trim
collect label levels result col3 "Test"

// Format
*-------------------------------
* Formatting: show custom label for results in the header
collect label levels result total "Total (N)"  sd "SD" semean "SE Mean" p50 "p50" p25 "p25" p75 "p75", modify
collect style header result, level(label)  title(hide)

* Changing the format of result cells
collect style cell result[total sd p50 p25 p75], nformat(%12.2gc)
collect style cell result[mean fvpercent], nformat(%9.2fc)

*------------------------------- 
* Setting autolevels for ease of use
collect style autolevels result, clear // removes existing levels for results
collect style autolevels result col1 col2 sd seamean frequency  percent p50 p25 p75  col3 

*------------------------------- 
* Add notes to the bottom using collect
collect notes 1: "Continuous variables tested via Regress"
collect notes 2: "Factor variables tested via Adjusted Wald Test"


// Layout 
*------------------------------- 
* uses only the present status (i.e., var == 1) of dichotomous variables, but the p-value is missing
collect layout (roweq#var[`ccall'  `fcall']#`depvar') (result)

* shows all, including absent status (var == 0) of dichotomous variables
collect layout (roweq#var#`depvar') (result)


* publish our table to MS Excel 
collect export ztable1.xlsx, replace

In the "Layout" section, the p-value for dichotomous variables is shown by the second -collect layout- command, which includes all levels, but it is missing from the first command, which removes the "absent" (0.var) level.

Ideally, the "Test" value would be aligned with the roweq level (bolded title), though setting it just below the last value of i.var would also work. I have tried the code below without success.

Code:

collect remap `depvar'[_dtable_test]=`depvar'[0.`var'] // causes the p-values to be lost on preview 
collect remap sex[_dtable_test]=roweq // error, as already has tag at roweq
collect remap sex[_dtable_test]=sex[.m] // only moves within the levels of sex, not by var/roweq


Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 697

14 Aug 2024, 09:31

Thank you for providing a working example with data.

Note that if you have any factor variables with 2 levels that are not 0 and 1, then the "Dichotomous variables" block is not going to work as intended. Here is one way you could change your code to make identifying dichotomous variables more specific.

Code:

collect levelsof `var'
if `"`s(levels)'"' == "0 1" {
    ...
}
else {
    ...
}
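A small illustration of this point, using the by() variable from this thread: in nhanes2, -sex- has two levels coded 1 (Male) and 2 (Female), so -r(r) == 2- is true even though the variable is not coded 0/1. This sketch uses the data command -levelsof- rather than -collect levelsof-, so it runs without building the collection first; the branch messages it displays are only illustrative.

```stata
* Sketch: a two-level variable need not be coded 0/1.
* Uses the nhanes2 data from this thread; -sex- (the by() variable)
* is coded 1 = Male, 2 = Female.
webuse nhanes2, clear

levelsof sex
* r(r) is 2, so an -if r(r) == 2- block would treat -sex- as a
* 0/1 indicator and keep 1.sex (Male) as the "present" level
display r(r)

* the level values show there is no 0.sex level to drop
display "`r(levels)'"

* checking the level values themselves, as in the -collect levelsof-
* snippet above, sends -sex- to the "all other factor variables" branch
if "`r(levels)'" == "0 1" {
    display "sex: dichotomous 0/1 block"
}
else {
    display "sex: other factor variables block"
}
```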

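In the following code, my changes are the -collect levelsof- check used to identify dichotomous variables and the additions that put the p-values in line with the -roweq- headers: the -collect remap- lines that move each variable's test to a hidden _h_ level, and the -collect style autolevels- and -collect style header- lines after the layout.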

Code:

clear all
webuse nhanes2
svyset

gen index = inrange(hlthstat,1,3)
    label var index "Index Subpopulation"

*----------------------
* macro for continuous variables
local convars age weight

* factor variables, all
local factorvars  diabetes rural race region

* Dependent variable
local depvar sex

* variables specified in options -continuous()- and -factor()- do not
* need to be specified in the varlist unless you want a special variable
* order that is otherwise too difficult to get using the options
dtable, by(`depvar', tests nomissing) svy subpop(index) column(by(hide))   ///
    continuous(`convars', statistics(total mean sd semean p50 p25 p75))    ///
    factor(`factorvars',                ///
        statistics(fvfrequency fvpercent)        ///
        test(svywald)                ///
    )

*------------------------------------------------------------
* Add -roweq- tag to nest vars into groups; -roweq- is a special
* dimension that will grab variable labels for its levels that match
* variable names in the current frame
foreach c of local convars {
    collect addtags roweq[`c'], fortags(var[`c'])
    * hide this continuous variable's title since we plan to use this
    * -roweq- level to title this variable in the header
    collect style header var[`c'], title(hide) level(hide) // hides the unbolded title of the variable
    local ccall `ccall' `c'
}

foreach var of varlist `factorvars' {
    collect levelsof `var'    // get levels of each variable

    * Dichotomous variables
    if `"`s(levels)'"' == "0 1" {
        collect addtags roweq[`var'], fortags(var[i.`var'])
        * hide this factor variable's title since we plan to use this
        * -roweq- level to title this variable in the header
        collect style header `var', title(hide) level(hide)
        * inject a hidden level for this variable's test
        collect remap var[0.`var'] = var[_h_`var'], ///
                fortags(var[0.`var']#result[svywald])
        collect style header var[_h_`var'], level(hide)
        local fcall `fcall' _h_`var' 1.`var'
    }
    else {
        *  All other factor variables
        collect addtags roweq[`var'], fortags(var[i.`var'])
        collect style header `var', title(hide) level(label)
        * inject a hidden level for this variable's test
        collect levelsof `var'
        local levels = s(levels)
        gettoken first : levels
        collect remap var[`first'.`var'] = var[_h_`var'], ///
                fortags(var[`first'.`var']#result[svywald])
        collect style header var[_h_`var'], level(hide)
        local fcall `fcall' _h_`var' i.`var'
    }
}

*-------------------------------
* Shows the bolded title
collect style header roweq, title(hide) level(label)
* bold all the column headers
collect style cell cell_type[column-header], font(arial, bold)
* bold the levels of -roweq-
collect style cell roweq#cell_type[row-header], font(arial, bold)
* unbold the variable names/labels
collect style cell var#cell_type[row-header], font(arial, nobold)

collect style row stack, truncate(head)

// Composite results
*-------------------------------
* stack non-missing counts and factor level frequencies
collect composite define col1 = total fvfrequency, trim
collect label levels result col1 "Total"
* stack means and factor level percentages
collect composite define col2 = mean fvpercent, trim
collect label levels result col2 "Mean/%"
* stack test results: regress (continuous) and svywald (factor variables)
collect composite define col3 = regress svywald, trim
collect label levels result col3 "Test"

// Format
*-------------------------------
* Formatting: show custom label for results in the header
collect label levels result total "Total (N)"  sd "SD" semean "SE Mean" p50 "p50" p25 "p25" p75 "p75", modify
collect style header result, level(label)  title(hide)

* Changing the format of result cells
collect style cell result[total sd p50 p25 p75], nformat(%12.2gc)
collect style cell result[mean fvpercent], nformat(%9.2fc)

*-------------------------------
* Setting autolevels for ease of use
collect style autolevels result, clear // removes existing levels for results
collect style autolevels result col1 col2 sd seamean frequency  percent p50 p25 p75  col3

*-------------------------------
* Add notes to the bottom using collect
collect notes 1: "Continuous variables tested via Regress"
collect notes 2: "Factor variables tested via Adjusted Wald Test"


// Layout
*-------------------------------
* use only the present status (i.e., var == 1) of dichotomous variables; the hidden _h_ levels carry each variable's test
collect layout (roweq#var[`ccall'  `fcall']#`depvar') (result)

* make `depvar'[_dtable_test] show up first, then hide its label
collect query autolevels `depvar'
collect style autolevels `depvar' _dtable_test `s(levels)', clear
collect style header `depvar'[_dtable_test], level(hide)

collect preview

Here is the resulting table.

Code:

-----------------------------------------------------------
                   Total    Mean/%   SD  p50 p25 p75  Test
-----------------------------------------------------------
Age (years)                                           0.105
  Male          1888636417    39.91 (14)  37  27  51
  Female        2016337020    40.52 (15)  37  27  52
  Total         3904973437    40.22 (14)  37  27  52
Weight (kg)                                          <0.001
  Male          3727866905    78.78 (12)  78  70  86
  Female        3225742206    64.83 (13)  62  55  71
  Total         6953609111    71.63 (15)  70  60  81
Diabetes status                                       0.042
    Male           774,607  (1.64%)
    Female       1,087,465  (2.19%)
    Total        1,862,072  (1.92%)
Rural                                                 0.001
    Male        15,194,101 (32.11%)
    Female      14,145,717 (28.43%)
    Total       29,339,818 (30.22%)
Race                                                  0.847
  White
    Male        42,175,598 (89.13%)
    Female      44,498,965 (89.43%)
    Total       86,674,563 (89.28%)
  Black
    Male         3,814,338  (8.06%)
    Female       4,054,595  (8.15%)
    Total        7,868,933  (8.11%)
  Other
    Male         1,330,756  (2.81%)
    Female       1,204,788  (2.42%)
    Total        2,535,544  (2.61%)
Region                                                0.552
  NE
    Male        10,617,461 (22.44%)
    Female      10,559,571 (21.22%)
    Total       21,177,032 (21.81%)
  MW
    Male        11,824,970 (24.99%)
    Female      12,794,116 (25.71%)
    Total       24,619,086 (25.36%)
  S
    Male        11,574,661 (24.46%)
    Female      12,634,377 (25.39%)
    Total       24,209,038 (24.94%)
  W
    Male        13,303,600 (28.11%)
    Female      13,770,284 (27.67%)
    Total       27,073,884 (27.89%)
-----------------------------------------------------------
Continuous variables tested via Regress
Factor variables tested via Adjusted Wald Test


Jason Burns


15 Aug 2024, 04:30

Thank you so much! That solution works perfectly.