Combining descriptive and inferential statistics into a single table using the table and collect commands

Suzanna Vidmar

Join Date: Nov 2017
Posts: 27

Combining descriptive and inferential statistics into a single table using the table and collect commands

07 May 2023, 22:29

I would like to create a table displaying:

the risks(%) for each category of a binary variable
the risk difference (%)
a 95% CI (%)
p-value

For this example I'm using a dataset that's freely available and keeping only 2 categories of a categorical variable so that I have a binary independent variable.

Here's my code:

Code:

*Reading in the low birthweight dataset available on the Stata Press website.
use https://www.stata-press.com/data/r18/nhanes2l,clear

*I want a binary independent variable for this example, so keeping only the first 2 groups.
keep if inlist(smsa,1,2)

collect clear

*"binreg,rd" stores results in units of proportion. Multiplying by 100 to convert to percentage.
foreach var of varlist highbp diabetes {
    quietly: collect r(p): binreg `var' i.smsa, rd
    matrix define `var'_prop = r(table)
    matrix `var'_perc = `var'_prop*100
}

table (colname) (result coleq), ///
    command (p1=r(P1)*100 p2=r(P2)*100: prtest highbp, by(smsa)) ///
    command (p1=r(P1)*100 p2=r(P2)*100: prtest diabetes, by(smsa)) ///
    command(rd=highbp_perc[1,2] ll=highbp_perc[5,2] ul=highbp_perc[6,2] p=highbp_prop[4,2]: ///
    binreg highbp i.smsa, rd) ///
    command(rd=diabetes_perc[1,2] ll=diabetes_perc[5,2] ul=diabetes_perc[6,2] p=diabetes_prop[4,2]: ///
    binreg diabetes i.smsa, rd) nformat(%4.1f p1 p2 rd ll ul) nformat(%5.3f p) ///
    sformat(%s%% p1 p2 rd ll ul)

collect composite define ci = ll ul, delimiter(", ") trim
collect style cell result[ci], sformat((%s))
collect layout (command) (result[p1 p2 rd ci p])

collect recode command 1 = row1 ///
                       2 = row2 ///
                       3 = row1 ///
                       4 = row2 

collect label levels command row1 "Hypertension" row2 "Diabetes", modify
collect label levels result p1 "Central City", modify
collect label levels result p2 "Not Central City", modify
collect label levels result rd "Risk Difference", modify
collect label levels result ci "95% CI", modify
collect label levels result p "p-value", modify

collect style cell result, halign(center)
collect style cell border_block, border(right, pattern(nil))
collect style title, font(, size(12) bold)
collect title "Estimated risk (%) of specific health factors by SMSA"

collect preview

And here's the table it produces:

Code:

Estimated risk (%) of specific health factors by SMSA
------------------------------------------------------------------------------------------
              Central City   Not Central City   Risk Difference      (95% CI)      p-value
------------------------------------------------------------------------------------------
Hypertension      40.9%            42.0%              1.1%         (-1.5%, 3.7%)    0.400 
Diabetes          5.6%             4.0%              -1.6%        (-2.7%, -0.5%)    0.005 
------------------------------------------------------------------------------------------

The table is close to the final version I'd like, with just a couple of tweaks - see following table:

remove the parentheses around 95% CI
insert a line in the column header for "Central City" with "Yes" and "No" below it.

Code:

Estimated risk (%) of specific health factors by SMSA
------------------------------------------------------------------------------------------
                       Central City   
                   Yes              No          Risk Difference       95% CI       p-value
------------------------------------------------------------------------------------------
Hypertension      40.9%            42.0%              1.1%         (-1.5%, 3.7%)    0.400 
Diabetes          5.6%             4.0%              -1.6%        (-2.7%, -0.5%)    0.005 
------------------------------------------------------------------------------------------

How can I tweak my code to get the desired table? Is there a more efficient way to achieve this? I've only just started trying to understand table and collect, so there are huge gaps in my knowledge (which is likely apparent to Stata wizzes). Any pointers to gain a better understanding would be appreciated.

Tags: None

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 700

08 May 2023, 13:19

You can be very specific on which cells to apply the sformat().
I changed

Code:

collect style cell result[ci], sformat((%s))

Code:

collect style cell result[ci]#cell_type[item], sformat((%s))

to get the parens added only to the item cells and not the column header too.

To get the column headers I changed

Code:

collect label levels result p1 "Central City", modify
collect label levels result p2 "Not Central City", modify

Code:

collect label levels result p1 "Yes", modify
collect label levels result p2 "No", modify
collect addtag group["Central City"], fortags(result[p1 p2])
collect addtag group[" "], fortags(result[rd ci p])
collect style header group, title(hide)
collect layout (command) (group#result[p1 p2 rd ci p])

Here is the resulting table

Code:

. collect preview

Estimated risk (%) of specific health factors by SMSA
-------------------------------------------------------------------------
               Central City                                              
                Yes      No    Risk Difference       95% CI       p-value
-------------------------------------------------------------------------
Hypertension   40.9%   42.0%         1.1%         (-1.5%, 3.7%)    0.400 
Diabetes       5.6%     4.0%        -1.6%        (-2.7%, -0.5%)    0.005 
-------------------------------------------------------------------------

Comment

Suzanna Vidmar

Join Date: Nov 2017
Posts: 27

08 May 2023, 20:06

Thanks again Jeff. That's very helpful. I would not have thought to use collect addtag.

For the benefit of anyone who's reading this post, I've underlined "Central City" with this line of code:

Code:

collect style cell cell_type[column-header]#group["Central City"], border(bottom, pattern(single))

And the final table is:

Code:

Estimated risk (%) of specific health factors by SMSA
-------------------------------------------------------------------------
               Central City                                              
             ----------------                                            
                Yes      No    Risk Difference       95% CI       p-value
-------------------------------------------------------------------------
Hypertension   40.9%   42.0%         1.1%         (-1.5%, 3.7%)    0.400 
Diabetes       5.6%     4.0%        -1.6%        (-2.7%, -0.5%)    0.005 
-------------------------------------------------------------------------

Cheers,
Suzanna

Announcement

Combining descriptive and inferential statistics into a single table using the table and collect commands

Comment

Comment