Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combining descriptive and inferential statistics into a single table using the table and collect commands

    I would like to create a table displaying:
    • the risks(%) for each category of a binary variable
    • the risk difference (%)
    • a 95% CI (%)
    • p-value
    For this example I'm using a dataset that's freely available and keeping only 2 categories of a categorical variable so that I have a binary independent variable.

    Here's my code:
    Code:
    *Reading in the low birthweight dataset available on the Stata Press website.
    use https://www.stata-press.com/data/r18/nhanes2l,clear
    
    *I want a binary independent variable for this example, so keeping only the first 2 groups.
    keep if inlist(smsa,1,2)
    
    collect clear
    
    *"binreg,rd" stores results in units of proportion. Multiplying by 100 to convert to percentage.
    foreach var of varlist highbp diabetes {
        quietly: collect r(p): binreg `var' i.smsa, rd
        matrix define `var'_prop = r(table)
        matrix `var'_perc = `var'_prop*100
    }
    
    table (colname) (result coleq), ///
        command (p1=r(P1)*100 p2=r(P2)*100: prtest highbp, by(smsa)) ///
        command (p1=r(P1)*100 p2=r(P2)*100: prtest diabetes, by(smsa)) ///
        command(rd=highbp_perc[1,2] ll=highbp_perc[5,2] ul=highbp_perc[6,2] p=highbp_prop[4,2]: ///
        binreg highbp i.smsa, rd) ///
        command(rd=diabetes_perc[1,2] ll=diabetes_perc[5,2] ul=diabetes_perc[6,2] p=diabetes_prop[4,2]: ///
        binreg diabetes i.smsa, rd) nformat(%4.1f p1 p2 rd ll ul) nformat(%5.3f p) ///
        sformat(%s%% p1 p2 rd ll ul)
    
    collect composite define ci = ll ul, delimiter(", ") trim
    collect style cell result[ci], sformat((%s))
    collect layout (command) (result[p1 p2 rd ci p])
    
    collect recode command 1 = row1 ///
                           2 = row2 ///
                           3 = row1 ///
                           4 = row2 
    
    collect label levels command row1 "Hypertension" row2 "Diabetes", modify
    collect label levels result p1 "Central City", modify
    collect label levels result p2 "Not Central City", modify
    collect label levels result rd "Risk Difference", modify
    collect label levels result ci "95% CI", modify
    collect label levels result p "p-value", modify
    
    collect style cell result, halign(center)
    collect style cell border_block, border(right, pattern(nil))
    collect style title, font(, size(12) bold)
    collect title "Estimated risk (%) of specific health factors by SMSA"
    
    collect preview
    And here's the table it produces:
    Code:
    Estimated risk (%) of specific health factors by SMSA
    ------------------------------------------------------------------------------------------
                  Central City   Not Central City   Risk Difference      (95% CI)      p-value
    ------------------------------------------------------------------------------------------
    Hypertension      40.9%            42.0%              1.1%         (-1.5%, 3.7%)    0.400 
    Diabetes          5.6%             4.0%              -1.6%        (-2.7%, -0.5%)    0.005 
    ------------------------------------------------------------------------------------------
    The table is close to the final version I'd like, with just a couple of tweaks - see following table:
    • remove the parentheses around 95% CI
    • insert a line in the column header for "Central City" with "Yes" and "No" below it.
    Code:
    Estimated risk (%) of specific health factors by SMSA
    ------------------------------------------------------------------------------------------
                           Central City   
                       Yes              No          Risk Difference       95% CI       p-value
    ------------------------------------------------------------------------------------------
    Hypertension      40.9%            42.0%              1.1%         (-1.5%, 3.7%)    0.400 
    Diabetes          5.6%             4.0%              -1.6%        (-2.7%, -0.5%)    0.005 
    ------------------------------------------------------------------------------------------
    How can I tweak my code to get the desired table? Is there a more efficient way to achieve this? I've only just started trying to understand table and collect, so there are huge gaps in my knowledge (which is likely apparent to Stata wizzes). Any pointers to gain a better understanding would be appreciated.

  • #2
    You can be very specific on which cells to apply the sformat().
    I changed
    Code:
    collect style cell result[ci], sformat((%s))
    to
    Code:
    collect style cell result[ci]#cell_type[item], sformat((%s))
    to get the parens added only to the item cells and not the column header too.

    To get the column headers I changed
    Code:
    collect label levels result p1 "Central City", modify
    collect label levels result p2 "Not Central City", modify
    to
    Code:
    collect label levels result p1 "Yes", modify
    collect label levels result p2 "No", modify
    collect addtag group["Central City"], fortags(result[p1 p2])
    collect addtag group[" "], fortags(result[rd ci p])
    collect style header group, title(hide)
    collect layout (command) (group#result[p1 p2 rd ci p])
    Here is the resulting table
    Code:
    . collect preview
    
    Estimated risk (%) of specific health factors by SMSA
    -------------------------------------------------------------------------
                   Central City                                              
                    Yes      No    Risk Difference       95% CI       p-value
    -------------------------------------------------------------------------
    Hypertension   40.9%   42.0%         1.1%         (-1.5%, 3.7%)    0.400 
    Diabetes       5.6%     4.0%        -1.6%        (-2.7%, -0.5%)    0.005 
    -------------------------------------------------------------------------

    Comment


    • #3
      Thanks again Jeff. That's very helpful. I would not have thought to use collect addtag.

      For the benefit of anyone who's reading this post, I've underlined "Central City" with this line of code:
      Code:
      collect style cell cell_type[column-header]#group["Central City"], border(bottom, pattern(single))
      And the final table is:
      Code:
      Estimated risk (%) of specific health factors by SMSA
      -------------------------------------------------------------------------
                     Central City                                              
                   ----------------                                            
                      Yes      No    Risk Difference       95% CI       p-value
      -------------------------------------------------------------------------
      Hypertension   40.9%   42.0%         1.1%         (-1.5%, 3.7%)    0.400 
      Diabetes       5.6%     4.0%        -1.6%        (-2.7%, -0.5%)    0.005 
      -------------------------------------------------------------------------
      Cheers,
      Suzanna

      Comment

      Working...
      X