Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Collect System: Displaying categorical covariate level percentages across rows/outcome levels, rather than across column/covariate levels

    Hello,

    I have using a similar code as below to create customizable tables using the Collect system. This system has been very handy!

    Code:
    clear
    webuse nhanes2l
    tab highbp
    
    collect clear
    collect layout, clear
        
    * Obtain an initial table with necessary statistics                
    collect: table ///
        (var) ///
        (highbp), ///
            statistic(mean age) ///
                statistic(sd age) ///
                statistic(fvfrequency race) ///
                statistic(fvpercent race)
    
    * The highbp level value labeled "Total" can later be tagged with a p-value interest
    collect label list highbp, all
    
    * Calculate a p-value for association between highbp and age
    ologit highbp c.age
    * Consume that p-value and attach tags to it for the layout command later on to know where it should be placed
    collect get manual_pvalue = e(p), tags(var[age] highbp[.m])
    
    * Calculate a p-value for the association between highbp and race
    tab race highbp, chi2 nolabel // note the value of the first level "White". Will use this value for correct tagging/position placement of the Pvalue in next command
    * Consume that p-value and attach tags to it for the layout command later on to know where it should be placed
    collect get manual_pvalue = r(p), tags(var[1.race] highbp[.m])
    
    * Add a custom formatted footnote for the p-values, corresponding to the p-values obtained in table
    collect stars manual_pvalue 0.01 "***" 0.05 "**" 0.1 "*", ///
        attach(manual_pvalue) shownote
    * Update the formatting for easier readability
    collect style cell result[manual_pvalue], nformat(%5.2f) minimum(.01)
    
    * Recode the levels of the result dimension to allow combined placement of the statistic of interest later in the shared cells in the final table
    collect recode result ///
        fvfrequency = variable_measure ///
        fvpercent = variable_spread ///
        mean = variable_measure ///
        sd = variable_spread //
    
    *Add labels to the mentioned combined levels for better table readability
    collect label levels result variable_measure "Count / Mean" variable_spread "% / (SD)"  
    collect label levels result manual_pvalue "P-Value"  
    
    *Create the final table   
    collect layout ///
        (var) ///
        (highbp#result[variable_measure variable_spread] result[manual_pvalue])
    Above code results in the following table:

    Code:
    -----------------------------------------------------------------------------------------------------
                |                              High blood pressure                                P-Value
                |             0                         1                       Total                    
                |  Count / Mean   % / (SD)   Count / Mean   % / (SD)   Count / Mean   % / (SD)           
    ------------+----------------------------------------------------------------------------------------
    Age (years) |      42.16502   16.77157       54.97281   14.90897       47.57965   17.21483   <0.01***
    Race=White  |          5317   88.98745           3748   85.64899           9065   87.57608   <0.01***
    Race=Black  |           545   9.121339            541   12.36289           1086   10.49174           
    Race=Other  |           113   1.891213             87   1.988117            200    1.93218           
    -----------------------------------------------------------------------------------------------------
    *** p<.01, ** p<.05, * p<.1

    My Question:

    Currently, the resultant table above, reports the percentages of the levels of the Race predictor (i.e. White, Black, Other) across the column. In other words, it shows, for example, "what percentage among all people who do not have High Blood Pressure are White, Black, or Other" . However, I want to report percentage levels of the Race predictor across the row (e.g. "what percentage among all persons who are White do not have High Blood Pressure? And what percentage among all persons who are White have High Blood Pressure?")

    What would be the best approach to report composition percentage of the covariates across levels of the outcome, as opposed to within the levels of the covariate itself (as the code above shows)?

    Thank you so much in advance!

  • #2
    Originally posted by Nathan Yu View Post
    Currently, the resultant table above, reports the percentages of the levels of the Race predictor (i.e. White, Black, Other) across the column. In other words, it shows, for example, "what percentage among all people who do not have High Blood Pressure are White, Black, or Other" . However, I want to report percentage levels of the Race predictor across the row (e.g. "what percentage among all persons who are White do not have High Blood Pressure? And what percentage among all persons who are White have High Blood Pressure?")
    You have a lot of extra details not relevant to your question. What this boils down to is the difference between the commands highlighted in red and blue below.

    Code:
    webuse nhanes2l, clear
    table (var) (highbp), statistic(fvfrequency race) statistic(fvpercent race) nototal
    qui table (var) (race), statistic(fvfrequency highbp) statistic(fvpercent highbp)
    collect layout (race[1 2 3]#result) (var)
    Res.:

    Code:
    . table (var) (highbp), statistic(fvfrequency race) statistic(fvpercent race) nototal
    
    ----------------------------------------------------
                                |   High blood pressure 
                                |          0           1
    ----------------------------+-----------------------
    Race=White                  |                       
      Factor-variable frequency |      5,317       3,748
      Factor-variable percent   |      88.99       85.65
    Race=Black                  |                       
      Factor-variable frequency |        545         541
      Factor-variable percent   |       9.12       12.36
    Race=Other                  |                       
      Factor-variable frequency |        113          87
      Factor-variable percent   |       1.89        1.99
    ----------------------------------------------------
    
    . 
    . qui table (var) (race), statistic(fvfrequency highbp) statistic(fvpercent highbp)
    
    . 
    . collect layout (race[1 2 3]#result) (var)
    
    Collection: Table
          Rows: race[1 2 3]#result
       Columns: var
       Table 1: 10 x 2
    
    ------------------------------------------------------
                                  |   High blood pressure 
                                  |          0           1
    ------------------------------+-----------------------
    Race                          |                       
      White                       |                       
        Factor-variable frequency |      5,317       3,748
        Factor-variable percent   |      58.65       41.35
      Black                       |                       
        Factor-variable frequency |        545         541
        Factor-variable percent   |      50.18       49.82
      Other                       |                       
        Factor-variable frequency |        113          87
        Factor-variable percent   |      56.50       43.50
    ------------------------------------------------------
    
    .

    Comment


    • #3
      Hi Andrew, apologies for the delay and thank you so much for the support. How would this work with multiple categorical and continuous variables? For example, if I also wanted the table to include the variables "race", "bmi", "rural", and "age" altogether as rows in the column?

      As well as including the P-Value calculation as the code in the first post?

      Comment


      • #4
        Age and bmi should be considered continuous variables. How they should be tabulated against levels of high blood pressure is not clear. For categorical variables, you can combine tables as follows:

        Code:
        webuse nhanes2l, clear
        collect clear
        table (var) (race), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(race) nototal
        table (var) (rural), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(rural) nototal
        collect combine all= race rural
        collect label levels result fvfrequency "Freq.", modify
        collect label levels result fvpercent "Pct.", modify
        collect layout (race rural) (var#result)
        Res.:

        Code:
        . collect layout (race rural) (var#result)
        
        Collection: all
              Rows: race rural
           Columns: var#result
           Table 1: 7 x 4
        
        ----------------------------------------
                |       High blood pressure    
                |        0               1      
                |  Freq.    Pct.   Freq.    Pct.
        --------+-------------------------------
        Race    |                              
          White |  5,317   58.65   3,748   41.35
          Black |    545   50.18     541   49.82
          Other |    113   56.50      87   43.50
        Rural   |                              
          Urban |  3,798   58.00   2,750   42.00
          Rural |  2,177   57.24   1,626   42.76
        ----------------------------------------

        Comment


        • #5
          Hi Andrew,

          I would like to have mean and SD for the continuous variables, as in the example in opening post. Afterwards, I will also add P-Values, to tell if the variables are different across levels of highbp.

          I've spent some time on this, but cannot achieve the similar table as the opening post. I am not sure if the following is the correct approach?

          Code:
          webuse nhanes2l, clear
          collect clear
          table (var) (race), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(race) nototal
          table (var) (rural), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(rural) nototal
          table (highbp) (var), statistic(mean bmi) statistic(sd bmi) name(bmi) nototal // There must be a different approach?
          table (highbp) (var), statistic(mean age) statistic(sd age)  name(age) nototal // There must be a different approach?
          collect combine all= race rural bmi age
          collect label levels result fvfrequency "Freq.", modify
          collect label levels result fvpercent "Pct.", modify
          collect label levels result mean "Mean", modify
          collect label levels result sd "SD", modify
          collect layout (race rural highbp) (var#result)
          Which results in:

          Code:
          Collection: all
                Rows: race rural highbp
             Columns: var#result
             Table 1: 10 x 8
          
          ----------------------------------------------------------------------------------------------------
                              |      Age (years)        Body mass index (BMI)         High blood pressure    
                              |      Mean         SD         Mean           SD         0               1      
                              |                                                  Freq.    Pct.   Freq.    Pct.
          --------------------+-------------------------------------------------------------------------------
          Race                |                                                                              
            White             |                                                  5,317   58.65   3,748   41.35
            Black             |                                                    545   50.18     541   49.82
            Other             |                                                    113   56.50      87   43.50
          Rural               |                                                                              
            Urban             |                                                  3,798   58.00   2,750   42.00
            Rural             |                                                  2,177   57.24   1,626   42.76
          High blood pressure |                                                                              
            0                 |  42.16502   16.77157     24.20231     4.100279                                
            1                 |  54.97281   14.90897     27.36081     5.332119                                
          ----------------------------------------------------------------------------------------------------
          Thank you as always for your continued support!
          Last edited by Nathan Yu; 22 Aug 2024, 20:07.

          Comment


          • #6
            Code:
            help collect get

            Code:
            webuse nhanes2l, clear
            collect clear
            table (var) (race), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(race) nototal
            table (var) (rural), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(rural) nototal
            
            foreach var in age bmi{
                qui sum `var' if !highbp
                collect get fvfrequency = "`:display %9.3f r(mean)'", tags(`var'[] var[0.highbp])
                collect get fvpercent = "`:display %9.3f r(sd)'", tags(`var'[] var[0.highbp])
                qui sum `var' if highbp
                collect get fvfrequency = "`:display %9.3f r(mean)'", tags(`var'[] var[1.highbp])
                collect get fvpercent = "`:display %9.3f r(sd)'", tags(`var'[] var[1.highbp])
            }
            collect combine all= race rural
            collect label levels result fvfrequency "Count / Mean", modify
            collect label levels result fvpercent "% / (SD)", modify
            collect layout (age bmi race rural) (var#result)
            Res.:

            Code:
            . collect layout (age bmi race rural) (var#result)
            
            Collection: all
                  Rows: age bmi race rural
               Columns: var#result
               Table 1: 9 x 4
            
            --------------------------------------------------------------------------
                                  |                 High blood pressure              
                                  |             0                         1          
                                  |  Count / Mean   % / (SD)   Count / Mean   % / (SD)
            ----------------------+---------------------------------------------------
            Age (years)           |        42.165     16.772         54.973     14.909
            Body mass index (BMI) |        24.202      4.100         27.361      5.332
            Race                  |                                                  
              White               |         5,317      58.65          3,748      41.35
              Black               |           545      50.18            541      49.82
              Other               |           113      56.50             87      43.50
            Rural                 |                                                  
              Urban               |         3,798      58.00          2,750      42.00
              Rural               |         2,177      57.24          1,626      42.76
            --------------------------------------------------------------------------
            Last edited by Andrew Musau; 23 Aug 2024, 08:37.

            Comment


            • #7
              Thank you so much Andrew for the continued support! Got it, I will individually tag the scalars then and call them with the layout command!

              Comment


              • #8
                Hi Andrew, this is a very clever method, thank you for teaching it to me. However, I wanted to put the SD's in the table in a parenthesis. How can I go about doing so?

                Normally I would use the following code:

                Code:
                collect style cell ///
                                var[age bmi]#result[fvpercent], ///
                                sformat("(%s)")
                However, I get the following error:

                Code:
                . collect style cell ///
                >                 var[age bmi]#result[fvpercent], ///
                >                 sformat("(%s)")
                (level age of dimension var not found)
                (level bmi of dimension var not found)
                How can I reformat the cells given the method you suggested?

                Apologies if it sounds ambiguous, but I hope to change both the nformat() and sformat() of these collected values after they have already been stored under the result[fvpercent] dimension/level.

                Comment


                • #9
                  Code:
                  collect style cell result[fvpercent] , sformat("(%s)")
                  Res.:

                  Code:
                  . collect preview
                  
                  --------------------------------------------------------------------------
                                        |                 High blood pressure               
                                        |             0                         1           
                                        |  Count / Mean   % / (SD)   Count / Mean   % / (SD)
                  ----------------------+---------------------------------------------------
                  Age (years)           |        42.165   (16.772)         54.973   (14.909)
                  Body mass index (BMI) |        24.202    (4.100)         27.361    (5.332)
                  Race                  |                                                   
                    White               |         5,317    (58.65)          3,748    (41.35)
                    Black               |           545    (50.18)            541    (49.82)
                    Other               |           113    (56.50)             87    (43.50)
                  Rural                 |                                                   
                    Urban               |         3,798    (58.00)          2,750    (42.00)
                    Rural               |         2,177    (57.24)          1,626    (42.76)
                  --------------------------------------------------------------------------
                  
                  .

                  Comment


                  • #10
                    You are actually mixing up standard deviations and percentages in one column, so #9 is not a solution to your problem. Just insert the parentheses in the collect get command lines.

                    Code:
                    webuse nhanes2l, clear
                    collect clear
                    table (var) (race), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(race) nototal
                    table (var) (rural), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(rural) nototal
                    
                    foreach var in age bmi{
                        qui sum `var' if !highbp
                        collect get fvfrequency = "`:display %4.3f r(mean)'", tags(`var'[] var[0.highbp])
                        collect get fvpercent = "(`:display %4.3f r(sd)')", tags(`var'[] var[0.highbp])
                        qui sum `var' if highbp
                        collect get fvfrequency = "`:display %4.3f r(mean)'", tags(`var'[] var[1.highbp])
                        collect get fvpercent = "(`:display %4.3f r(sd)')", tags(`var'[] var[1.highbp])
                    }
                    collect combine all= race rural
                    collect label levels result fvfrequency "Count / Mean", modify
                    collect label levels result fvpercent "% / (SD)", modify
                    collect layout (age bmi race rural) (var#result)
                    Res.:

                    Code:
                    . collect layout (age bmi race rural) (var#result)
                    
                    Collection: all
                          Rows: age bmi race rural
                       Columns: var#result
                       Table 1: 9 x 4
                    
                    --------------------------------------------------------------------------
                                          |                 High blood pressure               
                                          |             0                         1           
                                          |  Count / Mean   % / (SD)   Count / Mean   % / (SD)
                    ----------------------+---------------------------------------------------
                    Age (years)           |        42.165   (16.772)         54.973   (14.909)
                    Body mass index (BMI) |        24.202    (4.100)         27.361    (5.332)
                    Race                  |                                                   
                      White               |         5,317      58.65          3,748      41.35
                      Black               |           545      50.18            541      49.82
                      Other               |           113      56.50             87      43.50
                    Rural                 |                                                   
                      Urban               |         3,798      58.00          2,750      42.00
                      Rural               |         2,177      57.24          1,626      42.76
                    --------------------------------------------------------------------------

                    Comment


                    • #11
                      Hi Andrew,

                      Thank you so much for this -- excellent! I had originally had trouble attempting exactly this, however from your code I realized that that I needed to change the number-width of the format (from %9.3f to %4.3f) that is being applied to the numbers being collected to make sure spacing is correctly aligned:

                      Code:
                      collect get fvpercent = "(`:display %9.3f r(sd)')", tags(`var'[] var[0.highbp])
                      vs the following:

                      Code:
                      collect get fvpercent = "(`:display %4.3f r(sd)')", tags(`var'[] var[0.highbp])

                      From curiosity, would the display command in 'collect get fvpercent = "(`:display %4.3f r(sd)')", tags(`var'[] var[0.highbp])' be able to also take other additional formatting options, such as below sformat option below that adds a percentage sign to the collected number?

                      Code:
                      collect style cell result[fvpercent], sformat("%s%%")
                      Thank you as always, and please do not spend time on this if no obvious method comes to your mind!
                      Last edited by Nathan Yu; 24 Sep 2024, 17:23.

                      Comment


                      • #12
                        Same way you inserted the parentheses, you can insert other characters.

                        Code:
                        webuse nhanes2l, clear
                        collect clear
                        table (var) (race), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(race) nototal
                        table (var) (rural), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(rural) nototal
                        
                        foreach var in age bmi{
                            qui sum `var' if !highbp
                            collect get fvfrequency = "`:display %4.3f r(mean)'", tags(`var'[] var[0.highbp])
                            collect get fvpercent = "(`:display %4.3f r(sd)'%)", tags(`var'[] var[0.highbp])
                            qui sum `var' if highbp
                            collect get fvfrequency = "`:display %4.3f r(mean)'", tags(`var'[] var[1.highbp])
                            collect get fvpercent = "(`:display %4.3f r(sd)'%)", tags(`var'[] var[1.highbp])
                        }
                        collect combine all= race rural
                        collect label levels result fvfrequency "Count / Mean", modify
                        collect label levels result fvpercent "% / (SD)", modify
                        collect layout (age bmi race rural) (var#result)
                        Res.:

                        Code:
                        . collect layout (age bmi race rural) (var#result)
                        
                        Collection: all
                              Rows: age bmi race rural
                           Columns: var#result
                           Table 1: 9 x 4
                        
                        ----------------------------------------------------------------------------
                                              |                  High blood pressure                
                                              |              0                          1           
                                              |  Count / Mean    % / (SD)   Count / Mean    % / (SD)
                        ----------------------+-----------------------------------------------------
                        Age (years)           |        42.165   (16.772%)         54.973   (14.909%)
                        Body mass index (BMI) |        24.202    (4.100%)         27.361    (5.332%)
                        Race                  |                                                     
                          White               |         5,317       58.65          3,748       41.35
                          Black               |           545       50.18            541       49.82
                          Other               |           113       56.50             87       43.50
                        Rural                 |                                                     
                          Urban               |         3,798       58.00          2,750       42.00
                          Rural               |         2,177       57.24          1,626       42.76
                        ----------------------------------------------------------------------------

                        Comment


                        • #13
                          Hello Andrew, my apologies for the late reply. This is excellent, thank you very much for this!

                          Comment


                          • #14
                            Issue with using Collect commands to format and Display percentages in paranthesis

                            Hi Andrew Musau - I have a question and I am hoping you may help me to figure out what I am not doing right.
                            I am using Stata 18.5 and when I execute the commands below, I get a table where the means and SD are formatted, and SDs put in parenthesis as desired. However, the frequencies and corresponding percentages aren't formatted. Besides, the percentages are not in parenthesis. Is there anything I am missing? Thank you in advance Prof. Musau.


                            collect clear
                            table (var) (highbp), nototals ///
                            statistic(mean age bmi) ////
                            statistic(sd age bmi) ///
                            statistic(fvfrequency sex race hlthstat) ///
                            statistic(fvpercent sex race hlthstat) ///
                            statistic(mean tcresult tgresult hdresult) ///
                            statistic(sd tcresult tgresult hdresult) ///
                            nformat(%9.1f mean sd fvpercent) ///
                            sformat("(%s)" sd) ///
                            nformat(%9.0fc fvfrequency) ////
                            sformat("%s%%" fvpercent)


                            collect style cell border_block, border(right, pattern(nil))
                            collect preview
                            collect label levels highbp 0 "No" 1 "Yes"
                            collect label list highbp, all

                            collect label list result, all
                            collect recode result fvfrequency=column1 ///
                            fvpercent=column2 ///
                            mean=column1 ///
                            sd=column2

                            collect layout (var) (highbp#result[column1 column2])


                            collect style cell var[sex race hlthstat]#result[column2], ///
                            nformat(%6.1f) sformat("%s%%")
                            collect style cell ///
                            var[age bmi tcresult tgresult hdresult]#result[column1 column2], ///
                            nformat(%6.1f)
                            collect style cell ///
                            var[age bmi tcresult tgresult hdresult]#result[column2], ///
                            sformat("(%s)")
                            collect preview



                            I should note when I execute the command below, I get the error message in red text below it.

                            collect style cell var[sex race hlthstat]#result[column2], ///
                            nformat(%6.1f) sformat("%s%%"

                            (level hlthstat of dimension var not found)
                            (level race of dimension var not found)
                            (level sex of dimension var not found)



                            *Resultant Table
                            --------------------------------------------------------------------
                            High blood pressure
                            No Yes
                            column1 column2 column1 column2
                            --------------------------------------------------------------------
                            Age (years) 42.2 (16.8) 55.0 (14.9)
                            Body mass index (BMI) 24.2 (4.1) 27.4 (5.3)
                            Sex=Male 2611 43.69874 2304 52.65082
                            Sex=Female 3364 56.30126 2072 47.34918
                            Race=White 5317 88.98745 3748 85.64899
                            Race=Black 545 9.121339 541 12.36289
                            Race=Other 113 1.891213 87 1.988117
                            Health status=Excellent 1649 27.65387 758 17.3376
                            Health status=Very good 1666 27.93896 925 21.15737
                            Health status=Good 1572 26.36257 1366 31.24428
                            Health status=Fair 766 12.84588 904 20.67704
                            Health status=Poor 310 5.198725 419 9.583715

                            Serum cholesterol (mg/dL) 208.7 (47.3) 229.9 (49.6)
                            Serum triglycerides (mg/dL) 129.2 (83.9) 166.0 (109.2)
                            High-density lipids (mg/dL) 49.9 (14.1) 49.2 (14.5)
                            --------------------------------------------------------------------





                            Comment


                            • #15
                              Try these edits to the lines where you style the sex, race and health numbers, and where you specify the layout:

                              Code:
                              collect layout (var[age bmi tcresult tgresult hdresult] sex race hlthstat) (highbp#result[column1 column2])
                              
                              collect style cell (sex race hlthstat)#result[column2], ///
                                  nformat(%6.1f) sformat("(%s%%)")
                              which then produces:
                              Code:
                              . collect preview
                              
                              ------------------------------------------------------------------
                                                                    High blood pressure        
                                                                   No                 Yes      
                                                           column1   column2   column1   column2
                              ------------------------------------------------------------------
                              Age (years)                     42.2    (16.8)      55.0    (14.9)
                              Body mass index (BMI)           24.2     (4.1)      27.4     (5.3)
                              Serum cholesterol (mg/dL)      208.7    (47.3)     229.9    (49.6)
                              Serum triglycerides (mg/dL)    129.2    (83.9)     166.0   (109.2)
                              High-density lipids (mg/dL)     49.9    (14.1)      49.2    (14.5)
                              Sex                                                              
                                Male                          2611   (43.7%)      2304   (52.7%)
                                Female                        3364   (56.3%)      2072   (47.3%)
                              Race                                                              
                                White                         5317   (89.0%)      3748   (85.6%)
                                Black                          545    (9.1%)       541   (12.4%)
                                Other                          113    (1.9%)        87    (2.0%)
                              Health status                                                    
                                Excellent                     1649   (27.7%)       758   (17.3%)
                                Very good                     1666   (27.9%)       925   (21.2%)
                                Good                          1572   (26.4%)      1366   (31.2%)
                                Fair                           766   (12.8%)       904   (20.7%)
                                Poor                           310    (5.2%)       419    (9.6%)
                              ------------------------------------------------------------------
                              Last edited by Hemanshu Kumar; 27 Jun 2025, 16:17.

                              Comment

                              Working...
                              X