Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using table and collect to obtain multiple columns of means and sd of 2+ continuous variables

    Hi Everyone,
    I've not really tried to use the "newish" table and collect commands to make reproducible tables. Now I have the need I am are not quite getting there hence need some advice.
    I want to produce a table where the rows are made up of a number of categorical variables and the columns represent 3 to 4 continuous variables with each cell in the table a summary statistic - mean and SD for example.
    Below shows the beginning of such a table with 2 columns showing the the mean and sd of systolic BP according to rows of 3 different categorical variables using a Stata dataset.
    I am using Stata version 18.
    Code:
    webuse nhanes2, clear
    collect clear
    table sex,  statistic(mean bpsystol) ///
        statistic(sd bpsystol)  nototals 
    table race,  statistic(mean bpsystol) ///
        statistic(sd bpsystol)  nototals   append
    table hlthstat,  statistic(mean bpsystol) ///
        statistic(sd bpsystol)  nototals   append
    collect label levels result count "N"  mean "Mean"  sd "SD" , modify
    collect layout (sex race hlthstat) (result)
    Resulting Output (abbreviated to show only last part)
    Code:
    collect layout (sex race hlthstat) (result)
    
    Collection: Table
          Rows: sex race hlthstat
       Columns: result
       Table 1: 14 x 2
    
    ---------------------------------------------
                           |      Mean         SD
    -----------------------+---------------------
    Sex                    |                     
      Male                 |  132.8877   20.99274
      Female               |  129.0679   25.12684
    Race                   |                     
      White                |  130.5819   22.81277
      Black                |  133.7753   26.41772
      Other                |   128.755    27.4119
    Health status          |                     
      Excellent            |  124.3191   19.66825
      Very good            |  127.3007   21.60186
      Good                 |   132.354   22.88047
      Fair                 |  139.0383    26.0577
      Poor                 |  140.7421   26.06442
      Blank but applicable |  130.2143   15.93341
    ---------------------------------------------
    So far so good. I now want to add an additional 2 columns for another continuous variable, in the case iron, summarised by the same categorical variables.
    Change to code in bold. Ultimately I'll be adding another 4-5 variables.
    Code:
    webuse nhanes2, clear
    collect clear
    table sex,  statistic(mean bpsystol iron) ///
        statistic(sd bpsystol iron )  nototals 
    table race,  statistic(mean bpsystol iron) ///
        statistic(sd bpsystol iron)  nototals   append
    table hlthstat,  statistic(mean bpsystol iron ) ///
        statistic(sd bpsystol iron )  nototals   append
    collect label levels result count "N"  mean "Mean"  sd "SD" , modify
    collect layout (sex race hlthstat) (result)
    Which results in the ourput below.

    Code:
    . collect clear
    
    . table sex,  statistic(mean bpsystol iron) ///
    >     statistic(sd bpsystol iron )  nototals 
    
    ---------------------------------------------------------------------------------------------------------
             |                       Mean                                     Standard deviation             
             |  Systolic blood pressure   Serum iron (mcg/dL)   Systolic blood pressure   Serum iron (mcg/dL)
    ---------+-----------------------------------------------------------------------------------------------
    Sex      |                                                                                               
      Male   |                 132.8877              103.6798                  20.99274              33.82182
      Female |                 129.0679              95.61792                  25.12684              33.86747
    ---------------------------------------------------------------------------------------------------------
    
    . table race,  statistic(mean bpsystol iron) ///
    >     statistic(sd bpsystol iron)  nototals   append
    
    --------------------------------------------------------------------------------------------------------
            |                       Mean                                     Standard deviation             
            |  Systolic blood pressure   Serum iron (mcg/dL)   Systolic blood pressure   Serum iron (mcg/dL)
    --------+-----------------------------------------------------------------------------------------------
    Race    |                                                                                               
      White |                 130.5819              100.2339                  22.81277              33.75215
      Black |                 133.7753              92.17956                  26.41772              35.25771
      Other |                  128.755                103.19                   27.4119              37.68521
    --------------------------------------------------------------------------------------------------------
    
    . table hlthstat,  statistic(mean bpsystol iron ) ///
    >     statistic(sd bpsystol iron )  nototals   append
    
    -----------------------------------------------------------------------------------------------------------------------
                           |                       Mean                                     Standard deviation             
                           |  Systolic blood pressure   Serum iron (mcg/dL)   Systolic blood pressure   Serum iron (mcg/dL)
    -----------------------+-----------------------------------------------------------------------------------------------
    Health status          |                                                                                               
      Excellent            |                 124.3191              102.7848                  19.66825              34.37272
      Very good            |                 127.3007              101.6129                  21.60186              34.83534
      Good                 |                  132.354              99.19571                  22.88047               33.4414
      Fair                 |                 139.0383              95.03533                   26.0577              33.11717
      Poor                 |                 140.7421              91.92318                  26.06442               32.8804
      Blank but applicable |                 130.2143              95.14286                  15.93341              37.67239
    -----------------------------------------------------------------------------------------------------------------------
    
    .     
    . collect label levels result count "N"  mean "Mean"  sd "SD" , modify
    
    . 
    . collect layout (sex race hlthstat) (result)
    
    Collection: Table
          Rows: sex race hlthstat
       Columns: result
    
    Your layout specification does not uniquely match any items. One or more of the following dimensions might help uniquely match
    items: cmdset, colname, statcmd, var.
    So as you can see the collect layout command fails and I just can't figure out how to edit the last line to get what I want if indeed that is where the issue is.
    Ideally I'd like the mean and sd to go together for each variable rather than all the means and then all the SDs.
    Many thanks for any advice here.
    Kevan

  • #2
    Here is one approach:

    Code:
    webuse nhanes2, clear
    collect clear
    local cvars age weight bpsystol
    foreach var of local cvars{
        collect create `var', replace
        table sex,  statistic(mean `var' ) ///
        statistic(sd `var')  nototals name(`var')
        table race,  statistic(mean `var') ///
        statistic(sd `var')  nototals name(`var')  append
        table hlthstat,  statistic(mean `var') ///
        statistic(sd `var') nototals name(`var')  append 
    }
    collect combine all = `cvars'
    collect label levels result count "N"  mean "Mean"  sd "SD" , modify
    collect layout (sex race hlthstat) (var#result)
    Res.:

    Code:
    . collect layout (sex race hlthstat) (var#result)
    
    Collection: all
          Rows: sex race hlthstat
       Columns: var#result
       Table 1: 14 x 6
    
    -----------------------------------------------------------------------------------------------
                           |      Age (years)        Systolic blood pressure        Weight (kg)    
                           |      Mean         SD          Mean            SD       Mean         SD
    -----------------------+-----------------------------------------------------------------------
    Sex                    |                                                                       
      Male                 |   47.4238    17.1683      132.8877      20.99274   77.98423   13.63785
      Female               |  47.72057   17.25716      129.0679      25.12684   66.39418   14.73496
    Race                   |                                                                       
      White                |  47.85163   17.16697      130.5819      22.81277   71.70766   15.09552
      Black                |  45.95212   17.44753      133.7753      26.41772   75.09187   16.92878
      Other                |     44.09   17.32723       128.755       27.4119   63.15765   13.75454
    Health status          |                                                                       
      Excellent            |  40.14001   15.95492      124.3191      19.66825   71.09247   14.39096
      Very good            |  43.50058   17.18107      127.3007      21.60186    71.6522   15.29464
      Good                 |  49.14602   16.57893       132.354      22.88047   71.91869   15.31037
      Fair                 |  56.37246   14.65183      139.0383       26.0577   72.89946   16.13797
      Poor                 |   59.9904   11.47925      140.7421      26.06442    73.1265   16.78043
      Blank but applicable |  59.35714    10.7818      130.2143      15.93341   69.41643    16.6843
    -----------------------------------------------------------------------------------------------

    Comment


    • #3
      The key point in Andrew Musau's post is that you needed to add var to the column specification in the layout.

      In the following my goal was to provide reusable code that only depends on a couple of variable lists defined in local macros.
      Code:
      * data specific
      webuse nhanes2
      unab catvars : sex race hlthstat
      unab convars : age bpsystol weight
      
      * reusable code
      local append
      foreach catvar of local catvars {
          table `catvar', ///
              statistic(mean `convars') ///
              statistic(sd `convars') ///
              nototals ///
              `append'
          local append append
      }
      collect label levels result count "N" mean "Mean" sd "SD", modify
      collect layout (`catvars') (var#result)
      Here is the resulting table.
      Code:
      -----------------------------------------------------------------------------------------------
                             |      Age (years)        Systolic blood pressure        Weight (kg)
                             |      Mean         SD          Mean            SD       Mean         SD
      -----------------------+-----------------------------------------------------------------------
      Sex                    |
        Male                 |   47.4238    17.1683      132.8877      20.99274   77.98423   13.63785
        Female               |  47.72057   17.25716      129.0679      25.12684   66.39418   14.73496
      Race                   |
        White                |  47.85163   17.16697      130.5819      22.81277   71.70766   15.09552
        Black                |  45.95212   17.44753      133.7753      26.41772   75.09187   16.92878
        Other                |     44.09   17.32723       128.755       27.4119   63.15765   13.75454
      Health status          |
        Excellent            |  40.14001   15.95492      124.3191      19.66825   71.09247   14.39096
        Very good            |  43.50058   17.18107      127.3007      21.60186    71.6522   15.29464
        Good                 |  49.14602   16.57893       132.354      22.88047   71.91869   15.31037
        Fair                 |  56.37246   14.65183      139.0383       26.0577   72.89946   16.13797
        Poor                 |   59.9904   11.47925      140.7421      26.06442    73.1265   16.78043
        Blank but applicable |  59.35714    10.7818      130.2143      15.93341   69.41643    16.6843
      -----------------------------------------------------------------------------------------------
      If you do this frequently, I recommend you put this code a program and save it to an ado-file, like the following program I saved to my_twoway_table.ado.
      Code:
      *! version 1.0.0  02jul2025
      program my_twoway_table
          version 18
          syntax varlist, by(varlist)
      
          local append
          foreach catvar of local by {
              quietly ///
              table `catvar', ///
                  statistic(mean `varlist') ///
                  statistic(sd `varlist') ///
                  nototals ///
                  `append'
              local append append
          }
          collect label levels result count "N" mean "Mean" sd "SD", modify
          collect layout (`by') (var#result)
      end
      Then you can get your table with a simple call to your program.
      Code:
      webuse nhanes2
      
      my_twoway_table age bpsystol weight, by(sex race hlthstat)

      Comment


      • #4
        Dear Andrew & Jeff,
        Many thanks for providing these solutions - will try them later today but they appear exactly what I needed. Extra thanks to Jeff for providing the sample programme. Very helpful to me and I’m sure others as well.
        Kevan

        Comment

        Working...
        X