Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Formatted IQR using Collect

    Hi,

    I am using the excellent Example 3 in the "Stata Customizable Tables" manual to help me build a table with frequency (percent) for categorical variables, and mean (sd) for continuous variables. Some of my continuous variables are, however, very skewed (age data for infants). For those variables I would like to report age in months as median (IQR). I would like to format the IQR as (p25 – p75). I think that what I need to do is combine both p25 and p75 into a single level of the dimension result, but I'm not sure how to do that.

    I am aware that
    Code:
    table1_mc
    does this. I have used and loved table1_mc, but I really want to learn the collect system.

    For this post, below are data for age in months and sex. What I'm after is:
    Male Female
    Age (months) median (p25 – p75) median (p25 – p75)
    Here are my data for sex and age (months):

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float age_adm_m byte sex
    15.342916 1
     54.80082 1
     19.74538 1
    75.663246 1
     19.81109 1
     46.45585 1
     32.29569 1
    3.3839836 2
     67.15401 2
    14.554415 1
    69.650925 2
     83.58111 1
    15.014374 1
     46.75154 2
    36.172485 1
    35.876797 1
     7.096509 1
     39.85216 1
     7.523614 1
      43.6961 1
      3.12115 1
     36.99384 2
     55.78645 2
     30.12731 2
     52.46817 2
     3.613963 1
     17.87269 2
    31.507187 2
     30.58727 1
    18.431211 1
     43.63039 2
    15.967146 2
      50.7269 1
    32.492813 2
    16.689938 1
     18.89117 1
     30.45585 2
     3.581109 2
    19.876797 1
     82.95688 1
     71.29363 2
      62.0616 2
     30.45585 1
     51.44969 2
     14.52156 2
    11.498973 1
    1.4784395 2
     28.64887 2
     51.58111 1
     72.24641 2
    31.802876 1
     42.48049 1
    2.1026695 2
     127.5729 1
     40.21355 2
     8.936345 1
     3.876797 2
    30.390144 1
     44.71458 2
     11.17043 1
     10.61191 1
     39.09651 1
     14.52156 2
     78.91581 1
    16.328543 1
     42.21766 1
    11.039015 1
     80.16427 1
    150.70226 2
     3.022587 1
     59.07187 1
     38.40657 1
     57.49487 1
     59.00616 2
     19.58111 2
     2.792608 2
     79.50719 2
    122.71047 2
     92.09035 1
     2.562628 2
     46.02875 1
     95.77002 2
     34.49692 2
     6.702259 1
           48 2
     43.13758 2
    125.40452 2
            . 1
     76.38604 1
    11.334702 1
     43.23614 1
     59.59753 1
     55.88501 1
     6.537988 1
     82.16838 1
     43.00616 1
     54.17659 2
     25.23203 1
      54.2423 1
     17.87269 1
    end
    label values sex sex_lbl
    label def sex_lbl 1 "Male", modify
    label def sex_lbl 2 "Female", modify
    ------------------ copy up to and including the previous line ------------------


    Here is my code for what I'm after using mean and SD:

    Code:
    table (var) (shortsite), statistic(fvfrequency sex) statistic(fvpercent sex) nototals append
    collect style header result, level(hide)
    collect style row stack, nobinder spacer
    collect style cell border_block, border(right, pattern(nil))
    collect layout (sex[1]) (shortsite#result)
    collect style cell result[fvpercent], sformat("(%s)")
    Kind regards,

    Ryan

  • #2
    Please include shortsite in your data extract.

    Comment


    • #3
      My apologies, wrong section of code. The intended part was:

      Code:
      table (var) (sex), statistic(mean age_adm_m) statistic(sd age_adm_m) nototals
      collect style header result, level(hide)
      collect style row stack, nobinder spacer
      collect style cell border_block, border(right, pattern(nil))
      collect layout (var) (sex#result)
      collect style cell result[sd], sformat("(%s)")
      Does anyone know how I can do this with mean (sd)?

      Thanks,
      Ryan

      Comment


      • #4
        Here's something that gives you the median and IQR format you showed in your first post:

        Code:
        table (var) (sex), statistic(p50 age_adm_m) statistic(p25 age_adm_m) statistic(p75 age_adm_m) nototals
        collect style cell result[p25 p50 p75], nformat("%4.1f")
        collect composite define iqr = p25 p75 , delimiter(" - ") trim
        collect style cell result[iqr], sformat("(%s)")
        collect composite define mediqr = p50 iqr, trim
        
        collect style header result, level(hide)
        collect style row stack, nobinder spacer
        collect style cell border_block, border(right, pattern(nil))
        collect layout (var) (sex#result[mediqr])
        which produces:
        Code:
        . collect preview
        
        --------------------------------------------------
                                     sex                  
                                 Male               Female
        --------------------------------------------------
        age_adm_m  32.3 (15.0 - 54.2)   41.7 (16.9 - 60.5)
        --------------------------------------------------
        Last edited by Hemanshu Kumar; 27 Nov 2022, 07:08.

        Comment


        • #5
          Hi Hemanshu,

          Thank you so much! That works very well when the table only contains continuous variables that are being summarised using median (IQR). I should have gone a little further in my initial post though - I am trying to replicate the Example 3 in the "Stata Customizable Tables", which creates a table in which the first results column contains frequencies for categorical variables and means for continuous variables. For the purposes of this discussion, I would like to report patient age in months by sex, reporting both as a continuous variable and a categorical variable (census data and similar researchers often use age brackets "<1 yr" "1-4 years" "5-9 years"... and I would like to replicate that. I would also like to add a totals at the top of each column.

          The relevant data are:

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte sex float age_adm_m byte age_adm_cat
          1 15.342916 1
          1  54.80082 1
          1  19.74538 1
          1 75.663246 2
          1  19.81109 1
          1  46.45585 1
          1  32.29569 1
          2 3.3839836 0
          2  67.15401 2
          1 14.554415 1
          2 69.650925 2
          1  83.58111 2
          1 15.014374 1
          2  46.75154 1
          1 36.172485 1
          1 35.876797 1
          1  7.096509 0
          1  39.85216 1
          1  7.523614 0
          1   43.6961 1
          1   3.12115 0
          2  36.99384 1
          2  55.78645 1
          2  30.12731 1
          2  52.46817 1
          1  3.613963 0
          2  17.87269 1
          2 31.507187 1
          1  30.58727 1
          1 18.431211 1
          2  43.63039 1
          2 15.967146 1
          1   50.7269 1
          2 32.492813 1
          1 16.689938 1
          1  18.89117 1
          2  30.45585 1
          2  3.581109 0
          1 19.876797 1
          1  82.95688 2
          2  71.29363 2
          2   62.0616 2
          1  30.45585 1
          2  51.44969 1
          2  14.52156 1
          1 11.498973 0
          2 1.4784395 0
          2  28.64887 1
          1  51.58111 1
          2  72.24641 2
          end
          label values sex sex_lbl
          label def sex_lbl 1 "Male", modify
          label def sex_lbl 2 "Female", modify
          label values age_adm_cat age_cat_lbl
          label def age_cat_lbl 0 "0-1 yr", modify
          label def age_cat_lbl 1 "1-4 yr", modify
          label def age_cat_lbl 2 "5-9 yr", modify
          If I make the table described using mean (SD) rather than median (IQR) then the following works:

          Code:
          table (var) (sex), ///
              statistic(frequency) ///
              statistic(mean age_adm_m) statistic(sd age_adm_m) ///
              statistic(fvfrequency age_adm_cat) statistic(fvpercent age_adm_cat) ///
              nototals
          
          // Give Frequency of categorical vars and mean of continuous vars the same label:
          collect recode result frequency = col1
          collect recode result mean = col1
              collect style cell result[col1]#var[age_adm_m], nformat("%4.1f")
          collect recode result fvfrequency = col1
          
          // Give percent of categorical vars and IQR of continuous vars the same label:
          collect recode result sd = col2
              collect style cell result[col2]#var[age_adm_m], nformat("%4.1f")
          collect recode result fvpercent = col2
              collect style cell result[col2]#var[age_adm_cat], nformat("%4.1f")
          
          // Put mean in brackets:
          collect style cell result[col2]#var[age_adm_m], sformat("(%s)")
          
          // Put % after percent:
          collect style cell result[col2]#var[age_adm_cat], sformat("%s%%")
          
          collect style header result, level(hide)
          collect style row stack, nobinder spacer
          collect style cell border_block, border(right, pattern(nil))
          collect layout (var) (sex#result[col1 col2])
          To produce this:

          Code:
          --------------------------------------------------------
                                                  Sex            
                                          Male           Female  
          --------------------------------------------------------
                                       28              22        
          Age at admission (months)  31.6   (22.8)   38.2   (22.9)
                                                                  
          Age at admission (years)                                
            0-1 yr                      5    17.9%      3    13.6%
            1-4 yr                     20    71.4%     14    63.6%
            5-9 yr                      3    10.7%      5    22.7%
          --------------------------------------------------------
          Now, my re-working of your code for IQR is as follows:

          Code:
          table (var) (sex), ///
              statistic(frequency) ///
              statistic(p50 age_adm_m) statistic(p25 age_adm_m) statistic(p75 age_adm_m) ///
              nototals
          collect style cell result[p25 p50 p75], nformat("%4.1f")
          collect composite define iqr = p25 p75 , delimiter(" - ") trim
          collect style cell result[iqr], sformat("(%s)")
          
          collect style header result, level(hide)
          collect style row stack, nobinder spacer
          collect style cell border_block, border(right, pattern(nil))
          collect layout (var) (sex#result[p50 iqr])
          To give the following (which is very nice):

          Code:
          ----------------------------------------------------------------------
                                                         Sex                    
                                             Male                  Female      
          ----------------------------------------------------------------------
          Age at admission (months)  25.2   (15.2 - 45.1)   34.7   (17.9 - 55.8)
          ----------------------------------------------------------------------
          So, putting those together I attempt the following:

          Code:
          table (var) (sex), ///
              statistic(frequency) ///
              statistic(p50 age_adm_m) statistic(p25 age_adm_m) statistic(p75 age_adm_m) ///
              statistic(fvfrequency age_adm_cat) statistic(fvpercent age_adm_cat) ///
              nototals
          collect style cell result[p25 p50 p75], nformat("%4.1f")
          collect style cell result[fvpercent], nformat("%4.1f")
          
          // Create composite IQR result:
          collect composite define iqr = p25 p75 , delimiter(" - ") trim
          
          // Give Frequency of categorical vars and median of continuous vars the same label:
          collect recode result frequency fvfrequency p50 = col1
          collect recode result p50 = col1
          
          // Give percent of categorical vars and IQR of continuous vars the same label:
          collect recode result fvpercent iqr = col2 // This fails: "composite result iqr not allowed"
          
          // Put IQR in brackets:
          collect style cell result[col2]#var[age_adm_m], sformat("(%s)")
          
          // Put % after percent:
          collect style cell result[col2]#var[age_adm_cat], sformat("%s%%")
          
          collect style header result, level(hide)
          collect style row stack, nobinder spacer
          collect style cell border_block, border(right, pattern(nil))
          collect layout (var) (sex#result[col1 col2])
          Which fails at the line indicated. It seems that when you use collect composite it creates something different from a result, and it cannot be renamed with collect recode. I also tried calling the composite "col2" from the beginning and then recoding the other results to "col2", but Stata throws up the same complaint.

          I have simplified here, the final table that I seek to produce is quite a large table 1 with several continuous and categorical variables. I am starting to think that the approach I am using won't work. If that is the case my next thought would be to create several tables that I then append. If you agree that that is a better option then I would greatly appreciate advice on how to do that.

          I haven't found many threads online discussing the use of collect for producing a "Table 1" like this. Am I trying to use the wrong tool for the job, in which case ought I simply go back to using table1_mc?

          I really appreciate the support. Please let me know if my post could be more helpful.

          Comment


          • #6
            Has anyone had any success looking into this? I’m quite stumped.
            thanks,
            Ryan

            Comment


            • #7
              Hi Ryan,
              I ran into this issue as well and the solution I found is to keep p25 and p75 in separate columns while you add formatting, and then using collect composite define as the final step:
              Code:
              table (var) (sex), ///
                  statistic(frequency) ///
                  statistic(p50 age_adm_m) statistic(p25 age_adm_m) statistic(p75 age_adm_m) ///
                  statistic(fvfrequency age_adm_cat) statistic(fvpercent age_adm_cat) ///
                  nototals
              
              // Give Frequency of categorical vars and median of continuous vars the same label:
              collect recode result frequency=col1 fvfrequency=col1 p50=col1
              
              // Give percent of categorical vars and 25th percentile of continuous vars the same label:
              collect recode result fvpercent=col2 p25=col2
              
              // Recode 75th percentile to a third column:  
              collect recode result p75=col3 
              
              // Format median: 
              collect style cell result[col1]#var[age_adm_m], nformat("%4.1f")
              
              // Add IQR formatting to p25 and p75 separately:
              collect style cell result[col2]#var[age_adm_m], nformat("%4.1f") sformat("(%s -")
              collect style cell result[col3]#var[age_adm_m], nformat("%4.1f") sformat("%s)")
              
              // Put % after percent:
              collect style cell result[col2]#var[age_adm_cat], nformat("%4.1f") sformat("%s%%")
              
              // Create composite result:
              collect composite define col4 = col2 col3
              
              collect style header result, level(hide)
              collect style row stack, nobinder spacer
              collect style cell border_block, border(right, pattern(nil))
              collect layout (var) (sex#result[col1 col4])
              Although the composite define applies to all variables, for the categorical variables col3 is empty so it doesn't make a difference.

              Comment

              Working...
              X