
  • Table collection not including one result column

    I'm trying to create a table of percentiles of survival time by different patient characteristics. I want to include the number of subjects, median, first quartile, and third quartile in the table. I've used the stci command (Stata/SE 19.0) to calculate the quartiles. stci returns scalars rather than matrices, so I've been using collect with by to capture each piece of data. When I specify the table, it displays the quartiles, but not the number of subjects:

    Code:
    use https://www.stata-press.com/data/r19/page2
    stset, noshow
    
    label define group 1 "Male" 2 "Female"
    label values group group
    
    collect clear
    bysort group: collect : stci, median
    bysort group: collect : stci, p(25)
    bysort group: collect : stci, p(75)
    
    collect layout (group) (result[N_sub p50 p25 p75]) (), name(default)
    collect label levels result N_sub "N" p50 "Median" p25 "1Q" p75 "3Q", modify
    collect preview

    That code gives me the following table, in which N does not appear. I don't get any error messages.
              Median   1Q   3Q
    Male         216  190  234
    Female       233  232  280




    Note that I can get N_sub to show up in the table if I include only a single measure (for example, just the median) for two different grouping variables:

    Code:
    gen random=runiform()
    gen group2=random<.5
    label define group2 0 "Young" 1 "Old"
    label values group2 group2
    collect clear
    bysort group: collect : stci, median
    bysort group2: collect : stci, median
    
    collect layout (group group2) (result[N_sub p50 ]) (), name(default)
    collect label levels result N_sub "N" p50 "Median" , modify
    collect preview

              N  Median
    Male     19     216
    Female   21     233
    Young    22     233
    Old      18     209


    I suspect something is going wrong over the fact that each group level has its N_sub generated 3 times, once for each quartile (though the N's should be the same), but I'm not sure how to fix it.
    Any ideas on how to get my N_sub to show up in the table along with all three quartiles?




  • #2
    Originally posted by Molly Jeffery

    I suspect something is going wrong over the fact that each group level has its N_sub generated 3 times, once for each quartile (though the N's should be the same), but I'm not sure how to fix it.
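    Correct. Your layout specification is missing a dimension, namely cmdset. Each execution of stci under collect (one per by-group, per collect call) is stored as its own level of cmdset, so N_sub is recorded three times for each group, while each percentile is recorded only once. If we add cmdset to the layout, we get: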

    Code:
    use https://www.stata-press.com/data/r19/page2, clear
    stset, noshow
    label define group 1 "Male" 2 "Female"
    label values group group
    collect clear
    bysort group: collect : stci, median
    bysort group: collect : stci, p(25)
    bysort group: collect : stci, p(75)
    collect layout (cmdset#group) (result[N_sub p50 p25 p75]) (), name(default)
    Res.:

    Code:
    . collect layout (cmdset#group) (result[N_sub p50 p25 p75]) (), name(default)
    
    Collection: default
          Rows: cmdset#group
       Columns: result[N_sub p50 p25 p75]
       Table 1: 12 x 4
    
    -----------------------------------------
             | Number of subjects p50 p25 p75
    ---------+-------------------------------
    1        |                              
      Male   |                 19 216        
    2        |                              
      Female |                 21 233        
    3        |                              
      Male   |                 19     190    
    4        |                              
      Female |                 21     232    
    5        |                              
      Male   |                 19         234
    6        |                              
      Female |                 21         280
    -----------------------------------------
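
    To see where the six rows come from, it can help to inspect the collection directly. The following is a minimal sketch that repeats the setup above and then lists the dimensions and their levels with collect dims and collect levelsof. The expectation that cmdset has six levels is inferred from the table above, not checked separately.

    ```stata
    use https://www.stata-press.com/data/r19/page2, clear
    stset, noshow
    collect clear
    bysort group: collect : stci, median
    bysort group: collect : stci, p(25)
    bysort group: collect : stci, p(75)

    * List every dimension stored in the current collection
    collect dims
    * List the levels of cmdset (one per by-group execution of stci)
    collect levelsof cmdset
    * List the stored result names, including N_sub and the percentiles
    collect levelsof result
    ```

    Each N_sub value is tagged with its own cmdset level, which would explain why a layout using only group and result cannot place it in a single cell.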

    Now we need to merge the rows for each group into one and clean up (e.g., hide the levels of the dimension cmdset). All in all:

    Code:
    use https://www.stata-press.com/data/r19/page2, clear
    stset, noshow
    label define group 1 "Male" 2 "Female"
    label values group group
    collect clear
    bysort group: collect : stci, median
    bysort group: collect : stci, p(25)
    bysort group: collect : stci, p(75)
    collect layout (cmdset#group) (result[N_sub p50 p25 p75]) (), name(default)
    collect label levels result N_sub "N" p50 "Median" p25 "1Q" p75 "3Q", modify
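    * Hide the numeric cmdset labels in the row headers
    collect style header cmdset, level(hide)
    * Male: retag the p(25) and p(75) runs (cmdset 3 and 5) as cmdset 1, the median run
    collect remap cmdset[3 5] = cmdset[1 1], fortags(group[])
    * Female: retag the p(25) and p(75) runs (cmdset 4 and 6) as cmdset 2, the median run
    collect remap cmdset[4 6] = cmdset[2 2], fortags(group[])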
    collect preview
    Res.:

    Code:
    .
    . collect preview
    
    --------------------------
           |  N Median  1Q  3Q
    -------+------------------
    Male   | 19    216 190 234
    Female | 21    233 232 280
    --------------------------



    • #3
      You can achieve this much more simply by using stsum instead of stci, because a single stsum call returns the number of subjects and all three quartiles. Consider:
      Code:
      use https://www.stata-press.com/data/r19/page2, clear
      stset, noshow
      label define group 1 "Male" 2 "Female"
      label values group group
      
      collect clear
      bysort group: collect : stsum
      collect label levels result N_sub "N" p50 "Median" p25 "1Q" p75 "3Q", modify
      collect layout (group) (result[N_sub p50 p25 p75])
      which produces:
      Code:
      . collect preview
      
      --------------------------
             |  N Median  1Q  3Q
      -------+------------------
      Male   | 19    216 190 234
      Female | 21    233 232 280
      --------------------------
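
      A quick way to check which result names are available to the layout is to run the command once outside collect and list its stored results. This is only a sketch: it assumes stsum leaves N_sub, p25, p50, and p75 in r(), which is what the layout above relies on.

      ```stata
      use https://www.stata-press.com/data/r19/page2, clear
      stset, noshow

      * Run stsum on the full sample, then list the r() results it stored
      stsum
      return list
      ```

      The names shown by return list are the ones that can be placed inside result[...] in collect layout.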
      Last edited by Hemanshu Kumar; 12 Aug 2025, 00:25.



      • #4
        Brilliant--thank you both! Andrew provided the general answer and Hemanshu the specific. I really appreciate it.
