Tabulate, Summarize and Significance Tests (Or Alternatives?)

Kevin Blaine

Join Date: Jul 2017
Posts: 66

22 Jul 2024, 13:02

Hi everyone -- I've searched around, and I'm still having trouble understanding whether what I want to do is a) possible and b) statistically viable/worth it.

Before I begin: -dataex- to the rescue! (FYI: the data are in long format.)

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int studyid byte(timepoint gender total_score)
117 1 1  75
116 2 1  60
117 2 1  90
216 1 1  80
115 2 1  60
230 2 1  65
136 2 1  80
115 1 1  70
116 1 1  70
152 2 1  65
106 2 1  85
152 1 1  50
230 1 1  55
216 2 1  90
136 1 1  80
106 1 1  85
109 1 2  70
255 1 2  50
226 1 2  50
140 2 2  85
114 2 2  95
129 1 2  55
144 1 2  55
138 2 2  70
113 1 2  70
207 2 2  70
210 1 2  90
142 2 2  80
119 2 2  55
139 2 2  90
132 1 2  40
248 2 2  80
134 1 2  95
217 2 2  85
111 2 2  90
119 1 2  30
238 1 2  85
120 2 2  65
101 1 2  65
156 1 2  80
105 2 2  95
130 1 2  75
107 1 2  50
123 2 2  95
104 1 2  75
228 2 2  80
125 2 2  95
245 2 2  80
238 2 2  90
208 2 2  80
219 2 2  55
241 2 2  85
164 1 2  65
232 1 2  90
209 1 2  80
253 2 2  70
231 1 2  80
207 1 2  70
110 1 2  80
250 1 2  70
147 1 2  60
112 1 2  55
147 2 2  75
137 1 2  75
143 1 2  60
211 2 2  85
127 1 2  85
139 1 2  90
154 1 2  75
160 1 2  35
134 2 2  95
253 1 2  55
222 1 2  40
146 2 2  85
163 1 2  30
237 1 2  65
130 2 2  85
209 2 2  80
252 2 2  70
105 1 2  80
146 1 2  70
110 2 2  85
118 2 2 100
148 2 2  85
104 2 2  90
135 1 2  65
122 1 2  85
131 1 2  70
208 1 2  70
252 1 2  60
222 2 2  60
150 1 2  80
165 2 2  90
145 1 2  95
143 2 2  85
133 2 2  95
124 1 2  90
223 1 2  75
163 2 2  65
127 2 2  95
end
label values timepoint la_survey
label def la_survey 1 "Pre", modify
label def la_survey 2 "Post", modify
label values gender la_gender
label def la_gender 1 "Male", modify
label def la_gender 2 "Female", modify

Now, let's start with what I WANT a table to look like, which is more or less what dtable would produce, except I want a third variable in the cells, like this:

Code:

----------------------------------------------------
                |              Timepoint            
                |             Pre               Post
----------------+-----------------------------------
Gender (Female) |                                   
  Male          |  72.5 [62.5-80]   72.5 [62.5-87.5]
  Female        |      70 [55-80]         85 [75-90]
----------------------------------------------------

Here's the code I used to produce the above table:

Code:

table gender timepoint, ///
    statistic(p50 total_score) ///
    statistic(p25 total_score) ///
    statistic(p75 total_score)
collect composite define inter = p25 p75, delimiter("-") trim
collect style cell result[inter], sformat("[%s]")
collect composite define med_inter = p50 inter, delimiter(" ") trim
collect layout (gender[1 2]) (timepoint[1 2]#result[med_inter])
collect style header result[med_inter], level(hide)
collect preview

Now, I know I can do something similar using -tabulate, summarize()-, but a) I have no control over the statistics it reports, and b) I can't run a test of association on that 2x2 table.

So, what is missing is a test of association, much like you'd find in -dtable-. (Obviously, not on the composite result, but on the median only.) However, in this case it obviously won't be a chi-squared test -- though with this particular variable (i.e., total_score) it could be, since it's a percentage multiplied by 100 to get an integer in each cell.

Back to my two questions:

1) Is it possible to do this for an entire table (e.g., Table 1), including a significance test for each row variable?
2) Is there a better way of showing these higher order associations pre vs post? I plan on creating an adjusted model using xtreg, but I'd love to be able to show the investigator the unadjusted total score differences with some unadjusted association testing too.

Thanks!

Tags: None

Kevin Blaine

#2

22 Jul 2024, 13:16

Also, I did find this, which is tangentially related and the closest thing I've come across to what I'm looking for: https://www.statalist.org/forums/for...is-implemented

Clyde Schechter -- thoughts?

Last edited by Kevin Blaine; 22 Jul 2024, 13:18.
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#3

22 Jul 2024, 16:08

With regard to producing a table that looks like what you want, the code in #1 shows that you are far more advanced in your use of -collect- than I am. Nevertheless, let me point out that -dtable- has -continuous()- and -factor()- options, each of which in turn allows a -test()- option that lets you specify which test to use to compare the -by()- groups. I refer you to the help file and reference manuals for the details. I think using -test(kwallis)- for the total_score variable would give you more or less what you want here, though I haven't specifically tried to produce your table.

If that will not do it, I can tell you that when confronted with the need for a publication-quality table that I don't know how to get out of -table-, -dtable- or -etable-, perhaps with very minor tweaks from -collect- (a command whose complexities continue to elude my grasp), I simply create a data set that is organized like the table I want to create (which usually means using -egen- functions or -collapse- or both) and then create a document with -putdocx table- from that.
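A rough sketch of the route described above (collapse to a data set shaped like the table, then -putdocx table-) might look like the following. To keep it self-contained it re-enters only the 16 male observations from the -dataex- excerpt in #1, so it builds just the Male row. The variable names med, q1, q3, cell, Pre, Post and Gender, the table name t1, and the file name gender_table.docx are placeholders of my own, not anything from the thread.

```stata
* Self-contained: only the 16 male rows from the -dataex- excerpt in #1
clear
input int studyid byte(timepoint gender total_score)
117 1 1 75
116 2 1 60
117 2 1 90
216 1 1 80
115 2 1 60
230 2 1 65
136 2 1 80
115 1 1 70
116 1 1 70
152 2 1 65
106 2 1 85
152 1 1 50
230 1 1 55
216 2 1 90
136 1 1 80
106 1 1 85
end
label define la_gender 1 "Male" 2 "Female"
label values gender la_gender

* one observation per gender x timepoint, holding the median and quartiles
collapse (p50) med=total_score (p25) q1=total_score (p75) q3=total_score, ///
    by(gender timepoint)

* build the "median [p25-p75]" text for each cell
generate cell = strofreal(med) + " [" + strofreal(q1) + "-" + strofreal(q3) + "]"

* go wide so each timepoint becomes its own column
keep gender timepoint cell
reshape wide cell, i(gender) j(timepoint)
rename (cell1 cell2) (Pre Post)
decode gender, generate(Gender)

* write the table-shaped data to a Word document
putdocx begin
putdocx table t1 = data(Gender Pre Post), varnames
putdocx save gender_table.docx, replace
```

With the full data, you would drop the -input- block and wrap the -collapse- through -putdocx save- steps in -preserve-/-restore-, since -collapse- replaces the data in memory. Any test results (e.g., a p-value column) would be added as further variables before the -putdocx table- step.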
Kevin Blaine

#4

22 Jul 2024, 16:42

Thanks Clyde. I know that -dtable- has a kwallis option for -test()-, but it doesn't let me summarize a third variable in the cells (as -tabulate, summarize()- does), which in my case is essential. Jeff Pitblado (StataCorp) -- I came across this post of yours, and I'm wondering if there's anything I could repurpose for my use: https://www.statalist.org/forums/for...d-ranksum-test. Thoughts?
Kevin Blaine

#5

22 Jul 2024, 17:35

Jeff Pitblado (StataCorp) -- I also found this thread and am wondering whether the -collect remap- command discussed there would help in my case: https://www.statalist.org/forums/for...d-be-matchable

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 697

22 Jul 2024, 19:38

Here is how you might do this given your example.

Code:

* Kevin's original code, except I use option -nototals- since it appears
* you do not want to see them in your table
table gender timepoint, ///
    statistic(p50 total_score) ///
    statistic(p25 total_score) ///
    statistic(p75 total_score) ///
    nototals
collect composite define inter = p25 p75, delimiter("-") trim
collect style cell result[inter], sformat("[%s]") 
collect composite define med_inter = p50 inter , delimiter(" ") trim
collect layout (gender[1 2]) (timepoint[1 2]#result[med_inter])
collect style header result[med_inter], level(hide)
collect preview

* perform and collect test results within each level of -gender-;
* using -kwallis- just to show how to add p-values to your table since
* it was used in another referenced thread

* note the arguments in the -tags()- option of -collect get-

kwallis total_score if gender==1, by(timepoint)
collect get p = chi2tail(r(df), r(chi2_adj)), tags(gender[1] timepoint[_hide])
kwallis total_score if gender==2, by(timepoint)
collect get p = chi2tail(r(df), r(chi2_adj)), tags(gender[2] timepoint[_hide])

* format the new p-value result
collect style cell result[p], nformat(%5.4f) minimum(.0001)

* update the layout
collect layout (gender) (timepoint#result[med_inter p])

Here is the resulting table.

Code:

-------------------------------------------------------
         |                   timepoint
         |             Pre               Post         p
---------+---------------------------------------------
gender   |
  Male   |  72.5 [62.5-80]   72.5 [62.5-87.5]    0.5609
  Female |      70 [55-80]         85 [75-90]   <0.0001
-------------------------------------------------------

If your table has more row variables, either loop over them or write a custom test/collect code block for each one. I would build a separate table for each row variable following this recipe, then use -collect combine-. In the -collect layout- for the combined collection, keep the same column specification and list the row variables in the order you want in the row specification.


Kevin Blaine

#7

23 Jul 2024, 10:02

Jeff -- to the rescue, as always! Trying this solution out today. Thank you!

EDIT: One question. -kwallis- is preferable to -ranksum- here, given the nonparametric pattern of the data across multiple categories (not just two). I noticed your code restricts -kwallis- with an -if- qualifier. I reworked my code around an outcome that essentially removes the timepoint variable (i.e., the post minus pre score difference, diff_score), which I stored on all timepoint==1 observations. My new code is this:

Code:

table gender if timepoint==1, statistic(p50 diff_score) statistic(p25 diff_score) statistic(p75 diff_score)
kwallis diff_score if timepoint==1, by(gender)

In theory, this gives me a test across all four cells rather than just two. But I'm having trouble combining the two tables (medians + p-value). I want a simple test of "Is there a difference in how you perform on the post-test depending on whether you are male or female?" That is, I'm looking to compare the genders across time, not to compare one gender over time.

Any advice?

Last edited by Kevin Blaine; 23 Jul 2024, 10:27.

Kevin Blaine


23 Jul 2024, 10:26

FYI, this is the table I'm looking to create, with the correct p-value:

Code:

kwallis diff_score if timepoint==1, by(gender)

Kruskal–Wallis equality-of-populations rank test

  +-------------------------+
  | gender | Obs | Rank sum |
  |--------+-----+----------|
  |   Male |   8 |   227.00 |
  | Female |  89 |  4526.00 |
  +-------------------------+

  chi2(1) =  4.682
     Prob = 0.0305

  chi2(1) with ties =  4.761
               Prob = 0.0291

....SOME FANCY CODING HERE TO PRODUCE THE TABLE BELOW.....

-------------------------------------------------------
         |                   timepoint
         |             Pre               Post         p
---------+---------------------------------------------
gender   |
  Male   |  72.5 [62.5-80]   72.5 [62.5-87.5]    0.0291
  Female |      70 [55-80]         85 [75-90]   
-------------------------------------------------------

I'm aware the table above combines both the "diff_score" variable (for use in the p-value calculation) and the "total_score" variable (for the median score). Keeping it "all in the family", this is what the "diff_score" table would look like:

Code:

--------------------------------------
                 Median [IQR]        p
--------------------------------------
Gender (Female)              
  Male            5 [-5-12.5]   0.0291
  Female            15 [5-25]
--------------------------------------
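One possible sketch of the missing "fancy coding" reuses Jeff's -collect- code for the total_score cells and adds a single -kwallis- test on diff_score, tagged with gender[1] so the p-value lands on the Male row only, as in the target table. Assumptions: diff_score is the post minus pre difference stored on the timepoint==1 row, as described in #7, and the scalar name p_diff is hypothetical. So that the block runs on its own, it re-enters only the 16 male rows plus six female pre/post pairs from the -dataex- excerpt in #1; the Male cells should match the tables above, but the Female cells and the p-value will not match the full-data results (e.g., 0.0291).

```stata
* Subset of the -dataex- excerpt in #1: all male rows, six female pairs
clear
input int studyid byte(timepoint gender total_score)
117 1 1 75
116 2 1 60
117 2 1 90
216 1 1 80
115 2 1 60
230 2 1 65
136 2 1 80
115 1 1 70
116 1 1 70
152 2 1 65
106 2 1 85
152 1 1 50
230 1 1 55
216 2 1 90
136 1 1 80
106 1 1 85
207 2 2 70
207 1 2 70
139 2 2 90
139 1 2 90
119 2 2 55
119 1 2 30
134 1 2 95
134 2 2 95
238 1 2 85
238 2 2 90
130 1 2 75
130 2 2 85
end
label define la_survey 1 "Pre" 2 "Post"
label values timepoint la_survey
label define la_gender 1 "Male" 2 "Female"
label values gender la_gender

* diff_score = post minus pre, stored on the timepoint==1 row only
bysort studyid (timepoint): generate diff_score = ///
    total_score[2] - total_score[1] if _N == 2 & _n == 1

* medians and quartiles of total_score by gender x timepoint
table gender timepoint, ///
    statistic(p50 total_score) ///
    statistic(p25 total_score) ///
    statistic(p75 total_score) ///
    nototals
collect composite define inter = p25 p75, delimiter("-") trim
collect style cell result[inter], sformat("[%s]")
collect composite define med_inter = p50 inter, delimiter(" ") trim
collect style header result[med_inter], level(hide)

* one between-gender test on the change scores (tie-adjusted chi2)
kwallis diff_score if timepoint == 1, by(gender)
scalar p_diff = chi2tail(r(df), r(chi2_adj))

* tag gender[1] so the single p-value sits on the Male row only
collect get p = scalar(p_diff), tags(gender[1] timepoint[_hide])
collect style cell result[p], nformat(%5.4f) minimum(.0001)
collect layout (gender) (timepoint#result[med_inter p])
```

With the full data, drop the -input- block (and the -generate- step if diff_score already exists). The -tags()- choice is what positions the p-value: tagging it gender[2] instead would move it to the Female row.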
