Describing a distribution in a clinical trial

Patrice Poinat

Join Date: Jan 2023

Posts: 2
#1

10 Jan 2023, 08:33

Hello,

I'm a new Stata user and not very familiar with this environment. I looked through the help files but have been unsuccessful so far, hence this very basic question:

In a clinical trial, I have 150 patients, meaning 150 observations.
For each patient, I have a variable that identifies them with an anonymous patient number.
They were included in this trial from May 2016 until June 2019; I have a variable that gives the date of signature of the agreement (date format).
They were included through 22 different centers around the globe. The variable identifying the center is named "siteid".

I easily got the distribution of the number of patients included per center via this command:
hist siteid, frequency width(1)

Now, I would want to get:
- the median number of included patients per center and the associated standard deviation
- the median number of included patients per year per center and the associated standard deviation

It's quite easily done in Excel, but I'm pretty sure there must be a very easy way to do it in Stata too! I know it's a very basic question and I apologize for it, but I'm struggling, and I would really like to learn how to do it in Stata.

Thank you for your help !

Patrice
George Ford

Join Date: Aug 2014

Posts: 3140
#2

10 Jan 2023, 08:42

Try something along these lines. But lots of ways to do it.

bysort siteid: summarize x1, detail

tabstat x1 , by(siteid) stats(mean p50 sd N)
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1385
#3

10 Jan 2023, 08:43

Welcome to Statalist, Patrice! Please see the Statalist FAQ for suggestions on how to post questions most effectively, especially #12. In the future, please post a short extract of your data using -dataex- to help others help you. And post the exact command you tried, within CODE blocks (the # button on the edit toolbar).

As to your question: you might want to look up

Code:

help tabstat

If you are on Stata 17 (the latest version as of now), you may also want to check out the table command.
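
For what it's worth, a minimal sketch of the table route, shown on the auto dataset that ships with Stata rather than on your trial data (so price and foreign are stand-ins, not your variables). It assumes Stata 17 or later, where table accepts statistic() options:

Code:

sysuse auto, clear
* One row per group: number of observations, median, and SD of price
table foreign, statistic(frequency) statistic(median price) statistic(sd price)

In your case the grouping variable would presumably be siteid, and the summarized variable whatever per-site quantity you end up constructing.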

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17706
#4

11 Jan 2023, 05:40

Patrice:
as an aside to the previous helpful replies, I find it odd that you want the "standard deviation of the median" instead of the interquartile range.
Exploiting George's assist, I'd propose something along the following lines:

Code:

. sysuse auto.dta
(1978 automobile data)


. tabstat price, stat(N mean sd p25 p50 p75 min max) by(foreign)

Summary for variables: price
Group variable: foreign (Car origin)

 foreign |         N      Mean        SD       p25       p50       p75       Min       Max
---------+--------------------------------------------------------------------------------
Domestic |        52  6072.423  3097.104      4184    4782.5      6234      3291     15906
 Foreign |        22  6384.682  2621.915      4499      5759      7140      3748     12990
---------+--------------------------------------------------------------------------------
   Total |        74  6165.257  2949.496      4195    5006.5      6342      3291     15906
------------------------------------------------------------------------------------------

.

Kind regards,
Carlo
(Stata 19.0)


Joseph Coveney

Join Date: Apr 2014

Posts: 4403
#5

11 Jan 2023, 06:54

Originally posted by Patrice Poinat:

I would want to get:
- the median number of included patients per center and the associated standard deviation

This will be only a single value in your dataset and so it cannot have an "associated standard deviation".

But here:

Code:

bysort <anonymous patient number>: keep if _n == 1 // See footnote
contract siteid, freq(count)
summarize count, detail // or centile count

* Footnote: This assumes that the patient's ID is unique across sites
* (it nearly always is in multicenter clinical studies)

I assume that you inadvertently misstated what it is that you want. Perhaps if you show your Excel formula, then others on the list can suggest a Stata equivalent.

- the median number of included patients per year per center and the associated standard deviation

As Carlo mentions, standard deviation of medians is a little outré, but here goes:

Code:

* Keep one row per patient within each site and year (this line might not be needed)
bysort siteid <year> <anonymous patient number>: keep if _n == 1
contract siteid <year>, freq(count)
set type double

* If you want the standard deviation of the sites' medians
preserve
collapse (median) count, by(siteid)
summarize count
restore

* If you want the standard deviation of the years' medians
collapse (median) count, by(<year>)
summarize count

Again, if you've accidentally misstated what it is that you want, then feel free to clarify, including your Excel cell formulas if you feel that they will help.
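
For the per-year step, the <year> placeholder has to come from the date of signature. A rough sketch on simulated data, in case it helps: patid and consent_date are made-up names standing in for your patient number and date variables, and the dates are assumed to be Stata daily dates (a string date would first need converting with date()).

Code:

* Simulated stand-in for the trial data (all names except siteid are hypothetical)
clear
set seed 2023
set obs 150
generate patid = _n
generate siteid = runiformint(1, 22)
generate consent_date = runiformint(mdy(5, 1, 2016), mdy(6, 30, 2019))
format consent_date %td

* Calendar year of inclusion, then one row per site and year with its patient count
generate year = year(consent_date)
contract siteid year, freq(count)

* Median and quartiles of the yearly counts within each site
tabstat count, by(siteid) statistics(n p50 p25 p75)

Note that site-years with no inclusions are simply absent after contract, so they do not pull the medians down; contract's zero option adds them back with a count of 0 if that is what you want.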
Patrice Poinat

Join Date: Jan 2023

Posts: 2
#6

16 Jan 2023, 07:07

Hello,

First of all, I'm deeply sorry for not posting my questions using your standard procedures. I'll read the Statalist FAQ more carefully next time.
Secondly, I'll take some time to read and process your answers and I'll get back to you to let you know how I proceeded in the end.

Above all : thanks for your quick answers and your professionalism !

Patrice