  • Describing a distribution in a clinical trial

    Hello,

    I'm new to Stata and not very familiar with this environment. I tried to look this up in the help files but have been unsuccessful so far, hence this very basic question:

    In a medical clinical trial, I have 150 patients, meaning 150 observations
    For each patient, I have a variable that identifies them with an anonymous patient number,
    They were included in this trial from May 2016 until June 2019. I have a variable that gives the date the agreement was signed (date format).
    They were included by 22 different centers around the globe. The variable identifying the center is named "siteid".

    I easily got the distribution of the number of patients included per center with this command:
    hist siteid, frequency width(1)


    Now, I would want to get :
    - the median of included patient / center and the associated standard deviation
    - the median of included patient / year / center and the associated standard deviation

    It's quite easily done in Excel, but I'm pretty sure there must be a very easy way to do it in Stata too! I know it's a very basic question and I apologize for it, but I'm struggling, and I really would like to learn to do it in Stata.

    Thank you for your help !

    Patrice

  • #2
    Try something along these lines, though there are lots of ways to do it (x1 stands in for your variable).

    bysort siteid: summarize x1, detail

    tabstat x1 , by(siteid) stats(mean p50 sd N)







    • #3
      Welcome to Statalist, Patrice! Please see the Statalist FAQ for suggestions on how to post questions most effectively, especially #12. In the future, please post a short extract of your data using -dataex- to help others help you. And post the exact command you tried, within CODE blocks (the # button on the edit toolbar).

      As to your question: you might want to look up
      Code:
      help tabstat
      If you are on Stata 17 (the latest version as of now), you may also want to check out the table command.
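
      A minimal sketch of the table route, assuming Stata 17 or later and using the shipped auto dataset in place of the trial data (price and foreign are stand-ins for your own variable and siteid):

      ```stata
      sysuse auto, clear
      * Median and interquartile range of price within each level of foreign
      table foreign, statistic(median price) statistic(iqr price)
      ```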



      • #4
        Patrice:
        As an aside to the previous helpful replies, I find it odd that you want the "standard deviation of the median" instead of the interquartile range.
        Building on George's suggestion, I'd propose something along the following lines:
        Code:
        . sysuse auto.dta
        (1978 automobile data)
        
        
        . tabstat price, stat(N mean sd p25 p50 p75 min max) by(foreign)
        
        Summary for variables: price
        Group variable: foreign (Car origin)
        
         foreign |         N      Mean        SD       p25       p50       p75       Min       Max
        ---------+--------------------------------------------------------------------------------
        Domestic |        52  6072.423  3097.104      4184    4782.5      6234      3291     15906
         Foreign |        22  6384.682  2621.915      4499      5759      7140      3748     12990
        ---------+--------------------------------------------------------------------------------
           Total |        74  6165.257  2949.496      4195    5006.5      6342      3291     15906
        ------------------------------------------------------------------------------------------
        
        .
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Originally posted by Patrice Poinat
          I would want to get :
          - the median of included patient / center and the associated standard deviation
          This will be only a single value in your dataset and so it cannot have an "associated standard deviation".

          But here:
          Code:
          bysort <anonymous patient number>: keep if _n == 1 // See footnote
          contract siteid, freq(count)
          summarize count, detail
          
          // or
          
          centile count
          
          * Footnote: This assumes that the patient's ID is unique across sites
          * (it nearly always is in multicenter clinical studies)
          I assume that you inadvertently misstated what it is that you want. Perhaps if you show your Excel formula, then others on the list can suggest a Stata equivalent.
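
          To make the snippet above concrete, here is a minimal sketch on simulated data. The names patid and n_patients and the random allocation to 22 sites are assumptions for illustration only, not the layout of the actual trial. Since it is the distribution of per-center counts that is being described, the quartiles are reported alongside the median:

          ```stata
          clear
          set seed 2019
          set obs 150
          generate patid = _n                       // hypothetical patient identifier
          generate siteid = ceil(22 * runiform())   // hypothetical allocation to sites 1..22

          * One row per center, with the number of included patients
          contract siteid, freq(n_patients)

          * Median, quartiles and SD of the per-center counts
          tabstat n_patients, statistics(count p25 p50 p75 sd min max)
          ```

          Centers that happen to receive no simulated patients do not appear after contract, so the count reported by tabstat can be below 22.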

          - the median of included patient / year / center and the associated standard deviation
          As Carlo mentions, standard deviation of medians is a little outré, but here goes:
          Code:
          // Drops duplicate rows for the same patient within a site and year; might not be needed
          bysort siteid <year> <anonymous patient number>: keep if _n == 1
          contract siteid <year>, freq(count)
          set type double
          
          // If you want the standard deviation of the sites' medians
          preserve // collapse replaces the data in memory, so keep a copy
          collapse (median) count, by(siteid)
          summarize count
          restore
          
          // If you want the standard deviation of the years' medians
          collapse (median) count, by(<year>)
          summarize count
          Again, if you've accidentally misstated what it is that you want, then feel free to clarify, including your Excel cell formulas if you feel that they will help.
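
          One point worth flagging for the per-year counts: contract leaves out site-year combinations with no patients unless the zero option is given, which pushes the medians upward. A minimal sketch on simulated data, where patid, consent_date, year and n_patients are hypothetical names and the dates are drawn at random between May 2016 and June 2019:

          ```stata
          clear
          set seed 2019
          set obs 150
          generate patid = _n
          generate siteid = ceil(22 * runiform())
          * Hypothetical consent dates spread between May 2016 and June 2019
          generate consent_date = mdy(5, 1, 2016) + floor(1140 * runiform())
          format consent_date %td
          generate year = year(consent_date)

          * One row per site-year; zero keeps combinations with no inclusions
          contract siteid year, freq(n_patients) zero

          * Median and quartiles of the per-site counts within each year
          tabstat n_patients, by(year) statistics(count p25 p50 p75)
          ```

          Whether a zero is meaningful depends on whether the center was actually open in that year, so the zero option may need adjusting for centers that joined late or closed early.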



          • #6
            Hello,

            First of all, I'm deeply sorry for not posting my questions using your standard procedures. I'll read the Statalist FAQ more carefully next time.
            Secondly, I'll take some time to read and process your answers and I'll get back to you to let you know how I proceeded in the end.

            Above all : thanks for your quick answers and your professionalism !

            Patrice
