  • Correlation coefficient using Stata

    Dear Stata users, I want to do the following:
    1) Estimate the mean of studytime within each age group when died==1 (died is 1 if the patient died), and likewise when died==0. Note that I want to restrict the analysis to ages 50 to 65, which gives 16 data points.
    2) In the second stage, find the correlation between the two sets of means (16 values each): those for died==0 and those for died==1.

    The data set can be loaded with the following command:
    sysuse cancer.dta

    How can I write a loop to do this? Please advise.

  • #2
    Biswa:
    1st question:
    Code:
    . use "http://www.stata-press.com/data/r15/drugtr.dta"
    (Patient Survival in Drug Trial)
    
    
    . bysort died: tabstat age if age>=50 & age<=65, stat(mean sd p50 min max)
    
    -----------------------------------------------------------------------------------------------------------------------
    -> died = 0
    
        variable |      mean        sd       p50       min       max
    -------------+--------------------------------------------------
             age |  55.92308    5.0243        56        50        65
    ----------------------------------------------------------------
    
    -----------------------------------------------------------------------------------------------------------------------
    -> died = 1
    
        variable |      mean        sd       p50       min       max
    -------------+--------------------------------------------------
             age |  56.91667  3.988207        57        50        65
    ----------------------------------------------------------------
    2nd question: I'm not clear about what you're after. Correlation is between variables, not sample means.
    Maybe you want to consider -regress- (using the same dataset as above):
    Code:
    . regress age i.died if age>=50 & age<=65
    
          Source |       SS           df       MS      Number of obs   =        37
    -------------+----------------------------------   F(1, 35)        =      0.44
           Model |  8.32467082         1  8.32467082   Prob > F        =    0.5135
        Residual |   668.75641        35   19.107326   R-squared       =    0.0123
    -------------+----------------------------------   Adj R-squared   =   -0.0159
           Total |  677.081081        36  18.8078078   Root MSE        =    4.3712
    
    ------------------------------------------------------------------------------
             age |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          1.died |   .9935897   1.505302     0.66   0.514    -2.062335    4.049514
           _cons |   55.92308   1.212351    46.13   0.000     53.46187    58.38428
    ------------------------------------------------------------------------------
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Dear Sir, let me clarify what I am after:

      1) Estimate the mean of studytime at each age 50, 51, 52, ..., 65 when died==0. (Some ages, such as 52 or 54, may be absent from the data set above; for generality, please assume every age is present.)
      2) Estimate the mean of studytime at each age 50, 51, 52, ..., 65 when died==1.
      3) Estimate the correlation between the two mean variables generated in steps 1 and 2.

      I think my code will make this clearer; it carries out step 1, then step 2, then step 3.
      Step 1:
      sysuse cancer.dta
      gen id=_n
      sort died
      by age, sort : egen float avg0 = mean(studytime) if died==0
      sort died age
      duplicates drop avg0 if died==0, force
      keep if died==0
      save cohort0, replace

      Step 2:

      by age, sort : egen float avg1 = mean(studytime) if died==1
      sort died age
      duplicates drop avg1 if died==1, force
      keep if died==1
      save cohort1, replace
      clear

      use cohort0, clear
      merge 1:1 id using cohort1
      drop _m
      corr avg0 avg1

      However, the code above is too long. Can you suggest a loop that does the same thing in a few lines?


      • #4
        Biswa:
        admittedly, I fail to get what you've done.
        Maybe what follows will give you some hints:
        Code:
        . use "http://www.stata-press.com/data/r15/drugtr.dta"
        (Patient Survival in Drug Trial)
        
        . bysort age: egen mean_0=mean(studytime) if died==0
        (31 missing values generated)
        
        . bysort age: egen mean_1=mean(studytime) if died==1
        (17 missing values generated)
        
        . save "C:\Users\user\Desktop\pre_coll.dta", replace
        file C:\Users\user\Desktop\pre_coll.dta saved
        
        . use "C:\Users\user\Desktop\pre_coll.dta"
        (Patient Survival in Drug Trial)
        
        . collapse (mean) mean_0 mean_1, by(age)
        
        . ktau mean_0 mean_1 , stats(taua taub p)
        
          Number of obs =      10
        Kendall's tau-a =       0.0889
        Kendall's tau-b =       0.0899
        Kendall's score =       4
            SE of score =      11.136   (corrected for ties)
        
        Test of Ho: mean_0 and mean_1 are independent
             Prob > |z| =       0.7876  (continuity corrected)
        Kind regards,
        Carlo
        (Stata 19.0)
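
        The three steps asked for in #3 (means by age for died==0, means by age for died==1, then their correlation) can also be done without a loop or intermediate files. What follows is a minimal sketch, not what either poster ran. It assumes the cancer dataset shipped with Stata (variables studytime, died, age) and uses Pearson correlation via -correlate-. Ages that have observations in only one of the two groups end up with a missing mean and drop out of the correlation.

        ```stata
        * Load the example data and keep only ages 50 to 65
        sysuse cancer, clear
        keep if inrange(age, 50, 65)

        * One row per age and died value, holding the mean of studytime
        collapse (mean) studytime, by(age died)

        * One row per age, with studytime0 (died==0) and studytime1 (died==1)
        reshape wide studytime, i(age) j(died)

        * Correlation between the two sets of age-specific means
        correlate studytime0 studytime1
        ```

        Replacing -correlate- with -ktau- or -spearman- on the same two variables gives a rank-based measure instead, which may be preferable with so few ages.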


        • #5
          Thank you, Sir.


          • #6
            Biswa:
            Carlo is enough. Thanks.
            Kind regards,
            Carlo
            (Stata 19.0)
