
  • Calculating the mean


    Suppose I have data as follows:

    . list , clean
               id    s_htdate   s_testdate   cvd
          1.    1   3/23/1992   12/30/2003     0
          2.    1   3/23/1992    5/12/2004     0
          3.    1   3/23/1992    5/13/2004     0
          4.    1   3/23/1992    7/19/2004     1

    How would I go about calculating the average time it took individuals to develop CVD, where the time of interest runs from s_htdate to the first occurrence of cvd == 1? I am interested only in those who developed CVD, and the data have multiple records per person.

  • #2
    This can be answered by studying various Stata FAQs, notably "How can I identify first and last occurrences systematically in panel data?" and "How can I generate a variable containing the last of several dates?"
    (The latter also covers the first date.)

    (The Statalist FAQ does advise looking through the Stata FAQs before posting.)
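    As an illustration of the first-occurrence idea (a minimal sketch, not code taken from either FAQ), one can flag the first record with cvd == 1 within each person and compute the time on that record only. The toy data are hypothetical: id 1 reuses dates from the listing plus an invented later CVD record, and id 2 never develops CVD. The sketch assumes cvd is coded 0/1 with no missing values, and it converts the dates to Stata daily dates because sorting must follow calendar order, not string order.

```stata
clear
* hypothetical toy data: id 1 has two CVD records, id 2 never develops CVD
input id str10 ht_str str10 test_str byte cvd
1 "3/23/1992" "12/30/2003" 0
1 "3/23/1992" "7/19/2004"  1
1 "3/23/1992" "9/1/2005"   1
2 "6/1/1995"  "1/15/2001"  0
2 "6/1/1995"  "2/20/2003"  0
end

* numeric daily dates, so that sorting follows calendar order
gen s_htdate = date(ht_str, "MDY")
gen s_testdate = date(test_str, "MDY")
format s_htdate s_testdate %td

* within id, in date order, the running sum of cvd first equals 1 at the first CVD record
bysort id (s_testdate) : gen byte first_cvd = cvd == 1 & sum(cvd) == 1

* time to first CVD (days), defined only on that record, so each case contributes one value
gen time_to_cvd = s_testdate - s_htdate if first_cvd
su time_to_cvd, detail
```

    Because time_to_cvd is nonmissing on exactly one record per person who developed CVD, no tagging is needed before summarizing, and people who never developed CVD drop out automatically.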

    But there are several ways of doing it. Here's another:

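    It sounds as if you want:

    * s_testdate/cvd is s_testdate if cvd == 1 and missing if cvd == 0; min() ignores missings
    bysort id : egen diagnosis_date = min(s_testdate / cvd)
    * time from htdate to first CVD (days, if both are Stata daily dates); missing for non-cases
    gen develop_time = diagnosis_date - s_htdate

    To average across patients, use one observation per patient:

    * tag = 1 in exactly one observation per id
    egen tag = tag(id)
    * non-cases have missing develop_time, so only those who developed CVD are summarized
    su develop_time if tag, detail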
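    To check the commands end to end, here is a minimal do-file sketch that reproduces the four listed records. It assumes that s_htdate and s_testdate arrive as strings in month/day/year form; if they are already Stata daily dates, skip the conversion loop. It also expresses the result in years by dividing by 365.25, an approximation chosen for this sketch rather than part of the question.

```stata
clear
* the four records from the listing; dates read as strings (an assumption)
input id str10 s_htdate str10 s_testdate byte cvd
1 "3/23/1992" "12/30/2003" 0
1 "3/23/1992" "5/12/2004"  0
1 "3/23/1992" "5/13/2004"  0
1 "3/23/1992" "7/19/2004"  1
end

* convert month/day/year strings to Stata daily dates, keeping the same names
foreach v in s_htdate s_testdate {
    gen `v'_d = date(`v', "MDY")
    drop `v'
    rename `v'_d `v'
    format `v' %td
}

* earliest test date with cvd == 1 within each id (x/0 is missing, ignored by min())
bysort id : egen diagnosis_date = min(s_testdate / cvd)
format diagnosis_date %td
gen develop_time = diagnosis_date - s_htdate

* approximate years, using an average year length of 365.25 days
gen develop_years = develop_time / 365.25

* one observation per patient
egen tag = tag(id)
su develop_time develop_years if tag, detail
```

    For this single patient, the first CVD record is 19 July 2004, so develop_time is 4,501 days (about 12.3 years).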