  • Multiple outcome measurements per year; need to keep only the one closest to DOB

    Data management question: I am working on a longitudinal, registry-based project involving children and adolescents with type 1 diabetes. Each participant has an ID#, a date of birth (DOB), many different independent variables, an outcome / dependent variable (HbA1c), and a "status_date" (day-month-year), which is the date when the HbA1c value was measured. My "status_date" variable is not a string variable. My task: if a participant has two or more HbA1c values within a single year (e.g., 2015), I should use only the HbA1c value closest to the participant's DOB. Does anybody have advice on which set of Stata commands to use?

  • #2
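    Assuming status_date always falls after the participant's DOB and is a numeric Stata date, the measurement closest to DOB is simply the earliest one:
    Code:
    bysort patid (status_date): gen first_measurement = 1 if _n == 1
    Putting status_date in parentheses sorts by it within patid without making it part of the by-group, so first_measurement flags each participant's earliest measurement.
    EDIT:

    If you need one for each year, create a year variable (for example with yofd(status_date)), then add it to the bysort:
    Code:
    bysort patid year (status_date): gen first_measurement = 1 if _n == 1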

    • #3
      Code:
      gen year = yofd(status_date)
      bysort pid year: egen close = min(abs(DOB-status_date))
      keep if abs(DOB-status_date) == close
      This code makes some assumptions about the structure of your data. If these assumptions are incorrect or the code doesn't work, please repost with a data example.
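
As a rough illustration of the same idea, here is a minimal sketch on toy data. The values, the string helper variables dob_s and date_s, the outcome name hba1c, and the gap variable are all made up for the example; it also assumes one row per measurement and dates that convert cleanly with daily(). Unlike the keep-if approach above, it keeps exactly one row per participant-year, so if two measurements are equally close to DOB, which one survives is arbitrary.

```stata
* Toy data: hypothetical names and values, for illustration only
clear
input pid str9 dob_s str9 date_s hba1c
1 "01jan2005" "15feb2015" 7.1
1 "01jan2005" "20sep2015" 7.8
1 "01jan2005" "10mar2016" 8.0
2 "15jun2007" "01jul2015" 6.9
2 "15jun2007" "01dec2015" 7.4
end

* Convert the strings to numeric Stata daily dates
gen DOB = daily(dob_s, "DMY")
gen status_date = daily(date_s, "DMY")
format DOB status_date %td

* Year of measurement and absolute distance (in days) from DOB
gen year = yofd(status_date)
gen gap = abs(status_date - DOB)

* Within each participant-year, sort by distance and keep the closest row
bysort pid year (gap): keep if _n == 1

list pid year status_date hba1c, sepby(pid)
```

If ties matter, adding a further variable inside the parentheses (for example status_date) would make the choice reproducible.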
