
  • Generate a variable that combines string-observations based on two conditions (date constraint by group)

    Hi everyone,

    I am working with a list of persons, each uniquely identified by the variable "name". Each person is observed at different points in time ("date_str" / "date"). I want to generate a new variable that collects all values of the variable "classes" for a given person from the 5 years up to and including the date of each observation. Example data are below.

    Code:
    clear
    input str25 name str15 date_str str15 classes
    "Lastname 1, First name 1" "June 16, 2003" "F22B H04Q F04C"
    "Lastname 1, First name 1" "July 15, 2004" "B65D G01N"
    "Lastname 1, First name 1" "May 3, 2006" "C12Q"
    "Lastname 1, First name 1" "July 8, 2009" "C08K"
    "Lastname 2, First name 2" "April 5, 1999" "F16J B06B H04R"
    "Lastname 2, First name 2" "May 20, 2003" "F22B"
    "Lastname 2, First name 2" "April 2, 2007" "G01N"
    end
    gen date = date(date_str, "MDY")
    format date %td
    order name date_str date classes

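    For instance, in observation 4 (July 8, 2009) the new variable would have the value "C08K C12Q B65D G01N": the observation's own classes plus those from May 3, 2006 and July 15, 2004, but not those from June 16, 2003, which lies more than 5 years back. In more general terms, I am trying to combine string observations based on two conditions: (1) same "name" and (2) "date" within the 5 years preceding the focal date.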

    This might be related to a previous post: www.statalist.org/forums/forum/general-stata-discussion/general/1295115-how-to-summarize-multiple-observations-per-id. However, I have not managed to adapt that approach: it is not enough to look at the previous observation only, because the date comparison has to consider every observation in the group defined by "name". [Repeated classes - e.g., "F22B H04Q F22B" - are not an issue: I can discard them afterwards.]
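
    To make the two conditions concrete, here is a minimal brute-force sketch. It assumes the example data above are in memory and that "within the past 5 years" means later than the same calendar day 5 years earlier, up to and including the focal date. The names "cutoff" and "classes_5y" are made up for illustration, and strL requires Stata 13 or newer.

    ```stata
    * Sketch only: compare every pair of observations (hypothetical names cutoff, classes_5y)
    sort name date

    * Assumed window: (same calendar day 5 years earlier, focal date]
    gen long cutoff = mdy(month(date), day(date), year(date) - 5)
    * mdy() returns missing when Feb 29 is mapped to a non-leap year; fall back to Feb 28
    replace cutoff = mdy(2, 28, year(date) - 5) if missing(cutoff) & !missing(date)

    gen strL classes_5y = ""
    local N = _N
    forvalues i = 1/`N' {
        forvalues j = 1/`N' {
            * Condition (1): same name. Condition (2): date inside the window of observation i.
            if name[`j'] == name[`i'] & date[`j'] <= date[`i'] & date[`j'] > cutoff[`i'] {
                * Prepend, so that the most recent classes come first
                quietly replace classes_5y = trim(classes[`j'] + " " + classes_5y) in `i'
            }
        }
    }
    drop cutoff
    list name date_str classes_5y, sepby(name)
    ```

    Because the loop visits all pairs of observations, its run time grows with the square of the number of observations, so it is probably only practical for small or moderately sized datasets.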

    I'm thankful for any help or suggestions!
    Patrick

  • #2
    The author reposted the question in the General forum.
