  • How to create different samples based on the following conditions

    Hi all,

    Consider this data example, where id is the mother's id, bord is the birth order of her children, and dob, mob, and yob are the day, month, and year of birth of the children, respectively.
    Code:
    clear
    input byte(id bord) float(dob mob yob)
     1 1 18  6 2013
     1 2  1  4 2004
     2 1  1  4 2007
     3 1  1  5 1996
     4 1  1  5 2007
     4 2  1  7 1999
     5 1 18  8 2010
     5 2  1  4 2005
     6 1 18  8 2012
     6 2  1 10 2001
     6 3  1  7 1996
     7 1  1  8 2001
     7 2  1  6 2000
     8 2  1  6 2000
     8 1  1 11 2006
     9 1  1 10 2004
     9 2  1 12 1993
    10 1  1  6 2006
    10 2  1  4 2000
    11 1 19  8 2013
    11 2 20  7 2010
    11 3 10  9 2002
    end
    I want to create a variable identifying three different samples as follows:
    - Sample 1 contains ONLY mothers whose children were all born before Oct 11, 2006
    - Sample 2 contains mothers who have children born both before and on or after Oct 11, 2006. Please note that this sample does not include individuals in sample 1
    - Sample 3 contains ONLY mothers whose children were all born on or after Oct 11, 2006. Please note that this sample does not include individuals in sample 2

    Thank you.

  • #2
    Dear Matthew,
    here is my solution to your problem. It is probably not the most efficient, but it does what you need.
    Alternatively, you could use the command todate (ssc install todate) to obtain the same bday_date_format variable.
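Another route that avoids the string conversion entirely is Stata's built-in mdy() function, which takes the numeric month, day and year (in that order) and returns a daily date. This is a minimal sketch, assuming the numeric dob, mob and yob from the data example are still in memory (so it has to run before any tostring); it produces the same bday_date_format variable.

```stata
clear
input byte(id bord) float(dob mob yob)
 1 1 18  6 2013
 1 2  1  4 2004
 2 1  1  4 2007
 3 1  1  5 1996
 4 1  1  5 2007
 4 2  1  7 1999
 5 1 18  8 2010
 5 2  1  4 2005
 6 1 18  8 2012
 6 2  1 10 2001
 6 3  1  7 1996
 7 1  1  8 2001
 7 2  1  6 2000
 8 2  1  6 2000
 8 1  1 11 2006
 9 1  1 10 2004
 9 2  1 12 1993
10 1  1  6 2006
10 2  1  4 2000
11 1 19  8 2013
11 2 20  7 2010
11 3 10  9 2002
end

* mdy(month, day, year) works directly on the numeric variables,
* so there is no need for tostring, zero-padding or concat
gen bday_date_format = mdy(mob, dob, yob)
format bday_date_format %td

* spot-check the first child of mother 1
list id bord dob mob yob bday_date_format if id == 1 & bord == 1
```

Because mob, dob and yob stay numeric, they remain available for any later analysis.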

    Code:
    * First create a variable for the birthday in date format
    tostring mob dob yob, replace
    replace mob = "0" + mob if length(mob) ==1
    replace dob = "0" + dob if length(dob) ==1
    
    egen bday = concat(mob dob yob)
    
    gen bday_date_format = date(bday, "MDY")
    format bday_date_format %td
    
    * Second generate the earliest and latest birthday for each mother
    
    bysort id: egen earliest_bday = min(bday_date_format)
    format earliest_bday %td
    
    bysort id: egen latest_bday = max(bday_date_format)
    format latest_bday %td
    
    * Third generate the indicators for the sample (cutoff: 11 Oct 2006;
    * a child born on the cutoff date counts as born "on or after")
    
    gen sample = .
    replace sample = 1 if latest_bday < mdy(10,11,2006)
    replace sample = 2 if earliest_bday < mdy(10,11,2006) & latest_bday >= mdy(10,11,2006)
    replace sample = 3 if earliest_bday >= mdy(10,11,2006)
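For comparison, the same classification can be written by counting, for each mother, how many of her children fall on each side of the cutoff. This is a sketch, not the approach above: n_before, n_after and tag are illustrative names, and a child born exactly on 11 Oct 2006 is treated as "on or after", following the wording of the question.

```stata
clear
input byte(id bord) float(dob mob yob)
 1 1 18  6 2013
 1 2  1  4 2004
 2 1  1  4 2007
 3 1  1  5 1996
 4 1  1  5 2007
 4 2  1  7 1999
 5 1 18  8 2010
 5 2  1  4 2005
 6 1 18  8 2012
 6 2  1 10 2001
 6 3  1  7 1996
 7 1  1  8 2001
 7 2  1  6 2000
 8 2  1  6 2000
 8 1  1 11 2006
 9 1  1 10 2004
 9 2  1 12 1993
10 1  1  6 2006
10 2  1  4 2000
11 1 19  8 2013
11 2 20  7 2010
11 3 10  9 2002
end

gen bday = mdy(mob, dob, yob)
format bday %td

* cutoff date: 11 October 2006
local cutoff = mdy(10, 11, 2006)

* per mother: number of children born before, and on or after, the cutoff
* (the !missing() guard stops missing dates from counting as "after")
bysort id: egen n_before = total(bday < `cutoff')
bysort id: egen n_after  = total(bday >= `cutoff' & !missing(bday))

gen byte sample = .
replace sample = 1 if n_before > 0 & n_after == 0
replace sample = 2 if n_before > 0 & n_after > 0
replace sample = 3 if n_before == 0 & n_after > 0

* one row per mother to inspect the result
egen tag = tag(id)
list id n_before n_after sample if tag, noobs
```

Keeping n_before and n_after makes the three groups mutually exclusive by construction. A mother whose birth dates are all missing ends up with sample missing rather than being assigned to a group.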
