Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging, tsfill, expand or collapse(max)?

    Hello, Please, I am trying to merge two datasets. Could anyone help me?

    I have the following variables: individual, year, month, day, date. It is not a panel. The date corresponds to when the person reported a flu. Some individuals are duplicated, ie., reported the flu more than once in the same year. So, I want to code that if they reported in consecutive months, then I consider the same case of flu, with double reporting, if they report in more than one month gap, then it is a different case and the person caught the flu more than once indeed.
    Then I want to merge it to the second data.
    Many thanks.


    1.1. First dataset raw.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id year month_flu day_flu flu)
    1 2000 12 13 1
    1 2001  1  8 1
    1 2001  7 29 1
    2 2002  2  5 1
    3 2003  1 10 1
    3 2004  7 14 1
    4 2000  8 11 1
    5 2001  9  3 1
    6 2002 10  8 1
    7 2000  5  9 1
    7 2000  5 20 1
    end

    1.2. First dataset after doing spell by @Nick Cox.
    Code:
    gen mdate = ym(year, month_flu)
    format mdate %tm
    bysort id (mdate) : gen gap = mdate - mdate[_n-1]
    by id : gen spell = sum(gap >= 2)
    [/CODE]

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id year month_flu day_flu flu mdate gap spell)
    1 2000 12 13 1 491  . 1
    1 2001  1  8 1 492  1 1
    1 2001  7 29 1 498  6 2
    2 2002  2  5 1 505  . 1
    3 2003  1 10 1 516  . 1
    3 2004  7 14 1 534 18 2
    4 2000  8 11 1 487  . 1
    5 2001  9  3 1 500  . 1
    6 2002 10  8 1 513  . 1
    7 2000  5  9 1 484  . 1
    7 2000  5 20 1 484  0 1
    end
    format %tm mdate

    2.Second dataset
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id year grade gender)
     1 2000 10 1
     1 2001  9 1
     2 2001  9 1
     2 2003  5 1
     2 2004  7 1
     3 2000  5 0
     4 2000  6 0
     5 2001  8 1
     5 2003  9 1
     6 2003 10 1
     7 2004 10 0
     8 2000  2 0
     8 2001  6 0
     8 2002  5 0
     9 2002  7 1
    10 2000  8 1
    11 2001  4 0
    11 2003  5 0
    12 2004  9 1
    13 2000 10 1
    end


  • #2
    So, let's call the data set that resulted from calculating the spells flu_spells.dta.

    Code:
    use flu_spells, clear
    by id spell (mdate), sort: gen int spell_start = mdate[1]
    by id spell (mdate): gen int spell_end = mdate[_N]
    by id spell: keep if _n == 1
    keep id mdate year spell_start spell_end
    
    merge m:1 id year using second_dataset
    Added: Note--when a single spell extends across a year boundary (e.g. starts in December and continues into January of the next year) the spell will be associated with the observation for the earlier year in the second data set.
    Last edited by Clyde Schechter; 13 Aug 2022, 22:13.

    Comment


    • #3
      Many thanks, prof. Clyde! It worked !

      Comment

      Working...
      X