Merging, tsfill, expand or collapse(max)?

juliana pinto

Join Date: Oct 2017
Posts: 74

Merging, tsfill, expand or collapse(max)?

13 Aug 2022, 21:27

Hello, Please, I am trying to merge two datasets. Could anyone help me?

I have the following variables: individual, year, month, day, date. It is not a panel. The date corresponds to when the person reported a flu. Some individuals are duplicated, ie., reported the flu more than once in the same year. So, I want to code that if they reported in consecutive months, then I consider the same case of flu, with double reporting, if they report in more than one month gap, then it is a different case and the person caught the flu more than once indeed.
Then I want to merge it to the second data.
Many thanks.

1.1. First dataset raw.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id year month_flu day_flu flu)
1 2000 12 13 1
1 2001  1  8 1
1 2001  7 29 1
2 2002  2  5 1
3 2003  1 10 1
3 2004  7 14 1
4 2000  8 11 1
5 2001  9  3 1
6 2002 10  8 1
7 2000  5  9 1
7 2000  5 20 1
end

1.2. First dataset after doing spell by @Nick Cox.

Code:

gen mdate = ym(year, month_flu)
format mdate %tm
bysort id (mdate) : gen gap = mdate - mdate[_n-1]
by id : gen spell = sum(gap >= 2)

[/CODE]

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id year month_flu day_flu flu mdate gap spell)
1 2000 12 13 1 491  . 1
1 2001  1  8 1 492  1 1
1 2001  7 29 1 498  6 2
2 2002  2  5 1 505  . 1
3 2003  1 10 1 516  . 1
3 2004  7 14 1 534 18 2
4 2000  8 11 1 487  . 1
5 2001  9  3 1 500  . 1
6 2002 10  8 1 513  . 1
7 2000  5  9 1 484  . 1
7 2000  5 20 1 484  0 1
end
format %tm mdate

2.Second dataset

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id year grade gender)
 1 2000 10 1
 1 2001  9 1
 2 2001  9 1
 2 2003  5 1
 2 2004  7 1
 3 2000  5 0
 4 2000  6 0
 5 2001  8 1
 5 2003  9 1
 6 2003 10 1
 7 2004 10 0
 8 2000  2 0
 8 2001  6 0
 8 2002  5 0
 9 2002  7 1
10 2000  8 1
11 2001  4 0
11 2003  5 0
12 2004  9 1
13 2000 10 1
end

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#2

13 Aug 2022, 22:05

So, let's call the data set that resulted from calculating the spells flu_spells.dta.

Code:

use flu_spells, clear by id spell (mdate), sort: gen int spell_start = mdate[1] by id spell (mdate): gen int spell_end = mdate[_N] by id spell: keep if _n == 1 keep id mdate year spell_start spell_end merge m:1 id year using second_dataset

Added: Note--when a single spell extends across a year boundary (e.g. starts in December and continues into January of the next year) the spell will be associated with the observation for the earlier year in the second data set.

Last edited by Clyde Schechter; 13 Aug 2022, 22:13.
Comment
juliana pinto

Join Date: Oct 2017

Posts: 74
#3

13 Aug 2022, 22:42

Many thanks, prof. Clyde! It worked !
Comment

Announcement

Merging, tsfill, expand or collapse(max)?

Comment

Comment