identifying consecutive months

juliana pinto

Join Date: Oct 2017

Posts: 74
#1

identifying consecutive months

10 Aug 2022, 08:41

Hello. Please, could anyone help me?

I have the following variables: individual, year, month, day, date. The data are not a panel. The date is when the person reported a flu. Some individuals appear more than once, i.e., they reported the flu more than once in the same year. I want to code this as follows: if they reported in consecutive months, I consider it the same case of flu with double reporting; if the reports are more than one month apart, it is a different case and the person did indeed catch the flu more than once.

Many thanks!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 id byte individual float(year month day) str10 date
"1" 1 2000 12 2 "02/12/2000"
"1" 1 2001 1 4 "04/01/2001"
"2" 2 2001 3 12 "12/03/2001"
"3" 3 2000 4 11 "11/04/2000"
"3" 3 2002 1 29 "29/01/2002"
"4" 4 2000 12 15 "15/12/2000"
"4" 4 2000 1 9 "09/01/2000"
"4" 4 2000 7 17 "17/07/2000"
"5" 5 2002 1 13 "13/01/2002"
"5" 5 2002 9 6 "06/09/2002"
end

Last edited by juliana pinto; 10 Aug 2022, 08:51.

Nick Cox

Join Date: Mar 2014
Posts: 36054

10 Aug 2022, 09:08

See (e.g.) https://www.stata-journal.com/articl...article=dm0029 for discussion of principles.

Here a spell consists of identical or consecutive months, so that the gap between spells is 2 or more months. The gap for the first observation of each id is returned as missing, which Stata regards as larger than any number and hence as more than 2, so the first observation also starts the first spell.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 id byte individual float(year month day) str10 date
"1" 1 2000 12 2 "02/12/2000"
"1" 1 2001 1 4 "04/01/2001"
"2" 2 2001 3 12 "12/03/2001"
"3" 3 2000 4 11 "11/04/2000"
"3" 3 2002 1 29 "29/01/2002"
"4" 4 2000 12 15 "15/12/2000"
"4" 4 2000 1 9 "09/01/2000"
"4" 4 2000 7 17 "17/07/2000"
"5" 5 2002 1 13 "13/01/2002"
"5" 5 2002 9 6 "06/09/2002"
end

gen mdate = ym(year, month)
format mdate %tm

bysort id (mdate) : gen gap = mdate - mdate[_n-1]
by id : gen spell = sum(gap >= 2)

list, sepby(id)


     +-------------------------------------------------------------------------+
     | id   indivi~l   year   month   day         date     mdate   gap   spell |
     |-------------------------------------------------------------------------|
  1. |  1          1   2000      12     2   02/12/2000   2000m12     .       1 |
  2. |  1          1   2001       1     4   04/01/2001    2001m1     1       1 |
     |-------------------------------------------------------------------------|
  3. |  2          2   2001       3    12   12/03/2001    2001m3     .       1 |
     |-------------------------------------------------------------------------|
  4. |  3          3   2000       4    11   11/04/2000    2000m4     .       1 |
  5. |  3          3   2002       1    29   29/01/2002    2002m1    21       2 |
     |-------------------------------------------------------------------------|
  6. |  4          4   2000       1     9   09/01/2000    2000m1     .       1 |
  7. |  4          4   2000       7    17   17/07/2000    2000m7     6       2 |
  8. |  4          4   2000      12    15   15/12/2000   2000m12     5       3 |
     |-------------------------------------------------------------------------|
  9. |  5          5   2002       1    13   13/01/2002    2002m1     .       1 |
 10. |  5          5   2002       9     6   06/09/2002    2002m9     8       2 |
     +-------------------------------------------------------------------------+
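
The point that a missing gap counts as "more than 2" may be easier to see in isolation. The lines below are only a sketch on three rows taken from the example data; spell2 is a hypothetical variable name, and the explicit missing() test is an alternative spelling of the same rule, not a change to the code above.

```stata
* Numeric missing is larger than any nonmissing number in Stata
display (. >= 2)        // shows 1, i.e. true

clear
input str2 id float(year month)
"4" 2000 12
"4" 2000 1
"4" 2000 7
end

gen mdate = ym(year, month)
format mdate %tm

* The first observation in each id has no predecessor, so gap is missing there
bysort id (mdate) : gen gap = mdate - mdate[_n-1]

* spell2 (hypothetical name): same rule, with the missing case written out
by id : gen spell2 = sum(missing(gap) | gap >= 2)

list, sepby(id)
```

Writing the missing() test explicitly makes no difference to the result here; it only documents the intent for readers who do not expect missing to compare as large.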

Last edited by Nick Cox; 10 Aug 2022, 09:19.


juliana pinto

Join Date: Oct 2017

Posts: 74
#3

10 Aug 2022, 09:16

Brilliant! Many thanks Prof. Nick Cox!
juliana pinto

Join Date: Oct 2017

Posts: 74
#4

12 Aug 2022, 21:09

Hello! Please, may I ask for more help?

I have this other dataset with pupils' grades in a given year. I want to merge both datasets and eliminate duplicate reports of a flu in the same year. I want the sample to look like the one below, plus a column indicating whether the child caught a flu in the year she took the test. If the months are consecutive, I count just one flu in that year. If the months are not consecutive, I count more than one flu in that year, but how do I eliminate the duplicates from the other months? I keep trying, but every time I merge I get it wrong, i.e., the number of children with flu increases in the merged dataset. Many thanks!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 id byte individual float(year score)
"1" 1 2000 6
"1" 1 2001 5
"2" 2 2001 8
"3" 3 2000 7
"3" 3 2002 5
"4" 4 2000 4
"4" 4 2002 5
"4" 4 2004 9
"5" 5 2002 6
"5" 5 2003 6
"2" 2 2005 10
"6" 6 2000 7
"6" 6 2003 5
end
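
One possible way to approach the merge described above, offered as a sketch rather than a definitive answer. It reuses the spell logic from #2 and rests on an assumption that is not stated in the thread: a spell is counted in the year of its first report, so a spell running from December into January is attributed to the earlier year only. The names nflu and flu are hypothetical. The flu data are reduced to one row per id and year before merging, so both files are unique on the merge key; the inflated counts may well come from merging while the flu file still holds several rows per id and year.

```stata
* Flu reports: build spells as in #2
clear
input str2 id byte individual float(year month day) str10 date
"1" 1 2000 12 2 "02/12/2000"
"1" 1 2001 1 4 "04/01/2001"
"2" 2 2001 3 12 "12/03/2001"
"3" 3 2000 4 11 "11/04/2000"
"3" 3 2002 1 29 "29/01/2002"
"4" 4 2000 12 15 "15/12/2000"
"4" 4 2000 1 9 "09/01/2000"
"4" 4 2000 7 17 "17/07/2000"
"5" 5 2002 1 13 "13/01/2002"
"5" 5 2002 9 6 "06/09/2002"
end

gen mdate = ym(year, month)
format mdate %tm
bysort id (mdate) : gen gap = mdate - mdate[_n-1]
by id : gen spell = sum(gap >= 2)

* Assumption: a spell belongs to the year of its first report
bysort id spell (mdate) : keep if _n == 1

* One row per id and year; nflu (hypothetical name) counts spells starting that year
collapse (count) nflu = spell, by(id year)
tempfile flu
save `flu'

* Scores: already one row per id and year
clear
input str2 id byte individual float(year score)
"1" 1 2000 6
"1" 1 2001 5
"2" 2 2001 8
"3" 3 2000 7
"3" 3 2002 5
"4" 4 2000 4
"4" 4 2002 5
"4" 4 2004 9
"5" 5 2002 6
"5" 5 2003 6
"2" 2 2005 10
"6" 6 2000 7
"6" 6 2003 5
end

* Both sides are unique on id and year, so the score sample cannot grow
merge 1:1 id year using `flu', keep(master match)
gen byte flu = _merge == 3          // 1 if a spell started in that test year
replace nflu = 0 if missing(nflu)
drop _merge

list, sepby(id)
```

Because merge 1:1 stops with an error if either file has repeated id and year combinations, it also acts as a check that the duplicates really were removed. If a spell should instead count in every year it touches, the keep step would need a different rule.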