  • identifying consecutive months

    Hello. Please, could anyone help me?

    I have the following variables: individual, year, month, day, date. It is not a panel. The date is when the person reported a flu. Some individuals appear more than once, i.e., they reported the flu more than once in the same year. I want to code this so that reports in consecutive months count as the same case of flu (double reporting), while reports separated by a gap of more than one month count as different cases, meaning the person really did catch the flu more than once.

    Many thanks!

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str2 id byte individual float(year month day) str10 date
    "1" 1 2000 12  2 "02/12/2000"
    "1" 1 2001  1  4 "04/01/2001"
    "2" 2 2001  3 12 "12/03/2001"
    "3" 3 2000  4 11 "11/04/2000"
    "3" 3 2002  1 29 "29/01/2002"
    "4" 4 2000 12 15 "15/12/2000"
    "4" 4 2000  1  9 "09/01/2000"
    "4" 4 2000  7 17 "17/07/2000"
    "5" 5 2002  1 13 "13/01/2002"
    "5" 5 2002  9  6 "06/09/2002"
    end
    Last edited by juliana pinto; 10 Aug 2022, 08:51.

  • #2
    See (e.g.) https://www.stata-journal.com/articl...article=dm0029 for discussion of principles.

    Here a spell consists of reports in identical or consecutive months, so that the gap between spells is 2 or more months. For the first observation of each id the gap is returned as missing, and Stata treats missing as larger than any number, so gap >= 2 is true there and the first observation also starts the first spell.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str2 id byte individual float(year month day) str10 date
    "1" 1 2000 12 2 "02/12/2000"
    "1" 1 2001 1 4 "04/01/2001"
    "2" 2 2001 3 12 "12/03/2001"
    "3" 3 2000 4 11 "11/04/2000"
    "3" 3 2002 1 29 "29/01/2002"
    "4" 4 2000 12 15 "15/12/2000"
    "4" 4 2000 1 9 "09/01/2000"
    "4" 4 2000 7 17 "17/07/2000"
    "5" 5 2002 1 13 "13/01/2002"
    "5" 5 2002 9 6 "06/09/2002"
    end
    
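    * Monthly date: months elapsed since 1960m1
    gen mdate = ym(year, month)
    format mdate %tm
    
    * Gap in months since the previous report by the same id (missing for the first report)
    bysort id (mdate) : gen gap = mdate - mdate[_n-1]
    * A new spell starts when the gap is 2 or more; missing counts as >= 2
    by id : gen spell = sum(gap >= 2)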
    
    list, sepby(id)
    
    
         +-------------------------------------------------------------------------+
         | id   indivi~l   year   month   day         date     mdate   gap   spell |
         |-------------------------------------------------------------------------|
      1. |  1          1   2000      12     2   02/12/2000   2000m12     .       1 |
      2. |  1          1   2001       1     4   04/01/2001    2001m1     1       1 |
         |-------------------------------------------------------------------------|
      3. |  2          2   2001       3    12   12/03/2001    2001m3     .       1 |
         |-------------------------------------------------------------------------|
      4. |  3          3   2000       4    11   11/04/2000    2000m4     .       1 |
      5. |  3          3   2002       1    29   29/01/2002    2002m1    21       2 |
         |-------------------------------------------------------------------------|
      6. |  4          4   2000       1     9   09/01/2000    2000m1     .       1 |
      7. |  4          4   2000       7    17   17/07/2000    2000m7     6       2 |
      8. |  4          4   2000      12    15   15/12/2000   2000m12     5       3 |
         |-------------------------------------------------------------------------|
      9. |  5          5   2002       1    13   13/01/2002    2002m1     .       1 |
     10. |  5          5   2002       9     6   06/09/2002    2002m9     8       2 |
         +-------------------------------------------------------------------------+
    Last edited by Nick Cox; 10 Aug 2022, 09:19.
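
The spell calculation in #2 rests on three small facts that can be checked in isolation. The following is only a minimal sketch: the four gap values are made-up toy data, not taken from the thread's dataset.

```stata
* Monthly dates count months since 1960m1, so Dec 2000 to Jan 2001 is a gap of 1
display ym(2001, 1) - ym(2000, 12)

* Missing is treated as larger than any number, so this condition is true (shows 1)
display (. >= 2)

* sum() of a true/false condition is a running count of how often it was true
* (toy data: hypothetical gaps for a single individual)
clear
input gap
.
1
6
5
end
gen spell = sum(gap >= 2)
list
```

With these toy gaps, spell comes out as 1, 1, 2, 3: the missing first gap opens spell 1, the gap of 1 stays in it, and each gap of 2 or more opens a new spell.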


    • #3
      Brilliant! Many thanks Prof. Nick Cox!


      • #4
        Hello! Please, may I ask for more help?

        I have another dataset with pupils' grades in a given year. I want to merge the two datasets and eliminate duplicate flu reports within the same year. The merged sample should look like the one below, plus a column indicating whether the child caught the flu in the year she took the test. If the months are consecutive, I count just one flu in that year. If the months are not consecutive, I count more than one flu in that year, but how do I eliminate the duplicates from the other months? Every time I merge I get it wrong: the number of children with flu increases in the merged dataset. Many thanks!
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str2 id byte individual float(year score)
        "1" 1 2000  6
        "1" 1 2001  5
        "2" 2 2001  8
        "3" 3 2000  7
        "3" 3 2002  5
        "4" 4 2000  4
        "4" 4 2002  5
        "4" 4 2004  9
        "5" 5 2002  6
        "5" 5 2003  6
        "2" 2 2005 10
        "6" 6 2000  7
        "6" 6 2003  5
        end
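
One possible way to approach the merge in #4, offered only as a sketch and not as a reply from the thread: reduce the flu reports to one observation per id and year before merging, so that a 1:1 merge cannot create extra rows. The names first, n_flu, flu and the tempfile scores are hypothetical. The sketch also assumes that a spell crossing a year boundary is counted in the year of its first report; that rule is not stated in the thread.

```stata
* Run as one do-file so that the tempfile is still defined at the merge step
clear
input str2 id byte individual float(year score)
"1" 1 2000  6
"1" 1 2001  5
"2" 2 2001  8
"3" 3 2000  7
"3" 3 2002  5
"4" 4 2000  4
"4" 4 2002  5
"4" 4 2004  9
"5" 5 2002  6
"5" 5 2003  6
"2" 2 2005 10
"6" 6 2000  7
"6" 6 2003  5
end
tempfile scores
save `scores'

clear
input str2 id byte individual float(year month day) str10 date
"1" 1 2000 12  2 "02/12/2000"
"1" 1 2001  1  4 "04/01/2001"
"2" 2 2001  3 12 "12/03/2001"
"3" 3 2000  4 11 "11/04/2000"
"3" 3 2002  1 29 "29/01/2002"
"4" 4 2000 12 15 "15/12/2000"
"4" 4 2000  1  9 "09/01/2000"
"4" 4 2000  7 17 "17/07/2000"
"5" 5 2002  1 13 "13/01/2002"
"5" 5 2002  9  6 "06/09/2002"
end

* Spells exactly as in #2
gen mdate = ym(year, month)
bysort id (mdate) : gen gap = mdate - mdate[_n-1]
by id : gen spell = sum(gap >= 2)

* Flag the first report of each spell (assumption: that report's year owns the case)
bysort id spell (mdate) : gen byte first = _n == 1

* One row per id and year, holding the number of spells that started in that year
collapse (sum) n_flu = first, by(id year)

* Keep matched rows and score-only rows; flu-only rows are dropped
merge 1:1 id year using `scores', keep(match using)
replace n_flu = 0 if _merge == 2
gen byte flu = n_flu > 0
drop _merge
sort id year
list, sepby(id)
```

With the example data, the result should keep the 13 id-year rows of the scores file. Whether a December to January spell should also flag the second year depends on the research question, so the first-report rule here would need adjusting if it should.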

