Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Identify duplicate codes within range of months

    I have some panel data where individuals could receive multiple types of service in a given visit; thus my data is at the individual-service type level.

    What I want to do, at the month level, is identify when someone returned to receive the exact same service within a couple months of the initial service.

    So, if someone received Service A10 in January 2024, and then came back again in either February or March to receive Service A10 again, I'd want to just create a simple variable that flags the February/March "repeat" row as =1. What is the simplest way to accomplish this? Example data below. Thanks.


    Code:
    clear
    input str2 id str10 date str5 code str7 yr_month
    "01" "05 Jan 23" "100A" "2023m1"
    "01" "20 Jan 23" "100A" "2023m1"
    "01" "15 Feb 23" "405A" "2023m2"
    "01" "15 Mar 23" "100A" "2023m3"
    "01" "10 Apr 23" "25D5" "2023m4"
    "01" "25 May 23" "100A" "2023m5"
    "01" "15 May 23" "300C" "2023m5"
    "01" "20 Jun 23" "499" "2023m6"
    "01" "01 Jul 23" "205Z.4" "2023m7"
    "02" "01 Mar 23" "200B" "2023m3"
    "02" "05 Apr 23" "25D6" "2023m4"
    "02" "10 May 23" "400D" "2023m5"
    "02" "10 May 23" "500E" "2023m5"
    "02" "01 Jun 23" "20.OB" "2023m6"
    "03" "01 Jan 24" "300C" "2024m1"
    "03" "01 Feb 24" "300C" "2024m2"
    "03" "01 Apr 24" "25D8" "2024m4"
    end

  • #2
    Is this what you need?

    Code:
    gen mon = monthly(yr_month, "YM")
    format mon %tm
    bysort id code (yr_month): gen repeat = (mon - mon[_n-1] < 3)
    drop mon
    which produces:
    Code:
    . list, sepby(id) noobs
    
      +--------------------------------------------+
      | id        date    code   yr_month   repeat |
      |--------------------------------------------|
      | 01   20 Jan 23    100A     2023m1        0 |
      | 01   05 Jan 23    100A     2023m1        1 |
      | 01   15 Mar 23    100A     2023m3        1 |
      | 01   25 May 23    100A     2023m5        1 |
      | 01   01 Jul 23   205Z.     2023m7        0 |
      | 01   10 Apr 23    25D5     2023m4        0 |
      | 01   15 May 23    300C     2023m5        0 |
      | 01   15 Feb 23    405A     2023m2        0 |
      | 01   20 Jun 23     499     2023m6        0 |
      |--------------------------------------------|
      | 02   01 Jun 23   20.OB     2023m6        0 |
      | 02   01 Mar 23    200B     2023m3        0 |
      | 02   05 Apr 23    25D6     2023m4        0 |
      | 02   10 May 23    400D     2023m5        0 |
      | 02   10 May 23    500E     2023m5        0 |
      |--------------------------------------------|
      | 03   01 Apr 24    25D8     2024m4        0 |
      | 03   01 Jan 24    300C     2024m1        0 |
      | 03   01 Feb 24    300C     2024m2        1 |
      +--------------------------------------------+
    If you want to see repeats relative only to the first service of that type, you can change the line that creates the variable to
    Code:
    bysort id code (yr_month): gen repeat = cond(_n > 1, mon - mon[1] < 3, 0)
    Last edited by Hemanshu Kumar; 23 Jul 2025, 13:13.

    Comment


    • #3
      Thanks. Your code with the second option

      Code:
      bysort id code (yr_month): gen repeat = cond(_n > 1, mon - mon[1] < 3, 0)
      is close but not quite exactly what I wanted, because I only want to flag repeats at the monthly level. For instance, the second row should not be flagged as a repeat because it occurs within the same month as the first instance--this is the aspect I was especially struggling with.

      Comment


      • #4
        So would this do it?

        Code:
        bysort id code (yr_month): gen repeat = cond(_n > 1, inrange(mon - mon[1], 1, 2), 0)

        Comment

        Working...
        X