Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • rangejoin

    Hi,

    I need help joining two datasets by a range of dates. I have a date of a procedure (svcdate) and I want to make sure patients have enrollment in insurance at least 3 months after their procedure date (svcdate). For the enrollment data, I have a start and end date (dtstart and dtend). Any suggestions on how I should do this? I tried joinby but I could not get the code to work for me.

    thank you

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str9(dtstart dtend)
    "01-JUN-19" "30-JUN-19"
    "01-SEP-19" "30-SEP-19"
    "01-JUL-20" "31-JUL-20"
    "01-OCT-20" "31-OCT-20"
    "01-MAY-19" "31-MAY-19"
    "01-SEP-20" "30-SEP-20"
    "01-AUG-20" "31-AUG-20"
    "01-APR-20" "30-APR-20"
    "01-FEB-19" "28-FEB-19"
    "01-DEC-19" "31-DEC-19"
    "01-OCT-19" "31-OCT-19"
    "01-NOV-19" "30-NOV-19"
    "01-DEC-20" "31-DEC-20"
    "01-FEB-21" "28-FEB-21"
    "01-JAN-20" "31-JAN-20"
    "01-MAR-20" "31-MAR-20"
    "01-JAN-21" "31-JAN-21"
    "01-FEB-20" "29-FEB-20"
    "01-MAR-19" "31-MAR-19"
    "01-JUL-19" "31-JUL-19"
    "01-MAY-20" "31-MAY-20"
    "01-APR-19" "30-APR-19"
    "01-AUG-19" "31-AUG-19"
    "01-JUN-20" "30-JUN-20"
    "01-NOV-20" "30-NOV-20"
    end
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str9 svcdate
    "13-SEP-20"
    "29-JUL-20"
    "29-JUL-20"
    "13-SEP-20"
    "29-JUL-20"
    "29-JUL-20"
    "13-SEP-20"
    "13-SEP-20"
    "29-JUL-20"
    "29-JUL-20"
    "29-JUL-20"
    "13-SEP-20"
    "29-JUL-20"
    "13-SEP-20"
    "29-JUL-20"
    "29-JUL-20"
    "29-JUL-20"
    "13-SEP-20"
    "29-JUL-20"
    "13-SEP-20"
    "13-SEP-20"
    "29-JUL-20"
    "29-JUL-20"
    "13-SEP-20"
    "29-JUL-20"
    end

  • #2
    This cannot be done with the data examples shown. There needs to be a patient ID variable in each file so that we can know which service date to compare to which insurance dates. In what I show below, I have created a patient ID variable just as consecutive numbers in order of the observations as given. In your real data, I imagine there is already a patient ID variable there. (If not, then you really can't do this at all.) I am also imagining that in your real data it is possible that the same patient may have multiple insurance records with dates that might or might not overlap. In order to verify continuing coverage from the service date for three months thereafter, it is necessary to combine overlapping/touching coverage periods.

    Note that dates represented as string variables cannot be used for any kind of date calculation, nor even for chronological ordering. So the first order of business in both files is to convert these to Stata internal format numeric date variables.

    I also proxy "three months" by 91 days.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int id str9 svcdate
     1 "13-SEP-20"
     2 "29-JUL-20"
     3 "29-JUL-20"
     4 "13-SEP-20"
     5 "29-JUL-20"
     6 "29-JUL-20"
     7 "13-SEP-20"
     8 "13-SEP-20"
     9 "29-JUL-20"
    10 "29-JUL-20"
    11 "29-JUL-20"
    12 "13-SEP-20"
    13 "29-JUL-20"
    14 "13-SEP-20"
    15 "29-JUL-20"
    16 "29-JUL-20"
    17 "29-JUL-20"
    18 "13-SEP-20"
    19 "29-JUL-20"
    20 "13-SEP-20"
    21 "13-SEP-20"
    22 "29-JUL-20"
    23 "29-JUL-20"
    24 "13-SEP-20"
    25 "29-JUL-20"
    end
    tempfile service_dates
    save `service_dates'
    
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int id str9(dtstart dtend)
     1 "01-JUN-19" "30-JUN-19"
     2 "01-SEP-19" "30-SEP-19"
     3 "01-JUL-20" "31-JUL-20"
     4 "01-OCT-20" "31-OCT-20"
     5 "01-MAY-19" "31-MAY-19"
     6 "01-SEP-20" "30-SEP-20"
     7 "01-AUG-20" "31-AUG-20"
     8 "01-APR-20" "30-APR-20"
     9 "01-FEB-19" "28-FEB-19"
    10 "01-DEC-19" "31-DEC-19"
    11 "01-OCT-19" "31-OCT-19"
    12 "01-NOV-19" "30-NOV-19"
    13 "01-DEC-20" "31-DEC-20"
    14 "01-FEB-21" "28-FEB-21"
    15 "01-JAN-20" "31-JAN-20"
    16 "01-MAR-20" "31-MAR-20"
    17 "01-JAN-21" "31-JAN-21"
    18 "01-FEB-20" "29-FEB-20"
    19 "01-MAR-19" "31-MAR-19"
    20 "01-JUL-19" "31-JUL-19"
    21 "01-MAY-20" "31-MAY-20"
    22 "01-APR-19" "30-APR-19"
    23 "01-AUG-19" "31-AUG-19"
    24 "01-JUN-20" "30-JUN-20"
    25 "01-NOV-20" "30-NOV-20"
    end
    tempfile insurance_dates
    save `insurance_dates'
    
    use `service_dates', clear
    gen _svcdate = daily(svcdate, "DM20Y")
    assert missing(_svcdate) == missing(svcdate)
    format _svcdate %td
    drop svcdate
    rename _svcdate svcdate
    save `service_dates', replace
    
    use `insurance_dates', clear
    foreach v of varlist dt* {
        gen _`v' = daily(`v', "DM20Y")
        assert missing(`v') == missing(_`v')
        format _`v' %td
        drop `v'
        rename _`v' `v'
    }
    assert dtstart <= dtend & !missing(dtstart, dtend)
    //    NOW COMBINE INSURANCE PERIODS THAT OVERLAP OR ARE ADJACENT
    gen `c(obs_t)' obs_no = _n
    reshape long dt, i(obs_no) j(event) string
    by id (dt), sort: gen depth = sum((event == "start") - (event == "end"))
    by id (dt): gen coverage_interval = sum(depth[_n-1] == 0 & dt > dt[_n-1] + 1)
    collapse (min) dtstart = dt (max) dtend = dt, by(id coverage_interval)
    
    rangejoin svcdate dtstart dtend using `service_dates', by(id)
    local 3_months 91
    by id, sort: egen byte three_month_coverage = max(dtend >= svcdate + `3_months')
    At the end of this code, the data set will contain insurance coverage dates (with overlapping or adjacent coverage intervals combined). The variable svcdate will be missing if there is no coverage at all on the service date, otherwise it will contain the service date. And the variable three_month_coverage will be 1 if the coverage extends at least 91 days beyond the service date, 0 otherwise. For those patients where there is three month coverage, only the (combined) interval containing the service date and extending at least 91 days beyond that will be present. For all other patients, all of the insurance coverage dates are shown. Obviously you can modify this with -keep- and -drop- commands to suit your preferences.

    Comment


    • #3
      Hi Clyde,

      WOW. this is amazing. Thank you. You are correct I have a pt id and I should have included that in the code.

      I ran the code and it works great. I just do not understand the ouptut of the variable coverage_interval if you don't mind taking a look at it. Thanks again

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(id coverage_interval dtstart dtend svcdate) byte three_month_coverage
      30145 1 21640 21822 21796 0
      30145 1 21640 21822 21796 0
      30145 1 21640 21822 21801 0
      30145 2 21854 22158     . 0
      30145 1 21640 21914     . 0
      30145 1 22371 22553     . 0
      30145 2 22615 22645     . 0
      30145 1 22006 22645 22042 1
      30145 1 22006 22645 22042 1
      30145 1 22006 22645 22042 1
      30145 1 22006 22645 22042 1
      30145 1 22006 22645 22042 1
      30145 1 22006 22645 22042 1
      30145 1 21975 22645 22628 0
      30145 1 21975 22645 22628 0
      30145 1 21975 22645 22628 0
      30145 1 21975 22645 22630 0
      30145 1 21975 22645 22630 0
      30145 1 21975 22645 22630 0
      30145 1 22371 22400     . 0
      30145 1 21609 22645 21917 1
      30145 1 21609 22280 21714 1
      30145 1 22159 22645 22579 0
      30145 1 22159 22645 22579 0
      30145 1 22281 22645     . 1
      30145 1 21975 22584     . 0
      30145 1 22281 22645     . 1
      30145 1 22281 22645 22490 1
      30145 1 22281 22645 22490 1
      30145 1 22281 22645 22449 1
      30145 1 22281 22645 22449 1
      30145 1 22281 22645 22633 0
      30145 1 21946 22219 21965 1
      30145 1 21946 22219 21965 1
      30145 1 22401 22645     . 1
      30145 1 22006 22035     . 1
      30145 1 21762 21853     . 0
      30145 1 21915 22645     . 1
      30145 1 22189 22645     . 0
      30145 1 22159 22645     . 0
      30145 1 22401 22645 22578 0
      30145 1 22401 22645 22593 0
      30145 1 21975 22645     . 0
      30145 1 21975 22249     . 1
      30145 1 21946 22645     . 1
      30145 1 22159 22188     . 1
      30145 1 22281 22645 22386 1
      30145 1 22281 22645 22386 1
      30145 1 22128 22584     . 1
      30145 1 21975 22005     . 1
      end
      format %td dtstart
      format %td dtend
      format %td svcdate

      Comment


      • #4
        Oh, sorry, you don't need that variable. It was needed to reach the final result, but I meant to drop it at the end and forgot.

        But, since you are asking, there can be people who are intermittently insured. They might be covered from, say, January 1 through September 30 of a year and then again from March 1 through July 31 of the following year. Those are two separate coverage intervals--there is a gap between them where there is no coverage in effect. So the variable coverage_interval is just a variable that counts up, starting from 0, the number of coverage intervals. So, for the person I just described, this variable would be 0 for January 1 through September 30 of the first year and 1 for March 1 through July 31 of the following year. If there were yet another bout of insurance coverage for this person, say October 1 through February 28 of yet the next year on, that one would have coverage_interval = 2.

        Comment


        • #5
          Thank you again !

          Comment

          Working...
          X