Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How do I tag laboratory value that precede (decide) a prescription?

    Dear Statalist,
    I have a dataset extracted from real-life patient journals over a 10 year period. Each row has a time-stamp (YMDHHMMSS), patient identifier and either a column with laboratory value or prescription. Prescriptions and laboratory values are in separate collumns and each value are on a separate row. In all there are apr 100.000 rows of which 25.000 are laboratory and 75.000 are prescriptions. I want to study what laboratory value has preceded change in medication. For this I need to tag prescriptions that are closest in time after a laboratory value. Can someone please help me?

  • #2
    Without example data, I'm not going to try to write code--this is complicated enough that it requires testing the code to make sure the details are correct.

    But here's the overall approach:

    1. Create two separate data sets. The first will contain only the observations containing lab values, and the second will contain only the observations containing prescriptions. Make sure the prescriptions data set does not contain the lab values variable.

    2. Use -rangejoin- (-ssc install rangejoin to get this command if you don't have it; you may also need to -ssc install rangestat-, as -rangejoin- requires the latter) to link those two data sets to each other. Specify the keyvar, low, and high so that only prescriptions that occur after the lab value are matched.

    3. Sort the data on patient identifier and lab timestamp, and then on prescription stamp within that. Use -by patient_identifier lab_timestamp (prescription_timestamp) to keep only first prescription.

    I should point out that there is a certain ambiguity in the way your problem is posed. I think it is likely that some patients will have prescriptions with identical time stamps. You don't provide a way to pick out which of those would be considered "closest."

    If you want assistance with the details of coding beyond this outline, please use the -dataex- command to show example data. (If you are not familiar with -dataex-, please read Forum FAQ #12 for information.)
    Last edited by Clyde Schechter; 17 Jan 2018, 12:56. Reason: Correct #2 to be consonant with actual -rangejoin- syntax.

    Comment


    • #3
      Thanks Clyde,
      Thank you for your advice. I understand that I need to be more specific regarding my data. I have installed rangejoin and created the datasets you suggested.

      The screen shots below show what the data looks like.

      Memory:
      Click image for larger version

Name:	Skärmavbild 2018-01-19 kl. 22.42.53.png
Views:	1
Size:	53.4 KB
ID:	1426588

      Using:
      Click image for larger version

Name:	Skärmavbild 2018-01-19 kl. 22.41.16.png
Views:	1
Size:	70.1 KB
ID:	1426587

      Now, I want to rangejoin all prescriptions 30 days after lab_var. I tried:

      rangejoin time_stamp 030 using using_dataset, by(id_var)

      The result was:
      Click image for larger version

Name:	Skärmavbild 2018-01-19 kl. 22.55.17.png
Views:	1
Size:	82.8 KB
ID:	1426589

      Obviously I have done something wrong. Is it the "low" and "high" specification that should be different if I want a 30 day range and the format is %tc?

      Comment


      • #4
        Yes, the interval specification was wrong. Stata clock variables (which is what your timestamp variables are) are counts of milliseconds. So your command seeks to join every prescription written within 30 milliseconds after the lab value.

        So there are two ways to proceed, and they might produce slightly different results.

        1. We can calculate the number of milliseconds in 30 days (= 30*24*60*60*1000) and use that in the interval specification.
        2. We can extract the date (without the time part) from the time_stamp variables into Stata date (not clock) variables, and then apply -rangejoin- with an interval of 30. Stata date variables are counts of days.

        The difference will be that if there is a lab on January 1, 2017 at 2:45:00 PM and a prescription on January 31, 2017 at 5:00 PM, method 1 will not consider it to be within 30 days (because it came too late in the day), but method 2 will.

        I'm going to guess you would prefer method 2. So what you have to do is
        Code:
        // DO THIS IN BOTH DATA SETS
        gen date_stamp = dofc(time_stamp)
        format date_stamp %td
        Then your -rangejoin- command can look like
        Code:
        rangejoin date_stamp 0 30 using using_dataset, by(id_var)
        (which is the same as yours except the key variable is date_stamp, and there is a space between 0 and 30.)

        If you would prefer method 1, it's just a matter of changing the -rangejoin- command instead:
        Code:
        local 30days = 30*24*60*60*1000
        rangejoin time_stamp 0 `30days' using using_dataset, by(id_var)
        In the future, please do not use screenshots to show data. They are often completely unreadable. These, fortunately, can be read--but that's just good luck. Even when readable, were it necessary to actually work with the data to test out the code, there is no way to import data from a screenshot. The helpful way to show example data is with the -dataex- command. Read FAQ #12 for details about that and also about the best ways to show Stata code and Stata output. The more useful you make your posts to those who want to help you, the better your chances of getting a helpful and timely response.


        Comment


        • #5
          This worked perfect!
          Thanks :-)

          Comment

          Working...
          X