Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Differences between dates using panel data

    Hi there,

    I'm working with some panel data, whereby for each subject I have multiple events (lab tests) recorded in long form. Each event is dated and I need to calculate whether two specific tests (test_id 12 and 147) are conducted within 3 months of each other (to assess whether testing/diagnostic guidelines are being followed). Subjects may have multiple instances of each test.

    At this moment in time, I'm considering looping through all instances of these tests to ascertain whether the other test occurs within 3 months. Given some subjects have 100s of test records and the dataset contains 100,000s of subjects, I'm wondering whether there is a more efficient method to derive my requirements.
    (I found plenty of existing threads on the topic of dates and panel data, but nothing akin to the above)

    Thanks in advance,

    Rob.

  • #2
    Yes, there is a much better way to do this. But it cannot be described simply in words. Please post back with example data (use the -dataex- command) containing the relevant variables. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Also please provide a clear explanation of how you want the results to look. Do you want a variable that marks each event according to whether or not another event occurs within 3 months? Or do you want each observation to have a new variable listing the actual events and their dates that are within 3 months? Or some other way of showing it? Be clear and specific.

    Comment


    • #3
      Hi Clyde,

      Your guidance is much appreciated. Please find an extract of the data below to help visualise the problem. In terms of end product, ideally I would like to see a binary indicator for each observation to indicate whether the other test (i.e. testid 12 or 147) required appears within 3 months, and also for each subject, whether they have at least one pair of tests within a 3-month period.

      I'm glad to hear there is a more efficient method than looping through each observation, hence I eagerly await your insights.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long(subjectid testid) double testdate
      1001  12 18499
      1001  12 18528
      1015  12 18568
      1018  12 19845
      1018 147 19829
      1021  12 18941
      1092  12 18674
      1092  12 18819
      1092 147 19452
      1092 147 18750
      1092  12 19247
      1101 147 19529
      1132 147 19578
      1138  12 19085
      1168  12 18708
      1171  12 18497
      1171  12 19663
      1171 147 19915
      1171  12 18526
      1171 147 19863
      1171 147 18659
      1171  12 19464
      1171  12 18625
      1171  12 19306
      1171 147 18680
      1171  12 19557
      1171  12 18723
      1171  12 19415
      1171  12 19612
      1171  12 19396
      1171  12 19838
      1171  12 19604
      1171  12 19508
      1171  12 19269
      1171  12 19344
      1171  12 19760
      1171  12 18448
      1171 147 18744
      1171 147 18470
      1171  12 18598
      1171 147 18575
      1171 147 19302
      1194 147 20458
      1194  12 19561
      1194 147 20241
      1194  12 19141
      1220  12 19822
      1220 147 19775
      1226  12 19590
      end
      format %td testdate

      Thanks in advance,

      Rob.

      Comment


      • #4
        I believe the following achieves your goals:

        Code:
        clonevar date_12 = testdate if testid == 12
        clonevar date_147 = testdate if testid == 147
        
        rangestat (count) had_12_within_3mos = date_12, by(subjectid) interval(testdate 0 90) excludeself
        rangestat (count) had_147_within_3mos = date_147, by(subjectid) interval(testdate 0 90) excludeself
        
        gen byte both_within_3_mos = (testid == 12 & had_147_within_3mos > 0) ///
            | (testid == 147 & had_12_within_3mos > 0)
        by subjectid, sort: egen byte ever_had_both_within_3_mos = max(both_within_3_mos)
        -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer and is available from SSC.

        I am interpreting "within 3 months" to mean on the same date or up to 90 days after. If you want from 45 days before to 44 days after, change the -interval()- observation to read -interval(testdate -45 44)- Similarly, if you had any other particular 3 month period in mind, adjust accordingly.

        At the end of this code there will be 6 new variables. date_12 and date_147 are not of further use: they are just the same as testdate in the case that testid == 12 and 147, respectively, otherwise missing. had_12_within_3_mos will be a count of the number of observations on the same subjectid showing testid == 12 and a testdate within 3 months of the current date. had_147_within_3mos is analogous. The variable both_within_3_mos will be 1 for any observation where the same patient has both tests within 3 months of the date in that observation, 0 otherwise. And ever_had_both_within_3_mos will be 1 in all observations of a patient who, at any date in the data set, has had both tests within a 3 month period.
        Last edited by Clyde Schechter; 28 Feb 2021, 14:05.

        Comment


        • #5
          Thank you Clyde - your solution makes complete sense and is clearly much more efficient than my first thoughts.

          Out of curiosity, could the above code be easily adapted to identify any pair of tests conducted within a 3-month period, i.e. two tests with different IDs, not just limited to 12 and 147?

          Thanks in advance,

          Rob.

          Comment


          • #6
            That depends on what you mean. Evidently, there is nothing special about the numbers 12 and 147, and you could easily change them to any two distinct values of interest. And if there were only a small number of different tests in all, then you could just do the code shown in #4 for each pair among them.

            But if there is a large number of different tests and you are interested in knowing about any possible pairing of them within 3 months, it's a little different.
            Code:
            tempfile copy
            save `copy'
            
            gen long obs_no = _n
            rangejoin testdate 0 90 using `copy', by(subjectid)
            replace testdate_U = . if testid_U == testid_U
            by obs_no, sort: egen pairs_within_3_mos = count(testdate)
            by obs_no: keep if _n == 1
            drop *_U
            by subjectid, sort: egen ever_some_pair_within_3_mos = max(pairs_within_3_mos > 0)
            -rangejoin- is written by Robert Picard and is available from SSC. It requires -rangestat-, which, by now, I assume, you have already installed.

            At the end of this code you have two new variables. pairs_within_3_mos, in any observation tells you how many tests this patient had, whose testid differs from that of the current observation, within 3 months. And ever_some_pair_within_3_mos tells you for each patient whether or not they ever have a pair of tests with different testids within a 3 month period.

            Comment


            • #7
              Thank you for elaborating on your original solution Clyde. To confirm, presumably line 6 of your code should instead read:

              Code:
              replace testdate_U = . if testid == testid_U

              Comment


              • #8
                Yes, that's right. Sorry for the error. (I created that code in the do-editor. And I think that I originally ended it with -if tetid_U == testid-, and then, without my noticing, Stata autocompleted testid to testid_U.)

                Comment

                Working...
                X