Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to use a value that is closer to a certain date

    Hi,

    I would like to obtain a variable with a certain lab value (GFR) that is closest to a specific date (date of blooddraw). I have 1 to 4 different GFRs per observation and would like to use the one that is closest to the blooddraw.
    The format of the data looks like this :
    Id troughleveldate rcreatdate1 GFR1 rcreatdate2 GFR2 rcreatdate3 GFR3 rcreatdate4 GFR4
    1 04feb2019 02feb2019 100.45 04feb2019 89.76 . . . .
    2 08mar2019 06mar2019 79.41 07mar2019 80.07 08mar2019 94.2 . .
    Do you think there is a way to generate a GFR that is closest to troughleveldate?

    Thank you in advance,

    Mia Lidén

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id float(troughleveldate rcreatdate1 gfr1 rcreatdate2 gfr2 rcreatdate3 gfr3) byte(rcreatdate4 gfr4)
    1 21584 21582 100.45 21584 89.76     .    . . .
    2 21616 21614  79.41 21615 80.07 21616 94.2 . .
    end
    format %td troughleveldate
    format %td rcreatdate1
    format %td rcreatdate2
    format %td rcreatdate3
    
    
    reshape long rcreatdate gfr, i(id)
    format rcreatdate %td
    gen delta = abs(rcreatdate - troughleveldate)
    by id (delta rcreatdate), sort: gen wanted = gfr[1]
    is a generic solution to the question you have posed. In your example data, there is always a GFR whose date matches the trough level date. So that is always the one that gets picked here. It dawns on me that what you actually want is the one that is closest to the trough level date, but not actually on that date. That solution would be a bit different. Also, in your real data there may be cases where there are two blood draws on dates that are tied for closest to the trough, one preceding and the other following by the same number of days. The code above will give you the one that precedes.

    Note that, as with so many things in Stata, this task, which is complicated to do with data in wide layout, is pretty simple once the data is -reshape-d to long. It is likely that whatever else you will be doing with this data will also be easier, or even only possible, in long layout. So I have left the data that way. In the event that you know for a fact that you will be doing things that require wide layout, you can return to that by just running -reshape wide- after the end of the code shown.

    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Last edited by Clyde Schechter; 29 May 2023, 09:58.

    Comment


    • #3
      Hi Clyde,

      Thanks a lot. Yes it was helpful to put the data into long format. Would the code change if there are more than 2 dates to choose from (and therefore for than one GFR)?

      Noted for the data examples, thanks for the tip.

      Best,

      Mia

      Comment


      • #4
        Would the code change if there are more than 2 dates to choose from (and therefore for than one GFR)?
        Well, given that time passes one-dimensionally, there can never be more than 2 dates that qualify as nearest to the trough date: one before and one after, that's it. But, of course, there could be more than one eGFR measurement on those dates, and in that case there could be more than 2 eGFRs in contention. You would need to specify how you want to handle a situation like that. Would you select one systematically (e.g., the lowest, or the highest, or based on some other information contained in variables not yet part of this discussion)? Or take the mean? The median? Or retain all of them? Something else?

        Comment


        • #5
          Yes I think I see what you mean. Here is an example of my data set. As you can see, I have up to 4 eGFR for some patients. However, some patients only have one value, most have two values and some have three. I would like to find which out of the values is the closest to rtroughdate and in the case of a tie, take the one that is before the rtroughdate. In the code you gave me before, it worked well for patients with one or two values but was wrong for those with three or four. Is there a way of recreating the code to include all eGFRs in the analysis? Thanks in advance for your help!

          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float id byte GFRnum float(rcreacdate GFR1 GFR2 GFR3 GFR4) double rtroughdate
          1 1 21570 31.166265 . . . .
          1 2 21571 . 29.73781 . . .
          2 1 22313 100.45 . . . 1.8648576e+12
          2 2 21584 . 89.76383 . . 1.8648576e+12
          3 1 21589 175.50496 . . . 1.8655488e+12
          3 2 21592 . 192.2197 . . 1.8655488e+12
          4 1 21591 135.31564 . . . 1.8655488e+12
          4 2 21592 . 143.12231 . . 1.8655488e+12
          5 1 21604 112.8375 . . . 1.8667584e+12
          6 1 21608 84.12039 . . . 1.8670176e+12
          6 2 21609 . 96.92133 . . 1.8670176e+12
          7 1 21614 79.41124 . . . 1.8676224e+12
          7 2 21615 . 80.073 . . 1.8676224e+12
          7 3 21616 . . 94.20353 . 1.8676224e+12
          8 1 21626 97.0224 . . . 1.8685728e+12
          8 2 21627 . 95.28986 . . 1.8685728e+12
          9 1 21628 309.5643 . . . 1.8687456e+12
          9 2 21629 . 284.79913 . . 1.8687456e+12
          10 1 21633 107.9261 . . . 1.869264e+12
          11 1 21636 69.51484 . . . 1.8379008e+12
          11 2 21637 . 65.96816 . . 1.8379008e+12
          12 1 21668 42.45364 . . . 1.8379008e+12
          13 1 21649 19.30463 . . . 1.87056e+12
          14 1 21662 48.77766 . . . 1.8717696e+12
          14 2 21665 . 59.9117 . . 1.8717696e+12
          15 1 21664 77.67718 . . . 1.871856e+12
          15 2 21665 . 109.38214 . . 1.871856e+12
          16 1 21670 14.418334 . . . 1.8724608e+12
          16 2 22402 . 12.922656 . . 1.8724608e+12
          17 1 21683 77.80333 . . . 1.8734976e+12
          17 2 21684 . 86.95666 . . 1.8734976e+12
          18 1 21661 61.95719 . . . 1.8741888e+12
          19 1 21702 88.22356 . . . 1.875312e+12
          19 2 21703 . 93.73753 . . 1.875312e+12
          20 1 21716 135.177 . . . 1.8448128e+12
          21 1 21748 159.17647 . . . 1.8765216e+12
          22 1 21729 257.7648 . . . 1.8775584e+12
          22 2 21730 . 27.80553 . . 1.8775584e+12
          23 1 21738 76.77363 . . . 1.8782496e+12
          23 2 21739 . 49.54893 . . 1.8782496e+12
          24 1 21740 108.24 . . . 1.878336e+12
          24 2 21741 . 90.2 . . 1.878336e+12
          25 1 21742 54.9201 . . . 1.878768e+12
          26 1 21771 71.2908 . . . 1.8812736e+12
          27 1 21773 254.53094 . . . 1.8812736e+12
          28 1 21773 148.27452 . . . 1.8812736e+12
          29 1 21782 55.97147 . . . 1.8821376e+12
          29 2 21784 . 54.25806 . . 1.8821376e+12
          30 1 21785 177.04115 . . . 1.8823968e+12
          31 1 21788 146.1482 . . . 1.8825696e+12
          31 2 21789 . 162.09164 . . 1.8825696e+12
          32 1 21789 127.6125 . . . 1.882656e+12
          32 2 21821 . 127.6125 . . 1.882656e+12
          32 3 21796 . . 298.8394 . 1.882656e+12
          33 1 21789 137.99387 . . . 1.8827424e+12
          34 1 21792 172.2496 . . . 1.8830016e+12
          34 2 21794 . 177.99126 . . 1.8830016e+12
          35 1 21803 133.59166 . . . 1.8838656e+12
          36 1 21806 114.15938 . . . 1.8842112e+12
          36 2 21807 . 128.17894 . . 1.8842112e+12
          37 1 21810 137.59117 . . . 1.8844704e+12
          37 2 21811 . 104.73358 . . 1.8844704e+12
          38 1 21849 109.1754 . . . 1.8851616e+12
          39 1 21828 127.8585 . . . 1.8860256e+12
          39 2 21829 . 98.35269 . . 1.8860256e+12
          40 1 21850 113.529 . . . 1.8879264e+12
          40 2 21851 . 124.6593 . . 1.8879264e+12
          41 1 21869 97.77634 . . . 1.8897408e+12
          41 2 21810 . 117.66305 . . 1.8897408e+12
          42 1 21875 218.9488 . . . 1.8902592e+12
          43 1 21877 207.80966 . . . 1.8902592e+12
          43 2 21878 . 207.80966 . . 1.8902592e+12
          44 1 21920 185.44615 . . . 1.8939744e+12
          44 2 21557 . 96.432 . . 1.8939744e+12
          45 1 21920 140.45062 . . . .
          45 2 21921 . 169.60075 . . .
          46 1 21926 63.1894 . . . 1.8944928e+12
          46 2 21927 . 63.96 . . 1.8944928e+12
          46 3 21943 . . 242.8111 . 1.8944928e+12
          47 1 21932 223.9679 . . . 1.8950976e+12
          47 2 21933 . 240.87114 . . 1.8950976e+12
          47 3 21934 . . 271.62064 . 1.8950976e+12
          48 1 21936 112.51572 . . . 1.8953568e+12
          49 1 21938 198.9926 . . . 1.8957024e+12
          49 2 21941 . 208.0377 . . 1.8957024e+12
          50 1 21939 99.55013 . . . 1.8957024e+12
          50 2 21941 . 95.817 . . 1.8957024e+12
          51 1 21945 65.312996 . . . 1.896048e+12
          51 2 21959 . 57.5736 . . 1.896048e+12
          52 1 21958 65.4975 . . . 1.8972576e+12
          53 1 21958 156.60536 . . . 1.8972576e+12
          53 2 21959 . 208.80714 . . 1.8972576e+12
          54 1 21960 110.52428 . . . 1.8975168e+12
          54 2 21962 . 95.22092 . . 1.8975168e+12
          55 1 21968 123.31271 . . . 1.8981216e+12
          55 2 21969 . 134.73056 . . 1.8981216e+12
          56 1 21969 70.71382 . . . 1.898208e+12
          56 2 21970 . 81.02625 . . 1.898208e+12
          57 1 22000 183.4457 . . . 1.8983808e+12
          57 2 21972 . 86.76486 . . 1.8983808e+12
          end
          format %td rcreacdate
          format %tc rtroughdate
          [/CODE]

          Comment


          • #6
            I caN't think of any reason why the code I showed would work with 1 or 2 GFRs but not 3 or more. But the example data you show this time is materially different from what you showed in #1, and the code in #2 no longer works correctly with it, regardless of how many GFRs there are.

            The new data example is now in a strange hybrid between long and wide layout, where you have separate GFR1, GFR2, ... variables, but in any observation all but one are missing. This is not the best of both worlds: it is the worst of both. The other material difference, and it is critical, is that originally the trough date variable was a date variable, but now it is a clock variable. In order to compare it to the GFR date, that has to be converted back to a date variable: a direct comparison of a clock variable with a date variable will give meaningless and useless results, and will lead to incorrect assessments of which GFR date is closest to the trough date. So:

            Code:
            isid id GFRnum, sort
            reshape long GFR, i(id GFRnum)
            drop if missing(GFR)
            drop _j
            gen troughdate = dofc(rtroughdate)
            format troughdate %td
            gen double delta = abs(rcreacdate - troughdate)
            format delta %16.0f
            by id (delta rcreacdate), sort: gen wanted = GFR[1] if !missing(rtroughdate)
            
            order rtroughdate, after(rcreacdate)
            sort id GFRnum

            Comment


            • #7
              That's worked well! Thank you for your time and patience.

              Comment

              Working...
              X