Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Dear prof.Schechter,

    I tried your code , in which that you have assumed that each drug is consumed at a rate of 1 per day in my real data. But there is some problem with my real data; there medications are coded by ATC5 code and some of the medications are combined in a single dosage forms and coded under one ATC5 code. I am having issue with these combinations of medications. I tried the below code from STATA forum which gives name of medication regimens. But in real data it does not work well (it takes more than 24 hours for one extract) and seems still can not deal with combination of medications which coded under one ATC5 code.

    I would really appreciate if you may help me solve the problem.
    Thank you




    clear
    input byte ID str9 startdate str9 lastdate str7 drug

    2 "24-Aug-95" "14-Nov-97" "A01AD05"
    2 "14-Nov-97" "15-Jun-00" "B01AC04"
    3 "14-Nov-97" "13-Feb-03" "C10AA05"
    3 "15-Jun-00" "13-Feb-03" "B01AC04"
    3 "15-Jun-00" "13-Feb-03" "A01AD05"

    end


    generate int calendar_dt1 = date(startdate, "DMY", 2020)
    generate int calendar_dt0 = date(lastdate, "DMY", 2020)
    generate long row_nr = _n

    quietly reshape long calendar_dt, i(row_nr) j(action)

    label define Actions 1 Add 0 Remove

    label values action Actions

    drop startdate lastdate row_nr

    format calendar_dt %tdCCYY-NN-DD

    egen long event_nr = group(calendar_dt)

    gsort +event_nr -action

    order ID event_nr action drug calendar_dt

    list, noobs sepby(event_nr) abbreviate(20)
    generate str drug_regimen = drug if action == "Add":Actions

    replace drug_regimen = trim(itrim(cond(action == "Add":Actions, drug_regimen + " " + drug_regimen[_n-1], subinword(drug_regimen[_n-1], drug, "", 1)))) if _n > 1

    list, noobs sepby(event_nr) abbreviate(20)

    by event_nr: keep if _n == _N

    generate byte drug_tally = wordcount(drug_regimen)

    list ID event_nr calendar_dt drug_regimen drug_tally, noobs abbreviate(20)
    Last edited by Buyadaa Oyunchimeg; 07 May 2018, 20:34.

    Comment


    • #17
      Well, your question isn't clear. Your code runs, and in the example data you show it produces sensible results. You imply that this is not the case in your real data, but you don't explain in what sense that is the case, and your example doesn't illustrate any problems. So I think you need to present an example from your data that illustrates the problems you are encountering. Show that example, show the output you get from this code using that data, and then explain why it isn't what you need.

      I do see one problem. This code treats the entire data set as a single list of drugs being added and discontinued. This makes a lot of sense in the context of one individual, but your code doesn't do this -by individual-, it does it across the data set. Consequently, when a new patient is reached in the data, it doesn't "start over" and it just keeps adding that patient's regimen to the final regimen for the preceding patient. I suppose your data set might actually be about just one person, but if so I cannot understand how it could possibly take 24 hours to run.



      Comment


      • #18

        My apologies for unclear post and thank you for your response. Unfortunately, I couldn't produce any result using the above code in my real data (it runs more than 24 hours then suddenly stops and disappears). The data is big ~25-50 million observations in one extract.

        As I mentioned before, currently I am trying to deal with combination of medication which coded under one ATC5 code. Your code runs perfectly in real data but it counts the combination of medications as one medication thus the result is not what I was expected. Studies or reports which used the same data showed that combination therapy increasing over time (but my current result shows the opposite).

        What would you suggest for dealing with combination of medication coded under one ATC5 code? And is it possible to correct or change the above code's error?

        Thank you

        Comment


        • #19
          As I mentioned before, currently I am trying to deal with combination of medication which coded under one ATC5 code.
          But I don't understand how this becomes a problem. Just because you are not getting results matching somebody else's analysis does not mean that you are getting wrong results for your data. What difference does it make whether a code represents a combination of medications or just a single one? Your code still correctly tracks what drugs are being added and which ones are being discontinued. I don't get it.

          As for the long time, the only command I see in there that would take a long time to execute is the -gsort-, but that seems integral to the calculations, so I can't recommend you skip it. The other thing that will be slow, and also, frankly, pointless, are the -list- commands. That's fine when you're just testing the code on a small subset of the data, but -list-ing 25-50 million observations will take a very long time, and most of it will go out of the screen buffer anyway, and even if you kept it all how could you possibly read it?

          And I don't understand how you get 25-50 million observations of medication orders on a single patient.

          Comment


          • #20
            Thank you for feedback and sorry again for unclear explanation. I've skipped the list command in real data and but when it comes to the below code it's getting very slow.

            replace drug_regimen = trim(itrim(cond(action == "Add":Actions, drug_regimen + " " + drug_regimen[_n-1], subinword(drug_regimen[_n-1], drug, "", 1)))) if _n > 1


            This code treats the entire data set as a single list of drugs being added and discontinued. This makes a lot of sense in the context of one individual, but your code doesn't do this -by individual-, it does it across the data set. Consequently, when a new patient is reached in the data, it doesn't "start over" and it just keeps adding that patient's regimen to the final regimen for the preceding patient. I suppose your data set might actually be about just one person
            .

            I think I gave you wring explanation from the beginning. 25-50 million is overall observation number in one extract and one patient on an average has thousand observations. This is maybe the reason why the codes do not work in real data. Is it possible to modify them for data which has more than 1 million individuals?


            Thank you
            Last edited by Buyadaa Oyunchimeg; 08 May 2018, 20:14.

            Comment


            • #21
              I think this will do it by ID (I assume ID is your patient identifier).

              Code:
              clear
              input byte ID str9 startdate str9 lastdate str7 drug
              
              2 "24-Aug-95" "14-Nov-97" "A01AD05"
              2 "14-Nov-97" "15-Jun-00" "B01AC04"
              3 "14-Nov-97" "13-Feb-03" "C10AA05"
              3 "15-Jun-00" "13-Feb-03" "B01AC04"
              3 "15-Jun-00" "13-Feb-03" "A01AD05"
              
              end
              
              
              generate int calendar_dt1 = date(startdate, "DMY", 2020)
              generate int calendar_dt0 = date(lastdate, "DMY", 2020)
              generate long row_nr = _n
              
              quietly reshape long calendar_dt, i(row_nr) j(action)
              
              label define Actions 1 Add 0 Remove
              
              label values action Actions
              
              drop startdate lastdate row_nr
              
              format calendar_dt %tdCCYY-NN-DD
              
              capture program drop one_id
              program define one_id
                  gsort calendar_dt -action
                  order ID  calendar_dt action drug
                  generate str drug_regimen = drug if action == "Add":Actions
                  replace drug_regimen = trim(itrim(cond(action == "Add":Actions, ///
                      drug_regimen + " " + drug_regimen[_n-1], ///
                      subinword(drug_regimen[_n-1], drug, "", 1)))) if _n > 1
                  by calendar_dt: keep if _n == _N
                  generate byte drug_tally = wordcount(drug_regimen)
                  exit
              end
              
              runby one_id, by(ID) status
              
              list, noobs clean
              Note: To do this you need the -runby- command, written by Robert Picard and me, and available from SSC.

              I have eliminated the variable event_nr. It serves no real purpose in the code: it is just a recoding in consecutive integers starting at 1 of calendar_dt. But you can just use calendar_dt for everything you did with event_nr. Since there are probably a lot of calendar dates in the data, eliminating the creation of event_nr should make a noticeable, though not enormous, improvement in the executioni time. But you are still stuck with having to sort the data (-runby- does this internally) so it is going to be slow. Because of the -status- option, -runby- will update you periodically about its progress and how much longer it thinks it will take to finish.

              The -list- command at the end is just to enable you to see what happens with this example. With your real data, involving millions of observations, you should not attempt to -list- everything. As noted in an earlier post in this thread, it would be pointless and would waste a lot of time. You can -browse- the data set if you wish to visually check some of your results.

              Comment


              • #22
                Dear prof. Schechter,

                Thank you so much. I will try it in my real data and let you know how it went.

                Comment


                • #23
                  Dear prof. Schechter,

                  I've tried the codes in real data. Generally, it works well but I got a problem with drug_regimen and drug_tally. For example, ID 2 had a drug regimen "A01AD05" in 1997 (single medication), but there are duplicates in drug regimen (A01AD05 A01AD05=> drug_tally=2). I've tried to remove duplicates and generate new variable "number of medications" (single, dual, >= triple therapy) but I couldn't.

                  Is it possible to remove duplicates from drug_regimen and generate new variable which shows the number of medications for each individuals and what would you suggest?

                  Thank you
                  PHP Code:
                  list, noobs clean

                      ID   calendar
                  ~t   action      drug              drug_regimen   drug_t~y  
                       2   1995
                  -08-24      Add   A01AD05                   A01AD05          1  
                       2   1997
                  -08-24      Add   A01AD05           A01AD05 A01AD05          2  
                       2   1997
                  -11-14   Remove   A01AD05           B01AC04 A01AD05          2  
                       2   1998
                  -11-14   Remove   A01AD05                   B01AC04          1  
                       2   1999
                  -11-14      Add   B01AC04           B01AC04 B01AC04          2  
                       2   2000
                  -06-15   Remove   B01AC04                                    0  
                       2   2001
                  -11-14      Add   B01AC04                   B01AC04          1  
                       2   2002
                  -08-24      Add   A01AD05           A01AD05 B01AC04          2  
                       2   2006
                  -06-15   Remove   B01AC04                   A01AD05          1  
                       2   2007
                  -11-14   Remove   A01AD05                                    0  
                       3   1997
                  -11-14      Add   C10AA05                   C10AA05          1  
                       3   2000
                  -06-15      Add   B01AC04           B01AC04 C10AA05          2  
                       3   2001
                  -06-15      Add   A01AD05   A01AD05 B01AC04 C10AA05          3  
                       3   2003
                  -02-13   Remove   C10AA05                                    0  
                       3   2004
                  -02-13   Remove   A01AD05                                    0  
                       3   2004
                  -06-15      Add   B01AC04           B01AC04 A01AD05          2  
                       3   2007
                  -02-13   Remove   B01AC04                   A01AD05          1



                  -dataex
                  clear
                  input byte ID str9 startdate str9 lastdate str7 drug

                  "24-Aug-95" "14-Nov-97" "A01AD05"
                  "14-Nov-97" "15-Jun-00" "B01AC04"
                  "24-Aug-97" "14-Nov-98" "A01AD05"
                  "14-Nov-99" "15-Jun-00" "B01AC04"
                  "24-Aug-02" "14-Nov-07" "A01AD05"
                  "14-Nov-01" "15-Jun-06" "B01AC04"
                  "14-Nov-97" "13-Feb-03" "C10AA05"
                  "15-Jun-00" "13-Feb-03" "B01AC04"
                  "15-Jun-01" "13-Feb-03" "A01AD05"
                  "15-Jun-04" "13-Feb-07" "B01AC04"
                  "15-Jun-04" "13-Feb-04" "A01AD05"
                  end 
                  Last edited by Buyadaa Oyunchimeg; 15 May 2018, 00:13.

                  Comment


                  • #24
                    So, it's a relatively minor modification to the earlier code, although it looks a bit complex syntactically because it entails embedding one -cond()- expression inside another. But the gist of it is simply to only add the drug to the regiment if it does not already appear in the regimen.

                    Code:
                    generate int calendar_dt1 = date(startdate, "DMY", 2020)
                    generate int calendar_dt0 = date(lastdate, "DMY", 2020)
                    generate long row_nr = _n
                    
                    quietly reshape long calendar_dt, i(row_nr) j(action)
                    
                    label define Actions 1 Add 0 Remove
                    
                    label values action Actions
                    
                    drop startdate lastdate row_nr
                    
                    format calendar_dt %tdCCYY-NN-DD
                    
                    capture program drop one_id
                    program define one_id
                        gsort calendar_dt -action
                        order ID  calendar_dt action drug
                        generate str drug_regimen = drug if action == "Add":Actions
                        replace drug_regimen = cond(strpos(drug_regimen[_n-1], drug_regimen) == 0 & action == "Add":Actions, ///
                            drug_regimen + " " + drug_regimen[_n-1], ///
                            cond(action == "Remove":Actions, subinword(drug_regimen[_n-1], drug, "", .), drug_regimen[_n-1]))
                        replace drug_regimen = trim(itrim(drug_regimen))
                        by calendar_dt: keep if _n == _N
                        generate byte drug_tally = wordcount(drug_regimen)
                        keep ID calendar_dt drug_regimen drug_tally
                        exit
                    end
                    
                    runby one_id, by(ID) status 
                    
                    sort ID calendar_dt
                    list, noobs clean
                    Notes: Again, don't attempt the final -list- command in your real data. Just -browse- or -save- the results. The -keep- command added at the end was added because on reviewing the output I realized that there were some variables that had become irrelevant once we reduced to a single observation per date and they looked confusing.

                    Comment


                    • #25
                      Thank you so much prof. Schechter.

                      Comment


                      • #26
                        Dear prof. Schechter,

                        Thank you again for your help, codes worked well in real data. Now I am trying to calculate the proportions of single, dual and triple therapy by year and again having difficulty due to lack of knowledge. Is there anyway to calculate the proportions by year using loops? I've tried to do this by generating new variables, but it seems not accurate.


                        Sincerely,
                        Oyun

                        Comment


                        • #27
                          So, again, please show an example of the data you will be using for this. And also define what you mean by single, dual, and triple therapy in terms of information that is available in that data.

                          Comment


                          • #28
                            My data looks as below. I am running the analysis by separate medication classes, for example the current one includes only anti-thrombotic medications. I would like to calculate the proportions of individuals on single anti-thrombotic medication or dual or triple anti-thrombotic medications by year using drug_tally variable.

                            Because one individual has many duplicates in terms of ID it seems difficult to accurately calculate the the proportions by year. What would you suggest?

                            PHP Code:
                            clear
                            input byte ID int calendar_dt str19 drug_regimen byte drug_tally float year
                            2 13019 
                            "A01AD05"         1 1995
                            2 13819 
                            "B01AC05 A01AD05" 2 1997
                            2 13832 
                            "B01AC05"         1 1997
                            2 13863 
                            ""                0 1997
                            2 17037 
                            "B01AC05"         1 2006
                            2 17117 
                            "B01AC05"         1 2006
                            2 17119 
                            ""                0 2006
                            2 17150 
                            ""                0 2006
                            3 13832 
                            "C10AA05"         1 1997
                            3 13923 
                            ""                0 1998
                            3 14776 
                            "B01AC04"         1 2000
                            3 14835 
                            ""                0 2000
                            3 15141 
                            "B01AC04"         1 2001
                            3 15200 
                            ""                0 2001
                            3 16237 
                            "B01AC04"         1 2004
                            3 16262 
                            "C01AC04 B01AC04" 2 2004
                            3 16265 
                            "C01AC04"         1 2004
                            3 16296 
                            ""                0 2004
                            end
                            format 
                            %tdCCYY-NN-DD calendar_dt 

                            Comment


                            • #29
                              This would ordinarily do it:
                              Code:
                              gen multiplicity = wordcount(drug_regimen)
                              tab multiplicity year, col
                              Because one individual has many duplicates in terms of ID it seems difficult to accurately calculate the the proportions by year. What would you suggest?
                              I suggest you fix the data so that each individual has a unique ID. The fact that you know about this problem implies that there is some other source that enables you to identify individuals and their associated IDs in this data set. Use that same other source to fix the data you are working with.

                              Comment


                              • #30
                                Thank you so much prof. Schechter.

                                Comment

                                Working...
                                X