  • Forvalues `i+1` and wide to long dataset

    Dear Statalisters,

    I am building a counting-process (long format) dataset from routine surveillance data in a drug-resistant TB database. Patients are put on drug regimens consisting of multiple drugs at different time points. The surveillance system captures this as drugname* and drugstartdate*. Several drugname* variables may hold different drugs that start on the same date. I wish to concatenate the drug names into one list on the condition that the drugstartdates are the same. This can then be repeated for subsequent drug regimens.

    I am hoping for the end result to be something like:
    drugregimen1 = "bedaquiline; Isoniazid; B6" drugregimenstart1 = "14 Jan 2022"
    drugregimen2 = "Linezolid; bedaquiline; High Dose INH" drugregimenstart2 = "20 Oct 2022"

    Initially I have rudimentary code :

    Code:
    egen drugreg1 = concat(drugname1 drugname2 drugname3 ) if drugstartdate1==drugstartdate2 &drugstartdate1==drugstartdate3, punct(;)
    applied to this example data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str23 drugname1 int drugstartdate1 str23 drugname2 int drugstartdate2 str23 drugname3 int drugstartdate3
    "Bedaquiline"  22573 "Levofloxacin" 22573 "Linezolid"   22573
    "Bedaquiline"  22581 "Levofloxacin" 22581 "Linezolid"   22581
    "Bedaquiline"  22824 "Levofloxacin" 22824 "Linezolid"   22824
    "Bedaquiline"  22816 "Levofloxacin" 22816 "Linezolid"   22816
    "Bedaquiline"  22806 "Levofloxacin" 22806 "Linezolid"   22806
    "Levofloxacin" 22494 "Bedaquiline"  22494 "Clofazimine" 22494
    "Linezolid"    22307 "Bedaquiline"  22307 "Terizidone"  22307
    "Bedaquiline"  22564 "Levofloxacin" 22564 "Linezolid"   22564
    "Bedaquiline"  22540 "Levofloxacin" 22540 "Linezolid"   22540
    "Bedaquiline"  22839 "Levofloxacin" 22839 "Linezolid"   22839
    end
    format %td drugstartdate1
    format %td drugstartdate2
    format %td drugstartdate3
    which returns

    Code:
         +--------------------------------------+
         |                             drugreg1 |
         |--------------------------------------|
      1. |   Bedaquiline;Levofloxacin;Linezolid |
      2. |   Bedaquiline;Levofloxacin;Linezolid |
      3. |   Bedaquiline;Levofloxacin;Linezolid |
      4. |   Bedaquiline;Levofloxacin;Linezolid |
      5. |   Bedaquiline;Levofloxacin;Linezolid |
         |--------------------------------------|
      6. | Levofloxacin;Bedaquiline;Clofazimine |
      7. |     Linezolid;Bedaquiline;Terizidone |
      8. |   Bedaquiline;Levofloxacin;Linezolid |
      9. |   Bedaquiline;Levofloxacin;Linezolid |
     10. |   Bedaquiline;Levofloxacin;Linezolid |
         +--------------------------------------+
    So far so good but the problems start now.

    The rudimentary code won't stop if the drugstartdate changes, and it will stop if there are differing numbers of drugs in each regimen. For instance, when there are missing values or changes in the drug start date:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str23 drugname6 int drugstartdate6 str21 drugname7 long drugstartdate7 str23 drugname8 int drugstartdate8 str23 drugname9 int drugstartdate9 str21 drugname10 int drugstartdate10 str23 drugname11 int drugstartdate11
    "Pyrazinamide"  22573 "High dose INH" 22573 ""                . ""                 . ""                . ""               .
    "Pyrazinamide"  22581 "High dose INH" 22581 ""                . ""                 . ""                . ""               .
    "Pyrazinamide"  22824 "High dose INH" 22824 "Bedaquiline" 22831 "Levofloxacin" 22831 "Clofazimine" 22831 "Terizidone" 22831
    "Pyrazinamide"  22816 "High dose INH" 22816 "PYRIDIXINE"  22816 ""                 . ""                . ""               .
    "Pyrazinamide"  22806 "High dose INH" 22806 "B6"          22806 ""                 . ""                . ""               .
    "Ethambutol"    22494 "Pyrazinamide"  22494 ""                . ""                 . ""                . ""               .
    ""                  . ""                  . ""                . ""                 . ""                . ""               .
    "High dose INH" 22564 ""                  . ""                . ""                 . ""                . ""               .
    "Pyrazinamide"  22540 "High dose INH" 22540 "Pyridoxine"  22540 "Bco"          22540 ""                . ""               .
    "Pyrazinamide"  22839 "High dose INH" 22839 "PYRIDOXINE"  22839 ""                 . ""                . ""               .
    end
    format %td drugstartdate6
    format %td drugstartdate7
    format %td drugstartdate8
    format %td drugstartdate9
    format %td drugstartdate10
    format %td drugstartdate11

    I have tried a loop :
    Code:
    capture drop drugreg*
    forval i=1/5{
        egen drugreg`i'= concat(drugname`i' drugname`i+1' drugname`i+2') if drugstartdate`i' ==drugstartdate`i+1' & drugstartdate`i+2', punct(;)
    }
    list drugreg1 drugreg2 drugreg3 in 1/10
    This doesn't work; Stata doesn't seem to recognise the `i+1'

    I have tried a more manual version
    Code:
    order drugname*
    order drugstartdate*
    capture drop drugreg1
    egen drugreg1 = concat(drugname1-drugname56) if drugstartdate1<=drugstartdate2 & drugstartdate1<=drugstartdate3 & drugstartdate1<=drugstartdate4 & drugstartdate1<=drugstartdate5 & drugstartdate1<=drugstartdate6 & drugstartdate1<=drugstartdate7 & drugstartdate1<=drugstartdate8 & drugstartdate1<=drugstartdate9 & drugstartdate1<=drugstartdate10, punct(;)
    list drugreg1 in 1/10
    but this code simply lumps all the drugs that the patient has ever been on.

    If anyone has an idea on an elegant piece of code that will be able to lump the drug regimens together by their corresponding drugstartdates I would be most grateful...

    Kind Regards
    Brian Brummer

  • #2
    It doesn't because you need `i'


    • #3
      A quick fix for your code, at least: you need to replace the `i+1' with `=`i'+1', and similarly for `i+2'.
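
      A minimal sketch of that fix, assuming the full dataset has at least drugname1 through drugname7 and the matching drugstartdate variables (the local names j and k are only illustrative). It also spells out the comparison with the third start date, which the original condition left incomplete:

      ```stata
      capture drop drugreg*
      forvalues i = 1/5 {
          * compute the neighbouring indices once, then reuse them
          local j = `i' + 1
          local k = `i' + 2
          egen drugreg`i' = concat(drugname`i' drugname`j' drugname`k') ///
              if drugstartdate`i' == drugstartdate`j' ///
              & drugstartdate`i' == drugstartdate`k', punct(;)
      }

      * the inline =exp form evaluates the expression during macro expansion
      forvalues i = 1/3 {
          display "i = `i', next = `=`i'+1'"
      }
      ```

      Separate locals keep the egen line readable; the inline form is handier for a one-off reference. Either way this only repairs the syntax, it still assumes exactly three drugs per regimen.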


      • #4
        But also, a clarification: can at most three drugs start on the same date?


        • #5
          I'm not sure I understand your data, nor what you want. There is no variable identifying patients, so you want to collapse together all drugs started on a given date as a regimen, even though they aren't necessarily given to the same patient? Hard for me to understand how that is useful. Also, in your example data, the drugstartdate* values are (unless missing) always the same within an observation. Yet despite this regularity, in your sample code you test for equality of drugstartdate1, drugstartdate2 and drugstartdate3. So I suppose I should assume that the equality of all the drugstartdate* values in the example data is just a fluke. Also, your second block of code (under "I have tried a loop") suggests that you want to limit any regimen to at most three drugs, and if there are more than three listed, you want to count drugs 2, 3, and 4 as a new regimen, and 3, 4, and 5 as a new regimen, etc. All of this seems strange to me.

          Anyway, maybe I can point you in a helpful direction with the following code:
          Code:
          gen long obs_no = _n
          reshape long drugname drugstartdate, i(obs_no)
          sort drugstartdate drugname
          drop if missing(drugname)
          
          by drugstartdate (drugname), sort: gen regimen = drugname if _n == 1
          by drugstartdate (drugname): replace regimen = regimen[_n-1] ///
              + "; " + drugname if _n > 1
          collapse (last) regimen, by(drugstartdate)
          This will create a variable, regimen, that lists all of the drugs prescribed on a given date, separated by semicolons. It will give one such observation for each date. As I suspect that there is, in fact, a patient variable in your data and that you really want this done separately for each patient, the code will have to be modified accordingly.

          Finally, I will just note, reflecting the title of your post, that `i+1' is not proper syntax for the current value of i plus 1. The correct syntax is `=`i'+1'.
          Last edited by Clyde Schechter; 31 Aug 2022, 14:36.


          • #6
            How about something like this? (note: I am creating a simple id variable to identify each patient by the observation number)

            Code:
            gen id = _n
            reshape long drugname drugstartdate, i(id) j(num)
            drop if missing(drugname)
            
            gen drugregimen = ""
            sort id drugstartdate drugname
            by id drugstartdate: replace drugregimen = drugregimen[_n-1] + ";" + drugname
            by id drugstartdate: replace drugregimen = drugregimen[_N]
            replace drugregimen = substr(drugregimen,2,.)
            
            drop drugname num
            duplicates drop id drugstartdate, force
            bysort id (drugstartdate): gen regnum = _n
            rename drugstartdate drugregimenstart
            reshape wide drugregimen drugregimenstart, i(id) j(regnum)
            This produces:

            Code:
            . list, noobs clean
            
                id   drugre~t1                                drugregimen1   drugre~t2                                      drugregimen2  
                 1   20oct2021                  High dose INH;Pyrazinamide           .                                                    
                 2   28oct2021                  High dose INH;Pyrazinamide           .                                                    
                 3   28jun2022                  High dose INH;Pyrazinamide   05jul2022   Bedaquiline;Clofazimine;Levofloxacin;Terizidone  
                 4   20jun2022       High dose INH;PYRIDIXINE;Pyrazinamide           .                                                    
                 5   10jun2022               B6;High dose INH;Pyrazinamide           .                                                    
                 6   02aug2021                     Ethambutol;Pyrazinamide           .                                                    
                 8   11oct2021                               High dose INH           .                                                    
                 9   17sep2021   Bco;High dose INH;Pyrazinamide;Pyridoxine           .                                                    
                10   13jul2022       High dose INH;PYRIDOXINE;Pyrazinamide           .
            Last edited by Hemanshu Kumar; 31 Aug 2022, 14:53.


            • #7
              Hemanshu Kumar Thanks for the reply.

              No, any number of drugs can be in a single regimen, so the code would need to account for the fact that as soon as it finds a drugstartdate that is either "." or later than the preceding drugstartdate it should stop concatenating other drugnames.

              Your suggested code works (thank you for the correction), but I am still struggling to get it to identify either missing drugstartdates or drugstartdates that are later. I can probably get about two or three different regimens, but it will be messy and there are likely to be mistakes. Dataex doesn't really allow me to show you exactly how wide the dataset is.


              • #8
                Brian Brummer the solution in #6 permits any number of drugs to start on the same date (indeed in id#3, regimen#2, we have 4 drugs).

                Do let me know if this suffices or falls short on some account.


                • #9
                  Also a minor suggestion: format can handle multiple variables, so you can change the display format of all your drugstartdate variables by simply doing
                  Code:
                  format %td drugstartdate*


                  • #10
                    On a tangent: the reference `i+1' would if it were legal be a reference to the local macro with name i+1 -- except that + is not allowed in Stata names, of local macros or anything else.

                    What is going on is a little more complicated. First off, if you try to assign to local macro i+1 Stata doesn't complain

                    Code:
                    . local i+1 frog
                    but a listing with mac list shows a local macro named i

                    Code:
                    . mac list
                    _i:             +1 frog
                    So what happened there? Stata stopped reading the name at + (reasoning that the character can't be part of a name) and assigned the entire remaining text to a local macro named i.

                    Now, in reverse: what happens if you refer to a macro in that fashion? It seems dependent on whether a local macro exists.

                    Code:
                    . di "`i+1'"
                    +1 frog
                    
                    . di "`j'+1" 
                    +1 
                    Either way, what Brian wants is to increment the value of an existing local macro on the fly, and as Hemanshu Kumar explained in #3 that needs a quite different syntax.


                    • #11
                      Thank you for the tangent, Nick Cox.

                      Hemanshu Kumar it seems your solution has worked best for what I was looking for, thank you.
                      Last edited by Brian Brummer; 02 Sep 2022, 01:16.


                      • #12
                        I get this comment isn't adding much to this particular discussion, but it's certainly possible to parse the respective drug regimens, creating further regimen variables. I suppose doing so would require analysis of the data's end use, though allowing each drug to occupy its own variable paves the way for greater possibilities in the statistical realm.
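
                        As a rough sketch of that idea, assuming the wide result from #6 with a variable drugregimen1 holding a semicolon-separated list: split can put each drug in its own variable, and strpos() can flag a particular drug. The names reg1_drug and reg1_bdq are made up for illustration.

                        ```stata
                        * one variable per drug in the first regimen: reg1_drug1, reg1_drug2, ...
                        split drugregimen1, parse(";") generate(reg1_drug)

                        * indicator for whether bedaquiline is part of the first regimen
                        * (lower() guards against mixed capitalisation in the drug names)
                        generate byte reg1_bdq = strpos(lower(drugregimen1), "bedaquiline") > 0

                        list id drugregimen1 reg1_drug* reg1_bdq, noobs clean
                        ```

                        Spelling variants such as PYRIDIXINE, Pyridoxine and B6 would still need to be harmonised by hand before indicators like this can be trusted.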
                        Last edited by Eric Makela; 04 Sep 2022, 17:19.


                        • #13
                          Eric Makela that is indeed what is being done. I am making observation periods relating to patients' blood and ECG results to look at whether prescription of Group A drugs (Linezolid, Bedaquiline) is being continued when contraindicated, i.e. is Linezolid being prescribed even when the patient is anemic? I am hoping a Cox model with/without tvc can then show the effect of this on mortality/loss to follow-up.

                          Cheers
