Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating additional rows to match dates in column 2 and 3

    Hi,
    I am working on recurrent events analysis. I have to merge two data sets. The first data set has ID variable as well as t0 (time at the beginning of risk) and t1 (time at event). The second data set has dates of supply of relevant medication. Each ID variable has multiple records in both datasets.

    DataSet 1
    ID t0 t1 Record
    A 01 Jan 2017 01 Jun 2017 1
    A 01 Jun 2017 31 Dec 2017 2
    B 01 Feb 2018 31 Dec 2018 1
    B 31 Dec 2018 30 Jun 2019 2
    DataSet 2
    ID Date of Supply
    A 01 Nov 2016
    A 01 Jan 2017
    A 01 Sept 2017
    A 01 Oct 2017
    B 01 Jan 2018
    B 01 Sept 2018
    B 01 May 2019
    B 01 July 2019
    B 01 Sept 2019
    When I merge 2 datasets either by joinby or m:m merge , I get the following (after deleting _merge variable).
    ID t0 t1 Record Date of Supply
    A 01 Jan 2017 01 Jun 2017 1 01 Nov 2016
    A 01 Jun 2017 31 Dec 2017 2 01 Jan 2017
    A 01 Jun 2017 31 Dec 2017 2 01 Sept 2017
    A 01 Jun 2017 31 Dec 2017 2 01 Oct 2017
    B 01 Feb 2018 31 Dec 2018 1 01 Jan 2018
    B 31 Dec 2018 30 Jun 2019 2 01 Sept 2018
    B 31 Dec 2018 30 Jun 2019 2 01 May 2019
    B 31 Dec 2018 30 Jun 2019 2 01 July 2019
    B 31 Dec 2018 30 Jun 2019 2 01 Sept 2019
    As I am interested in cumulative dose of medication for each Record, I am keen for guidance on
    a. The best way to match the Date of Supply to Record

    I thank you in advance.

    Regards,
    Sateesh

  • #2
    None of the data examples you show mentions anything about dose, so I don't know how you plan to go about this. That said, if you wanted, for example, to get the number of times medication was prescribed in each of the intervals from data set 1, the code would be like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str2 id str12(t0 t1) byte record
    "A " "01 Jan 2017 " "01 Jun 2017 " 1
    "A " "01 Jun 2017 " "31 Dec 2017 " 2
    "B " "01 Feb 2018 " "31 Dec 2018 " 1
    "B " "31 Dec 2018 " "30 Jun 2019 " 2
    end
    foreach v of varlist t0 t1 {
        gen _`v' = daily(`v', "DMY"), before(`v')
        assert missing(_`v') == missing(`v')
        drop `v'
        format _`v' %td
        rename _`v' `v'
    }
    tempfile dataset1
    save `dataset1'
    
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str2 id str12 dateofsupply
    "A " "01 Nov 2016"
    "A " "01 Jan 2017"
    "A " "01 Sept 2017"
    "A " "01 Oct 2017"
    "B " "01 Jan 2018"
    "B " "01 Sept 2018"
    "B " "01 May 2019"
    "B " "01 July 2019"
    "B " "01 Sept 2019"
    end
    gen _dateofsupply = daily(dateofsupply, "DMY"), before(dateofsupply)
    assert missing(_dateofsupply) == missing(dateofsupply)
    drop dateofsupply
    format _dateofsupply %td
    rename _dateofsupply dateofsupply
    
    tempfile dataset2
    save `dataset2'
    
    use `dataset1', clear
    rangejoin dateofsupply t0 t1 using `dataset2', by(id)
    collapse (count) n_rxs = dateofsupply, by(id record)
    format n_rxs %1.0f
    Notes:

    1. -merge m:m- produces data salad. You should never use it. If you think you need to use it, it just means that you do not understand your data correctly. I have been using Stata for just short of 27 years now, and in all that time I only once encountered a situation where what it does it would have produced a correct result--and even in that case there was a better way to do it. So never use -merge m:m-.
    2. In this case, you could proceed with -joinby-, but -rangejoin- is better suited to the task at hand.
    3. -rangejoin- is written by Robert Picard and is available from SSC. To use -rangejoin- you also need to install -rangestat-, but Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
    4. None of this will work unless your date variables are actual Stata internal format date variables--if they are strings, the code will break, and there is no code that will work correctly with string dates. My code above assumes your dates are strings, so it makes the conversions. Evidently, if they are already Stata internal format date variables, then you can skip those conversions.
    5. If you had posted your example data the most helpful way, with the -dataex- command, I wouldn't have to guess whether your variables are string or real date variables. In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Thank you very much. Very much appreciate your advice and help.

      Comment

      Working...
      X