Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Date Manipulation

    Hi All,

    I have an interesting data set that stores observations at what I am going to call "Event level" i.e. for one patient I have data that looks like this:

    Subject..........Event..........Start Date..........End Date
    1 ..................A ...............12/06/2014 ........25/12/2014
    1 ..................B ...............26/08/2014 ........03/02/2015
    1 ..................C ...............18/11/2014 ........30/04/2015
    1 ..................D ...............08/03/2014 ........15/08/2015

    I want to transform this data into what I am going to call "Subject level" data i.e. all this event information in one row/observation like below:

    Subject..........Event 1..........Event 1 (duration)..........Event 2..........Event 2 (duration)..........Event 3..........Event 3 (duration) etc....
    1................... A only.......... 75 days........................ A & B ...........84 days .......................A & B & C .....37 days................etc...

    But obviously I am going to have different combinations of events and dates per subject so the aim is to write some generalisable or all-encompassing code. I can manipulate the dates into "days since" variables to make them easier to use, but I've been racking my brain for days as to how I may construct a loop or several nested loops to reshape my data.

    Any ideas or advice would be greatly appreciated.

    Thanks,

    Rob.
    Last edited by Rob Wood; 25 Jun 2015, 09:08.

  • #2
    Hi Rob,

    besides the fact that I don't see of what use this would be: You want the event variables to be strings and the duration variables to be metric, right? The order of the events is random, i. e. any event can be the first, second...?

    Comment


    • #3
      Hi Paul,

      Yes the events would be string and the durations numerical. The order of events is totally random and can have overlapping elements to it.

      The reason I want it in this form, is so I can then stick it into a Pivot Table in Excel and be able to flick through the different paths of events.

      Thanks,

      Rob.

      Comment


      • #4
        In this case I would recommend:

        1. Generate variables that give you all the information that you need in a simple form. For instance one that contains the duration for each period.
        2. Generate a new file with the right variable structure.
        3. For each subject id look for the first,second... period (by calendar start date) in the original file. Then save the duration date in the right cell in the new file. This needs to be done in a forval loop that opens, saves, and closes both files after every single operation.
        4. For the strings, have a look on how Stata handles strings and build them "on the way" in the loops as macros.

        This is only very schematic but I hope that it gives you an idea of how this is done. At least that's my way of performing complex restructurings in Stata.

        Comment


        • #5
          I am not motivated personally by Excel purposes, but this may help your analyses in Stata.

          What's tricky in any software is how far you hold data on changes in state and how far on the spells between.

          The main Stata trick here is written up in http://www.stata-journal.com/article...article=dm0068 That's behind a paywall but searching here for dm0068 is likely to throw other threads.

          Code:
          . clear
          
          . input Subject  str1 Event str10 Start_Date str10  End_Date
          
                 Subject      Event  Start_Date    End_Date
            1. 1 A  "12/06/2014" "25/12/2014"
            2. 1 B  "26/08/2014" "03/02/2015"
            3. 1 C  "18/11/2014" "30/04/2015"
            4. 1 D  "08/03/2014" "15/08/2015"
            5. end 
          
          . foreach x in Start End { 
            2.         gen `x'Date = daily(`x'_Date, "DMY") 
            3.         drop `x'_Date 
            4. }
          
          . 
          . gen long obs = _n 
          
          . expand 2 
          (4 observations created)
          
          . bysort obs : gen D = cond(_n == 1, StartDate, EndDate) 
          
          . format *Date D %td 
          
          . by obs : gen C = _n == 2 
          
          . sort Subject D 
          
          . list Subject D Event C 
          
               +---------------------------------+
               | Subject           D   Event   C |
               |---------------------------------|
            1. |       1   08mar2014       D   0 |
            2. |       1   12jun2014       A   0 |
            3. |       1   26aug2014       B   0 |
            4. |       1   18nov2014       C   0 |
            5. |       1   25dec2014       A   1 |
               |---------------------------------|
            6. |       1   03feb2015       B   1 |
            7. |       1   30apr2015       C   1 |
            8. |       1   15aug2015       D   1 |
               +---------------------------------+
          
          . by Subject : gen S = Event if C == 0 & _n == 1 
          (7 missing values generated)
          
          . by Subject : replace S = S[_n-1] + Event if C == 0 & _n > 1 
          variable S was str1 now str4
          (3 real changes made)
          
          . by Subject : replace S = subinstr(S[_n-1], Event, "", .) if C == 1 & _n > 1  
          (3 real changes made)
          
          .  
          . 
          . list 
          
               +----------------------------------------------------------------------+
               | Subject   Event   StartDate     EndDate   obs           D   C      S |
               |----------------------------------------------------------------------|
            1. |       1       D   08mar2014   15aug2015     4   08mar2014   0      D |
            2. |       1       A   12jun2014   25dec2014     1   12jun2014   0     DA |
            3. |       1       B   26aug2014   03feb2015     2   26aug2014   0    DAB |
            4. |       1       C   18nov2014   30apr2015     3   18nov2014   0   DABC |
            5. |       1       A   12jun2014   25dec2014     1   25dec2014   1    DBC |
               |----------------------------------------------------------------------|
            6. |       1       B   26aug2014   03feb2015     2   03feb2015   1     DC |
            7. |       1       C   18nov2014   30apr2015     3   30apr2015   1      D |
            8. |       1       D   08mar2014   15aug2015     4   15aug2015   1        |
               +----------------------------------------------------------------------+

          Comment


          • #6
            Paul and Nick, thank you for your contributions, they are greatly appreciated.

            I am not often motivate by Excel purposes Nick, but in my mind this was the best way to visually display the data after the manipulation.

            Thanks,

            Rob.

            Comment

            Working...
            X