Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • expand rows to fill dates

    Dear StataListers,

    I have a longitudinal dataset with observational data, please see the photos which are clearer than any explanation undoubtedly.

    I would like to add rows when necessary in order to get chronological data.
    Basically there are 2 objectives:
    (1) Every start date should be identical to the previous end date, so that there is no gap in term of time.
    (2) The state and house variables are time-dependent and should change accordingly.

    Also please note that the individuals in the dataset can have more than only 2 lines.

    I believe a forval or foreach command can do the trick, but I am not yet an expert in macros to find a solution.
    Any help would be appreciated, thanks!
    This is the dataset I have This is what I would like to do

  • #2
    No loop or macros (in Stata's sense, or any other program's) needed here. Try

    Code:
     
    bysort id : gen toexpand = 1 + (end != start[_n+1]) if _n < _N 
    expand toexpand 
    bysort id (end) : replace start = end[_n-1] if start == start[_n-1]   
    by id: replace end = start[_n+1] if end == end[_n-1] 
    edit

    Comment


    • #3
      Thank you for your reply. It really works well for my simple example. And I thought I’d be able to improve it the way I needed it to be for more complex observations, but I am struggling a bit.

      Actually, not all the observations need the “expand” command. Even though I realize this what I asked for in the title of my question...
      And that is also why I imagined a loop would be efficient.

      Below I've listed two other examples to describe the different cases I get.








      Thanks for your help.
      Last edited by Carole Khairallah; 13 Jun 2014, 05:58.

      Comment


      • #4
        Only those observations are expanded that need to be.

        You're welcome to try writing a loop; my not very humble guess is that writing it will be harder work and the code will be slower to execute, so I can see no sense in which such code will be "efficient".

        Unfortunately we can't comment on your changed code that you don't show us.

        Photos of data listings aren't very readable and they don't allow copy and paste for anyone to play with examples. Small data listings are best displayed as if they were code. Please see Advice in FAQ, Section 12.

        Comment


        • #5
          Thanks for your advice.

          Below I've re-copied one of the examples, with on the left the how the data are, and on the right how it should be.

          Code:
           
          id state house start end id state house start end
          abc 1-630 15-Dec-02 15-Jun-03 abc 1-630 10-7 15-Dec-02 15-Jun-03
          abc 10-11 15-Jun-03 15-Apr-11 abc 10-11 10-7 15-Jun-03 15-Apr-11
          abc 10-7 15-Jun-03 19-Mar-12 ==> abc 6-85 10-7 15-Apr-11 19-Mar-12
          abc 6-85 15-Apr-11 15-Sep-12 abc 6-85 10-8 19-Mar-12 15-Sep-12
          abc 10-8 19-Mar-12 15-Sep-13 abc 10-34 10-8 15-Sep-12 15-Sep-13
          abc 10-34 15-Sep-12 abc 10-34 10-11 15-Sep-13
          abc 10-11 15-Sep-13
          The result after expand will look like below

          Code:
           
          id state house start end toexpand
          abc 1-630 15-Dec-02 15-Jun-03 1
          abc 10-11 15-Jun-03 15-Apr-11 2
          abc 10-11 15-Jun-03 15-Apr-11 2
          abc 10-7 15-Jun-03 19-Mar-12 2
          abc 10-7 15-Jun-03 19-Mar-12 2
          abc 6-85 15-Apr-11 15-Sep-12 2
          abc 6-85 15-Apr-11 15-Sep-12 2
          abc 10-8 19-Mar-12 15-Sep-13 2
          abc 10-8 19-Mar-12 15-Sep-13 2
          abc 10-34 15-Sep-12 2
          abc 10-34 15-Sep-12 2
          abc 10-11 15-Sep-13
          I highlighted in red the first value which will change after the command but which should not change
          And in blue the date that should become "15-Apr-11".
          The value in red should be "retained" somehow, so that the next line is changed.

          Code:
            by individid : replace sdate = edate[_n-1] if sdate == sdate[_n-1]
          Sorry if I am not very clear, I hope the tables tell more.
          Thanks

          Comment


          • #6
            Your code looks right to me. You may have other kinds of overlap too.

            Comment

            Working...
            X