Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combine multiple rows into one when end_date = start_date

    Hi there,

    I am looking for a code that combine the rows when end_date = start_date for a certain ID when all the variables have same values. For example, in the table below, ID 1 has 4 rows and their variables x and y have the same values in each rows under ID 1. ID 1 has end_date=start_date in each of its row. In that case, ID 1 will be combined into one row that will show the very first start_date and very last end_date.

    id start_date end_date x y
    1 10 12 10 12
    1 12 14 10 12
    1 14 16 10 12
    1 16 21 10 12
    2 11 13 15 17
    2 13 15 15 17
    3 16 18 20 22
    3 18 20 20 22
    3 20 25 20 22



    Result will look like this:

    id start_date end_date x y
    1 10 21 10 12
    2 11 15 15 17
    3 16 25 20 22

    I am seeking for help to get the code.
    Thank you.

    Kind regards,
    Rakib

  • #2
    Code:
     collapse (min) start_date (max) end_date (mean) x y, by(id)
    does what you want for your data example. It does not check for complications you might carry about such as

    1. Gaps or overlaps in time.

    2. Variability in x or y.

    Comment


    • #3
      Thank you very much for your help. It worked perfectly.

      Comment


      • #4
        Hi there,
        The above code considers the min and max for a certain ID. However, my database contains gaps in time. I am looking for a code that also considers the gaps in time. For example, Id 1 has date gaps between 3rd and 4th row. Therefore, the result will combine the first 3 rows (combined into one row that will show the very first start_date and very last end_date.) but will consider 4th row as a separate observation.

        Observations:
        Id start_date end_date x y
        1 10 13 12 10
        1 13 17 12 10
        1 17 22 12 10
        1 23 30 13 14
        2 11 20 44 50
        2 20 34 44 50
        2 35 41 44 50
        3 78 85 25 31
        3 85 90 25 31
        3 90 93 25 31
        3 94 97 29 34

        and the result will be as follows:
        Id start_date end_date x y
        1 10 22 12 10
        1 23 30 13 14
        2 11 34 44 50
        2 35 41 44 50
        3 78 93 25 31
        3 94 97 29 34
        Thank you very much for your help.

        Kind regards,
        Rakib

        Comment


        • #5
          We can define spells by the condition that the start of each spell is NOT equal to the end of the previous spell. This works for the first observation in each spell too, as end_date[0] is returned as missing.

          For much more discussion, see https://www.stata-journal.com/articl...article=dm0029 and also tsspell from SSC (not needed here).

          Thus

          Code:
          clear 
          input Id    start_date    end_date    x    y
          1    10    13    12    10
          1    13    17    12    10
          1    17    22    12    10
          1    23    30    13    14
          2    11    20    44    50
          2    20    34    44    50
          2    35    41    44    50
          3    78    85    25    31
          3    85    90    25    31
          3    90    93    25    31
          3    94    97    29    34
          end 
          
          bysort Id (start_date) : gen spell = sum(start_date != end_date[_n-1])
          
          list , sepby(Id spell)
          
          collapse (min) start_date (max) end_date (mean) x y, by(Id spell)
          
          list, sepby(Id)
          Results:

          Code:
          . list , sepby(Id spell)
          
               +--------------------------------------------+
               | Id   start_~e   end_date    x    y   spell |
               |--------------------------------------------|
            1. |  1         10         13   12   10       1 |
            2. |  1         13         17   12   10       1 |
            3. |  1         17         22   12   10       1 |
               |--------------------------------------------|
            4. |  1         23         30   13   14       2 |
               |--------------------------------------------|
            5. |  2         11         20   44   50       1 |
            6. |  2         20         34   44   50       1 |
               |--------------------------------------------|
            7. |  2         35         41   44   50       2 |
               |--------------------------------------------|
            8. |  3         78         85   25   31       1 |
            9. |  3         85         90   25   31       1 |
           10. |  3         90         93   25   31       1 |
               |--------------------------------------------|
           11. |  3         94         97   29   34       2 |
               +--------------------------------------------+
          
          . 
          
          . list, sepby(Id)
          
               +--------------------------------------------+
               | Id   spell   start_~e   end_date    x    y |
               |--------------------------------------------|
            1. |  1       1         10         22   12   10 |
            2. |  1       2         23         30   13   14 |
               |--------------------------------------------|
            3. |  2       1         11         34   44   50 |
            4. |  2       2         35         41   44   50 |
               |--------------------------------------------|
            5. |  3       1         78         93   25   31 |
            6. |  3       2         94         97   29   34 |
               +--------------------------------------------+

          Comment


          • #6
            Thank you very much. It is such a great help. Much appreciated.

            Comment


            • #7
              Hi,
              One last question regarding this. What if there is a string variable? (Mean) does not work for string variable. I tried to destring it but it turns out missing value.
              Thank you.

              Comment


              • #8
                I think #7 is referring to collapse. I think you just need to put a string variable in the by() option.

                Comment


                • #9
                  Thank you very much for your advice.

                  Comment

                  Working...
                  X