Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • combine observations under conditions

    Hi there,
    I am seeking help to combine rows into one under a certain ID. The conditions are as below:

    1) when gap between end_date and start_date is within 1 - 3 then they will combine into one.
    2) Under the same ID if any of the variables between the rows differ then time gap conditions won't apply and they won't combine

    For example, considering the table below under: Id 1 #, all the variables (i.e., x,y,z) are same. end_date = 13 in the first row and start_date = 16 in the second row are combined and makes it start_date = 10 and end_date = 18.
    Under Id 3 # since the variable z varies (i.e., ccc and asd) between the two rows, thus even though the gap between star_date and end_date is 2 (i.e., end_date = 85, start_date = 87), but it didn't combine

    Dataset:
    Id start_date end_date x y z
    1 10 13 12 10 aaa
    1 16 18 12 10 aaa
    1 25 22 12 10 aaa
    1 24 30 13 14 aaa
    2 11 20 44 50 xxx
    2 21 34 44 50 xxx
    2 40 45 44 50 xxx
    3 78 85 25 31 ccc
    3 87 90 25 31 asd
    3 92 93 25 31 pqs
    3 94 99 29 34 pqs

    Result:
    Id start_date end_date x y z
    1 10 18 12 10 aaa
    1 25 30 13 14 aaa
    2 11 34 44 50 xxx
    2 40 45 44 50 xxx
    3 78 85 25 31 ccc
    3 87 90 25 31 asd
    3 92 99 29 34 pqs
    Really appreciate your help to find out the right code.

    Thank you in advance.

    Kind regards,
    Rakib

  • #2
    Hey Rakib,

    thank you for your question. Please provide a code example on how you get from "Dataset:" to "Result:". That would help solving your problem. Also, the FAQ provides some advice on the usage of data examples on statalist.

    Thank you and best regards
    Sebastian

    Comment


    • #3
      In principle something like this should help. In practice, your example shows an example of end date before start date and (it seems) overlapping episodes, so the example needs to be cleaned up and the rules you give may not be complete.

      Code:
      bysort id (start_date) : gen wanted = sum(start_date > (end_date[_n-1] + 3) | (x != x[_n-1]) | (y != y[_n-1]) | (z != z[_n-1]))

      Comment


      • #4
        Thank you very much for your help. One of the observations is overlapping which is a typing error.

        Comment


        • #5
          I have cleaned the example and tried with the bysort code but don't know how to collapse. Basically I am trying to combine/collapse rows into one under a certain ID. The conditions are as below:

          1) when gap between end_date and start_date is within 1 - 3 then they will combine/collapse into one.
          2) Under the same ID if any of the variables between the rows differ then time gap conditions won't apply and they won't combine/collapse

          I tried in many ways but I am really struggling to imply this logic. Therefore, I am seeking your help to get the code.
          Example:
          Id start_date end_date x y z
          1 10 13 12 10 aaa
          1 13 18 12 10 aaa
          1 20 22 12 10 aaa
          1 24 30 13 14 aaa
          2 11 20 44 50 xxx
          2 21 34 44 50 xxx
          2 40 45 44 50 xxx
          3 78 85 25 31 ccc
          3 87 90 25 31 asd
          3 92 93 29 35 pqs
          3 94 99 29 35 pqs
          Result after coding will be as follows:
          Id start_date end_date x y z
          1 10 22 12 10 aaa
          1 24 30 13 14 aaa
          2 11 34 44 50 xxx
          2 40 45 44 50 xxx
          3 78 85 25 31 ccc
          3 87 90 25 31 asd
          3 92 99 29 35 pqs
          Thank you.

          Comment


          • #6
            Please do as requested in #2 and provide a data example as requested in the FAQ. The example in #5 is helpful but still ambiguous on whether z is a string variable or a numeric variable with value labels.

            Your example and your rules don't seem fully consistent as the first block of observations includes an observation for which the start date is the same as the previous end date. So the rule presumably is that a difference of 0, 1, 2 or 3 is fine.

            Otherwise this is a variation on the code in #3.


            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input byte(id start_date end_date x y) str3 z
            1 10 13 12 10 "aaa"
            1 13 18 12 10 "aaa"
            1 20 22 12 10 "aaa"
            1 24 30 13 14 "aaa"
            2 11 20 44 50 "xxx"
            2 21 34 44 50 "xxx"
            2 40 45 44 50 "xxx"
            3 78 85 25 31 "ccc"
            3 87 90 25 31 "asd"
            3 92 93 29 35 "pqs"
            3 94 99 29 35 "pqs"
            end
            bysort id (start_date) : gen spell = sum((start_date-end_date[_n-1] > 3) | (z != z[_n-1]) | (y != y[_n-1]) | (x != x[_n-1]))
            
            list, sepby(id spell)
            
            collapse (min) start (max) end (mean) x y , by(id spell z)
            
            list, sepby(id)
            Results

            Code:
            . list, sepby(id spell)
            
                 +--------------------------------------------------+
                 | id   start_~e   end_date    x    y     z   spell |
                 |--------------------------------------------------|
              1. |  1         10         13   12   10   aaa       1 |
              2. |  1         13         18   12   10   aaa       1 |
              3. |  1         20         22   12   10   aaa       1 |
                 |--------------------------------------------------|
              4. |  1         24         30   13   14   aaa       2 |
                 |--------------------------------------------------|
              5. |  2         11         20   44   50   xxx       1 |
              6. |  2         21         34   44   50   xxx       1 |
                 |--------------------------------------------------|
              7. |  2         40         45   44   50   xxx       2 |
                 |--------------------------------------------------|
              8. |  3         78         85   25   31   ccc       1 |
                 |--------------------------------------------------|
              9. |  3         87         90   25   31   asd       2 |
                 |--------------------------------------------------|
             10. |  3         92         93   29   35   pqs       3 |
             11. |  3         94         99   29   35   pqs       3 |
                 +--------------------------------------------------+
            
            . .
            . list, sepby(id)
            
                 +--------------------------------------------------+
                 | id     z   spell   start_~e   end_date    x    y |
                 |--------------------------------------------------|
              1. |  1   aaa       1         10         22   12   10 |
              2. |  1   aaa       2         24         30   13   14 |
                 |--------------------------------------------------|
              3. |  2   xxx       1         11         34   44   50 |
              4. |  2   xxx       2         40         45   44   50 |
                 |--------------------------------------------------|
              5. |  3   ccc       1         78         85   25   31 |
              6. |  3   asd       2         87         90   25   31 |
              7. |  3   pqs       3         92         99   29   35 |
                 +--------------------------------------------------+
            Last edited by Nick Cox; 13 Feb 2023, 07:42.

            Comment


            • #7
              After the trial and error process I finally make your suggested code run and it is perfectly ok now. My database is bit messy and I am struggling to understand them. Thanks a lot for helping me again.

              Rakib

              Comment


              • #8
                Hi There


                I am seeking help with combining two rows, I want to add up and get a total for row 1 and 2:

                as_pse_feel |
                _neighbrhd | Freq. Percent Cum.
                ------------+-----------------------------------
                1 | 58 29.00 29.00
                2 | 114 57.00 86.00
                3 | 17 8.50 94.50
                4 | 10 5.00 99.50
                5 | 1 0.50 100.00
                ------------+-----------------------------------
                Total | 200 100.00

                Comment


                • #9
                  Sorry for hijacking your post Rakibul,

                  Comment

                  Working...
                  X