Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • collapse observation into a single row

    Dear Stata users,

    I have the following dataset

    input id num1 num2 num3 num4 num5
    1 12 1 14 55 15
    1 12 1 14 145 15
    2 99 9 97 4 96
    2 99 9 97 23 96
    2 99 9 97 200 96
    3 55 3 4 3 22
    end

    I noted that the dataset contains some duplicate observations in terms of all variables but num4.
    Therefore, I would like to collapse these duplicate observations into a single row by summing up the value of num4.

    To illustrate the final dataset should consists of three rows as the following:

    input id num1 num2 num3 num4 num5
    1 12 1 14 200 15
    2 99 9 97 227 96
    3 55 3 4 3 22
    end

    I'm not sure what is the quickest way to do this. Can I kindly ask you to help me with this?
    Thank you in advance.
    Best regards,
    F

  • #2
    sounds like you want -collapse-; use means for variables that do not differ and sum for num4

    Comment


    • #3

      Code:
      .  collapse (sum) num4 , by(id-num3 num5)
      .  list, clean
             id   num1   num2   num3   num5   num4 
        1.    1     12      1     14     15    200 
        2.    2     99      9     97     96    227 
        3.    3     55      3      4     22      3

      Comment


      • #4
        Hello Rich,
        Thank you for your replay.

        I tried your suggestion and it worked. I used the following: collapse num1 num2 num3 (sum) num4 (mean) num5, by(id).

        However, what if I have a string variable in the dataset instead of having all numeric ones?
        Let's consider this dataset:

        input id name num2 num3 num4 num5
        1 a 1 14 55 15
        1 a 1 14 145 15
        2 b 9 97 4 96
        2 b 9 97 23 96
        2 b 9 97 200 96
        3 b 3 4 3 22
        end

        How can I merge duplicates into a single row? It appears that collapse does not work.

        Thank you for your help. Really appreciated.
        F

        Comment


        • #5
          I tried the code Svend suggested. The by option solves my problem.
          Thank you very much. Really useful and prompt reply.

          Thank you and have a lovely day,
          F

          Comment


          • #6
            Note that your input command would not work. But here we go:

            Code:
             input id str1 name num2 num3 num4 num5
             1 "a" 1 14 55 15
             1 "a" 1 14 145 15
             2 "b" 9 97 4 96
             2 "b" 9 97 23 96
             2 "b" 9 97 200 96
             3 "b" 3 4 3 22
             end
            
             collapse (sum) num4 , by(id-num3 num5)
            .  list, clean
                   id   name   num2   num3   num5   num4 
              1.    1      a      1     14     15    200 
              2.    2      b      9     97     96    227 
              3.    3      b      3      4     22      3

            Comment

            Working...
            X