  • How to get sum of category

    Hi everyone,

    I used id and type to generate dummy_1 through dummy_5.
    My data looks like this:
    id type dummy_1 dummy_2 dummy_3 dummy_4 dummy_5
    1 2 0 1 0 0 0
    1 4 0 0 0 1 0
    2 1 1 0 0 0 0
    2 3 0 0 1 0 0
    2 5 0 0 0 0 1
    3 2 0 1 0 0 0
    4 5 0 0 0 0 1
    But I want to have this:
    id dummy_1 dummy_2 dummy_3 dummy_4 dummy_5
    1 0 1 0 1 0
    2 1 0 1 0 1
    3 0 1 0 0 0
    4 0 0 0 0 1
    So my command looks like this:
    forvalues i=1(1)5{
    bysort id: replace dummy_`i'=sum(dummy_`i'),
    }
    I thought I could run this loop first and then remove the duplicated rows with "duplicates drop id type, force" to get what I wanted.

    However, after running the command:
    forvalues i=1(1)5{
    bysort id: replace dummy_`i'=sum(dummy_`i'),
    }
    I get this:
    id type dummy_1 dummy_2 dummy_3 dummy_4 dummy_5
    1 2 0 1 0 0 0
    1 4 0 1 0 1 0
    2 1 1 0 0 0 0
    2 3 1 0 1 0 0
    2 5 1 0 1 0 1
    3 2 0 1 0 0 0
    4 5 0 0 0 0 1
    I'm confused about it now. Can anyone tell me how to do it?

    Thanks for your help
    Li
    Last edited by Chia Jung Li; 04 Dec 2021, 02:28.

  • #2
    Code:
    collapse (sum) dummy*, by(id)



    • #3
      Originally posted by Øyvind Snilsberg
      Code:
      collapse (sum) dummy*, by(id)
      Thanks for answering.

      Did you mean that I could use a command like this?
      forvalues i=1(1)5{
      replace dummy_`i'=collapse(sum) dummy_`i', by (id)
      }

      After I ran that, I got:
      unknown function collapse()
      r(133);

      I think I misunderstood it. What should I do?



      • #4
        Use the command in #2 on the "original" data:
        Originally posted by Chia Jung Li
        id type dummy_1 dummy_2 dummy_3 dummy_4 dummy_5
        1 2 0 1 0 0 0
        1 4 0 0 0 1 0
        2 1 1 0 0 0 0
        2 3 0 0 1 0 0
        2 5 0 0 0 0 1
        3 2 0 1 0 0 0
        4 5 0 0 0 0 1



        • #5
          Originally posted by Øyvind Snilsberg
          use the command in #2 on the "original" data,

          I tried the command in #2 on the original data. My command looked like this:
          forvalues i=1(1)5{
          collapse (sum) dummy_`i', by(id),
          }

          I get:
          id dummy_1
          1 0
          2 1
          3 0
          4 0

          And this error:
          variable dummy_2 not found
          r(111);

          P.S. I am trying to use a loop because I want to run this code on a big dataset later, but I can't even get it to work on the small one.
          Last edited by Chia Jung Li; 04 Dec 2021, 03:48.



          • #6
            #2 is the entire answer.


            With the approach in #5, once you run

            Code:
            collapse (sum) dummy_1, by(id)
            the dataset contains only dummy_1 and id.

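            So that approach is wrong: the second pass of the loop asks for dummy_2, which is no longer in the dataset, hence the r(111) error.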

            Also watch out: the function sum() gives you running or cumulative sums, not groupwise totals. But collapse (sum) does give you totals.
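
            To illustrate the difference, here is a minimal sketch using the example data from #1. The variable names run_3 and tot_3 are made up for the illustration, and by id: egen with total() is just one common way to get a groupwise total without collapsing.

            ```stata
            clear
            input float(id type dummy_1 dummy_2 dummy_3 dummy_4 dummy_5)
            1 2 0 1 0 0 0
            1 4 0 0 0 1 0
            2 1 1 0 0 0 0
            2 3 0 0 1 0 0
            2 5 0 0 0 0 1
            3 2 0 1 0 0 0
            4 5 0 0 0 0 1
            end

            * running (cumulative) sum within id: can differ from row to row
            bysort id (type): generate run_3 = sum(dummy_3)

            * groupwise total within id: the same value on every row of the group
            by id: egen tot_3 = total(dummy_3)

            list id type dummy_3 run_3 tot_3, sepby(id)
            ```

            For id 2 the running sum run_3 goes 0, 1, 1 down the rows, while tot_3 is 1 on every row.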



            • #7
              @Øyvind Snilsberg's method means applying collapse directly to your data:

              Code:
              clear
              input float(id type dummy_1 dummy_2 dummy_3 dummy_4 dummy_5)
              1 2 0 1 0 0 0
              1 4 0 0 0 1 0
              2 1 1 0 0 0 0
              2 3 0 0 1 0 0
              2 5 0 0 0 0 1
              3 2 0 1 0 0 0
              4 5 0 0 0 0 1
              end
              
              collapse (sum) dummy*, by(id)
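
              A related sketch, under an assumption that goes beyond the posted data: if the same id could ever carry the same type twice, collapse (sum) would return a count of 2 rather than a 0/1 indicator. In that case collapse (max) may be the safer choice, since it keeps each result at 0 or 1. The repeated first row below is hypothetical and is there only to show the difference.

              ```stata
              * hypothetical data: the row for id 1, type 2 is entered twice
              clear
              input float(id type dummy_1 dummy_2 dummy_3 dummy_4 dummy_5)
              1 2 0 1 0 0 0
              1 2 0 1 0 0 0
              1 4 0 0 0 1 0
              2 1 1 0 0 0 0
              2 3 0 0 1 0 0
              2 5 0 0 0 0 1
              3 2 0 1 0 0 0
              4 5 0 0 0 0 1
              end

              * (max) gives 1 if any row within id has the dummy set, 0 otherwise
              collapse (max) dummy_*, by(id)

              list, noobs
              ```

              With (sum) in place of (max), dummy_2 for id 1 would be 2 on this hypothetical data. On the data as posted in #1, both give the same result.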



              • #8
                Thanks to all of you, Øyvind Snilsberg, Chen Samulsion, and Nick Cox.
                You explained it clearly.
                I thought I had to use code "forvalues"
                Now I know how to do it!!
                Again, thanks a lot!
