Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Summing rows for each ID

    I have the following data:
    ID Line_Num Revenue
    1 300 5000
    1 2500 10000
    2 300 6000
    2 2500 7000
    For each ID, I would like to add Revenue for Line_Num 300 and 2500 together. Basically, I am trying to create another variable with added values for each ID for the two line numbers. Please suggest how can I do that?
    Thank you!

  • #2
    Well, if you want to reduce the data to a data set with one observation per ID which contains the total Revenue then it's
    Code:
    collapse (sum) Revenue, by(ID)
    If you want to add a new variable to the existing data set which contains the total Revenue for every observation for that ID it would be:
    Code:
    by ID, sort: egen Total_Revenue = total(Revenue)
    The HTML table in which you have shown your data is great for human eyes. But had it been necessary to work with your example data and try things out, a lot of time would be wasted trying to wrestle it into Stata. Please read the FAQ, with emphasis on #12. Do install the -dataex- command (-ssc install dataex-) and in the future use it consistently to show data examples. This will make it easier for people to help you with more complicated questions.

    Both -collapse- and -egen- are basic data management commands in Stata. You will need to become proficient in using them in order to make effective use of Stata over time. Do read these manual sections closely. In addition, I suggest you invest some time reading the Gettings Started [GS} and User's Guide [U] sections of the manual, so that you will be acquainted with all of the most important Stata commands that all regular Stata users need to know. Don't expect to learn the details of each such command, but at least when faced with fundamental data management and analysis issues you will know which commands might be suitable, and then you can approach your problem by starting with the manual sections and help files for those commands, to review the details of the syntax and how they are applied.

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      Well, if you want to reduce the data to a data set with one observation per ID which contains the total Revenue then it's
      Code:
      collapse (sum) Revenue, by(ID)
      If you want to add a new variable to the existing data set which contains the total Revenue for every observation for that ID it would be:
      Code:
      by ID, sort: egen Total_Revenue = total(Revenue)
      The HTML table in which you have shown your data is great for human eyes. But had it been necessary to work with your example data and try things out, a lot of time would be wasted trying to wrestle it into Stata. Please read the FAQ, with emphasis on #12. Do install the -dataex- command (-ssc install dataex-) and in the future use it consistently to show data examples. This will make it easier for people to help you with more complicated questions.

      Both -collapse- and -egen- are basic data management commands in Stata. You will need to become proficient in using them in order to make effective use of Stata over time. Do read these manual sections closely. In addition, I suggest you invest some time reading the Gettings Started [GS} and User's Guide [U] sections of the manual, so that you will be acquainted with all of the most important Stata commands that all regular Stata users need to know. Don't expect to learn the details of each such command, but at least when faced with fundamental data management and analysis issues you will know which commands might be suitable, and then you can approach your problem by starting with the manual sections and help files for those commands, to review the details of the syntax and how they are applied.
      Thank you Clyde! That worked. I have installed dataex in my Stata 12 using -ssc install dataex- command, but not sure how to go forward from there. I mean how do I use it within Statalist?

      Comment


      • #4
        Originally posted by Soumya Upadhyay View Post
        I have the following data:
        ID Line_Num Revenue
        1 300 5000
        1 2500 10000
        2 300 6000
        2 2500 7000
        For each ID, I would like to add Revenue for Line_Num 300 and 2500 together. Basically, I am trying to create another variable with added values for each ID for the two line numbers. Please suggest how can I do that?
        Thank you!
        I had a similar question , so I thought of including in the same post. If the above data looks something like below, I am trying to get two different variables that calculate two different ratios. The first new variable will divide Num/Den and named as ratio. The second new variable will divide Num/Days and named as rate. I want to do this for each ID. Thank you in advance!
        ID Measure_name Score
        1 Num 9
        1 Den 7
        1 Days 1000
        2 Num 13
        2 Den 10
        2 Days 2000

        Comment


        • #5
          You need a data structure fit for purpose. Consider

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte id str4 measure_name int score
          1 "Num"     9
          1 "Den"     7
          1 "Days" 1000
          2 "Num"    13
          2 "Den"    10
          2 "Days" 2000
          end
          
          reshape wide score, i(id) j(measure_name) string
          rename (score*) (*)
          gen ratio = Num/Den
          
          list 
          
               +----------------------------------+
               | id   Days   Den   Num      ratio |
               |----------------------------------|
            1. |  1   1000     7     9   1.285714 |
            2. |  2   2000    10    13        1.3 |
               +----------------------------------+

          Comment


          • #6
            First, how to use -dataex-. You launch Stata and open your data set. If you don't want to post the whole thing, you use -keep- and -drop- to get down to the variables and observations that you want to post. Then you type -dataex- (without the hyphens before and after) in the command window. Stata will respond with output in the Results window. The first line of the output will be:
            Code:
            ----------------------- copy starting from the next line -----------------------
            and the last line will be
            Code:
            ------------------ copy up to and including the previous line ------------------
            So follow those instructions: highlight everything between (but not including) those two lines. Then paste it directly here in the forum editor. Done. It couldn't be simpler!

            Concerning your new question, the first problem is that the data is in an awkward layout. So you need to -reshape- it to create separate variables for Num, Den, and Days and reduce your data to one observation per id. Some cleanup of the variable names is neded. Then you just calculate the ratio and rate in the usual way.
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input byte id str4 measure_name int score
            1 "Num"     9
            1 "Den"     7
            1 "Days" 1000
            2 "Num"    13
            2 "Den"    10
            2 "Days" 2000
            end
            
            reshape wide score, i(id) j(measure_name) string
            rename score* *
            rename *, lower
            gen ratio = num/den
            gen rate = num/days
            It seems you are new to Stata, so it is good that you are seeking advice. To get acquainted with the basics of Stata data management and analysis more efficiently than having to post a question every time and wait for somebody to respond, you should read the Getting Started [GS] and User's Guide [U] sections of the online documentation that come with your installation of Stata. These sections cover the commands that are used day-in and day-out to work in Stata. It will take a while to read this, and you won't be able to absorb and retain every detail, but the time will be well spent and quickly repaid by improvements in efficiency because in most situations you will recognize which command(s) are relevant. And then you can get the syntax and other details by looking in the -help files- or in the manual sections dedicated to those commands.

            Added: Crossed with Nick's response, which gives much the same solution.

            Comment


            • #7
              Originally posted by Clyde Schechter View Post
              First, how to use -dataex-. You launch Stata and open your data set. If you don't want to post the whole thing, you use -keep- and -drop- to get down to the variables and observations that you want to post. Then you type -dataex- (without the hyphens before and after) in the command window. Stata will respond with output in the Results window. The first line of the output will be:
              Code:
              ----------------------- copy starting from the next line -----------------------
              and the last line will be
              Code:
              ------------------ copy up to and including the previous line ------------------
              So follow those instructions: highlight everything between (but not including) those two lines. Then paste it directly here in the forum editor. Done. It couldn't be simpler!

              Concerning your new question, the first problem is that the data is in an awkward layout. So you need to -reshape- it to create separate variables for Num, Den, and Days and reduce your data to one observation per id. Some cleanup of the variable names is neded. Then you just calculate the ratio and rate in the usual way.
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input byte id str4 measure_name int score
              1 "Num" 9
              1 "Den" 7
              1 "Days" 1000
              2 "Num" 13
              2 "Den" 10
              2 "Days" 2000
              end
              
              reshape wide score, i(id) j(measure_name) string
              rename score* *
              rename *, lower
              gen ratio = num/den
              gen rate = num/days
              It seems you are new to Stata, so it is good that you are seeking advice. To get acquainted with the basics of Stata data management and analysis more efficiently than having to post a question every time and wait for somebody to respond, you should read the Getting Started [GS] and User's Guide [U] sections of the online documentation that come with your installation of Stata. These sections cover the commands that are used day-in and day-out to work in Stata. It will take a while to read this, and you won't be able to absorb and retain every detail, but the time will be well spent and quickly repaid by improvements in efficiency because in most situations you will recognize which command(s) are relevant. And then you can get the syntax and other details by looking in the -help files- or in the manual sections dedicated to those commands.

              Added: Crossed with Nick's response, which gives much the same solution.
              Got it! Thanks so much!!

              Comment

              Working...
              X