  • Sum and Total

    Dear Stata users,

    I have this data example that consists of firm-level data. My first step is to calculate the variable "dem" (demand).


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year str4 comp byte sec int total_a float perc_a long cost
    2009 "A125" 1 4652 1.163 2500549
    2009 "A125" 1 4787 1.197 2501240
    2009 "A125" 1 4922 1.231 2501931
    2009 "A125" 1 5057 1.264 2502622
    2009 "A125" 1 5192 1.298 2503313
    2009 "A125" 1 5327 1.332 2504004
    2009 "A125" 1 5462 1.366 2504695
    2009 "A125" 1 5597 1.399 2505386
    2009 "A125" 1 5732 1.433 2506077
    2009 "A125" 1 5867 1.467 2506768
    2010 "A125" 1 6002 1.501 2507459
    2010 "A125" 1 6137 1.534 2508150
    2010 "A125" 1 6272 1.568 2508841
    2010 "A125" 1 6407 1.602 2509532
    2010 "A125" 1 6542 1.636 2510223
    2010 "A125" 1 6677 1.669 2510914
    2010 "A125" 1 6812 1.703 2511605
    2010 "A125" 1 6947 1.737 2512296
    2010 "A125" 1 7082 1.771 2512987
    2010 "A125" 1 7217 1.804 2513678
    2011 "A125" 1 7352 1.838 2514369
    2011 "A125" 1 7487 1.872 2515060
    2011 "A125" 1 7622 1.906 2515751
    2011 "A125" 1 7757 1.939 2516442
    2011 "A125" 1 7892 1.973 2517133
    2011 "A125" 1 8027 2.007 2517824
    2011 "A125" 1 8162 2.041 2518515
    2011 "A125" 1 8297 2.074 2519206
    2011 "A125" 1 8432 2.108 2519897
    2011 "A125" 1 8567 2.142 2520588
    end

    I use this code
    Code:
    gen dem=perc_a * cost
    to obtain this data



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year str4 comp byte sec int total_a float perc_a long cost float dem
    2009 "A125" 1 4652 1.163 2500549 2908138.5
    2009 "A125" 1 4787 1.197 2501240   2993984
    2009 "A125" 1 4922 1.231 2501931   3079877
    2009 "A125" 1 5057 1.264 2502622   3163314
    2009 "A125" 1 5192 1.298 2503313   3249300
    2009 "A125" 1 5327 1.332 2504004   3335333
    2009 "A125" 1 5462 1.366 2504695 3421413.5
    2009 "A125" 1 5597 1.399 2505386   3505035
    2009 "A125" 1 5732 1.433 2506077   3591208
    2009 "A125" 1 5867 1.467 2506768   3677429
    2010 "A125" 1 6002 1.501 2507459   3763696
    2010 "A125" 1 6137 1.534 2508150   3847502
    2010 "A125" 1 6272 1.568 2508841 3933862.5
    2010 "A125" 1 6407 1.602 2509532   4020270
    2010 "A125" 1 6542 1.636 2510223   4106725
    2010 "A125" 1 6677 1.669 2510914 4190715.5
    2010 "A125" 1 6812 1.703 2511605   4277263
    2010 "A125" 1 6947 1.737 2512296   4363858
    2010 "A125" 1 7082 1.771 2512987   4450500
    2010 "A125" 1 7217 1.804 2513678   4534675
    2011 "A125" 1 7352 1.838 2514369 4621410.5
    2011 "A125" 1 7487 1.872 2515060 4708192.5
    2011 "A125" 1 7622 1.906 2515751 4795021.5
    2011 "A125" 1 7757 1.939 2516442   4879381
    2011 "A125" 1 7892 1.973 2517133 4966303.5
    2011 "A125" 1 8027 2.007 2517824   5053273
    2011 "A125" 1 8162 2.041 2518515   5140289
    2011 "A125" 1 8297 2.074 2519206   5224833
    2011 "A125" 1 8432 2.108 2519897   5311943
    2011 "A125" 1 8567 2.142 2520588   5399100
    end
    In this example I pasted only one firm, but in fact I have about 1250 firms. I need to generate another variable, tot_dem (total demand), by summing dem within each sec and year.
    My question: which egen function is the right one for this, sum() or total()? Both give me the same result, but the documentation I checked does not say that they are the same.
    Below is the code and the final result:

    Code:
    bysort sec year: egen tot_dem1= total(dem)
    bysort sec year: egen tot_dem2= sum(dem)


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year str4 comp byte sec int total_a float perc_a long cost float(dem tot_dem1 tot_dem2)
    2009 "A125" 1 4652 1.163 2500549 2908138.5 32925034 32925034
    2009 "A125" 1 4787 1.197 2501240   2993984 32925034 32925034
    2009 "A125" 1 4922 1.231 2501931   3079877 32925034 32925034
    2009 "A125" 1 5057 1.264 2502622   3163314 32925034 32925034
    2009 "A125" 1 5192 1.298 2503313   3249300 32925034 32925034
    2009 "A125" 1 5327 1.332 2504004   3335333 32925034 32925034
    2009 "A125" 1 5462 1.366 2504695 3421413.5 32925034 32925034
    2009 "A125" 1 5597 1.399 2505386   3505035 32925034 32925034
    2009 "A125" 1 5732 1.433 2506077   3591208 32925034 32925034
    2009 "A125" 1 5867 1.467 2506768   3677429 32925034 32925034
    2010 "A125" 1 6002 1.501 2507459   3763696 41489068 41489068
    2010 "A125" 1 6137 1.534 2508150   3847502 41489068 41489068
    2010 "A125" 1 6272 1.568 2508841 3933862.5 41489068 41489068
    2010 "A125" 1 6407 1.602 2509532   4020270 41489068 41489068
    2010 "A125" 1 6542 1.636 2510223   4106725 41489068 41489068
    2010 "A125" 1 6677 1.669 2510914 4190715.5 41489068 41489068
    2010 "A125" 1 6812 1.703 2511605   4277263 41489068 41489068
    2010 "A125" 1 6947 1.737 2512296   4363858 41489068 41489068
    2010 "A125" 1 7082 1.771 2512987   4450500 41489068 41489068
    2010 "A125" 1 7217 1.804 2513678   4534675 41489068 41489068
    2011 "A125" 1 7352 1.838 2514369 4621410.5 50099744 50099744
    2011 "A125" 1 7487 1.872 2515060 4708192.5 50099744 50099744
    2011 "A125" 1 7622 1.906 2515751 4795021.5 50099744 50099744
    2011 "A125" 1 7757 1.939 2516442   4879381 50099744 50099744
    2011 "A125" 1 7892 1.973 2517133 4966303.5 50099744 50099744
    2011 "A125" 1 8027 2.007 2517824   5053273 50099744 50099744
    2011 "A125" 1 8162 2.041 2518515   5140289 50099744 50099744
    2011 "A125" 1 8297 2.074 2519206   5224833 50099744 50099744
    2011 "A125" 1 8432 2.108 2519897   5311943 50099744 50099744
    2011 "A125" 1 8567 2.142 2520588   5399100 50099744 50099744
    end
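
    A way to check that the two variables agree in every observation, rather than by eye, is -assert-. This is only a sketch using the variable names from the code above:

    Code:
    * stops with an error if any observation differs; silent otherwise
    assert tot_dem1 == tot_dem2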


    Thanks
    JLi






  • #2
    Good question. Before Stata 9 the egen function in question was called sum(). Why was the name changed?

    In 2004 Svend Juul gave a very witty talk in Berlin pointing out -- among other things that were a bit dopey --- that


    Code:
    gen ... = sum()
    gave cumulative or running sums but

    Code:
    egen ...  = sum()
    gave totals, which are only occasionally going to be the same results. Here are the slides: https://www.stata.com/meeting/2german/Juul.pdf

    So StataCorp renamed the egen function total(), which is what is now documented. But the same code remains in existence as the sum() function of egen: StataCorp did not want to break your code just because they had changed the function name. Because it still worked, many people kept using sum() if they were accustomed to it -- and many people kept writing about it too, which is almost certainly how you heard about it.
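
    To make the distinction concrete, here is a minimal sketch on made-up data (the variables id, x, running and tot are hypothetical and not part of the data in #1). The sum() function used with generate returns a running sum within each group, whereas the egen function total() returns the group total in every observation:

    Code:
    clear
    input byte id float x
    1 1
    1 2
    1 3
    2 10
    2 20
    end
    * running (cumulative) sum within id, taken in order of x
    bysort id (x): gen running = sum(x)
    * group total, repeated in every observation of the group
    bysort id: egen tot = total(x)
    list, sepby(id)

    For id 1 the running sums are 1, 3, 6 and the total is 6 throughout; for id 2 they are 10, 30 and 30. The two variables agree only in the last observation of each group.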

    Some of these naming difficulties can be traced to constraints on filename lengths under the MS-DOS operating system. Filenames could not be longer than an 8.3 pattern of the form filename.ext. Now, each egen function is implemented as a file following the pattern _gX.ado, where X could be at most 6 characters long while MS-DOS was still in use.

    Naturally you could say that the sum() function -- not the egen function sum() -- could have been (should have been) called (say) cusum(), but it wasn't, and StataCorp weren't going to change that too. Even cumsum() could have been used, but that is not a good idea on other grounds.

    Oddly, or otherwise, Mata went with runningsum().

    "I am not making this up, you know" -- as a then famous raconteur used to say about the plots of Wagner's operas.
    Last edited by Nick Cox; 27 May 2022, 06:28.



    • #3
      Thank you so much for the explanation!



      • #4
        One more question, please: min and max did not work for me. But I read an answer to a question about Stata egen for min and max, and the problem was solved (https://stackoverflow.com/questions/...n-max-in-stata).
        I generated a variable that is equal to the minimum of a variable (X). I attached the equation below.

        Does this Stata code fit here, given the equation below?
        Code:
        bysort comp year: egen fc_output=min(X)
        Thank you,
        JLi
        [Attached image: fc60210.png]
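
        For reference, a minimal sketch of what the egen line above computes, on made-up data (the values of X and the variable capped are hypothetical; whether this matches the attached equation cannot be judged from the text alone). The egen function min() takes the minimum across observations within each comp-year group, whereas the min() function used with generate compares its arguments within each observation:

        Code:
        clear
        input str4 comp int year float X
        "A125" 2009 5
        "A125" 2009 3
        "A125" 2010 7
        "A125" 2010 9
        end
        * minimum of X across the observations of each comp-year group
        bysort comp year: egen fc_output = min(X)
        * observation-by-observation minimum of X and the constant 6
        gen capped = min(X, 6)
        list, sepby(comp year)

        Here fc_output is 3 for both 2009 observations and 7 for both 2010 observations, while capped is the smaller of X and 6 in each observation.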
