Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • MS Excel's SUMIFS equivalent in STATA

    Hello,
    I'm looking for an equivalent in STATA to MS Excel's SUMIFS function.

    I have a database which includes prices, quantities and market shares of many products which were sold in different regions, months and years.

    For each product I'd like to calculate the total sum of market shares in the other regions (in the same month and year).

    I managed to find the correct formula in MS Excel (by using SUMIFS function) but unfortunately it takes a lot of processing time.

    I thought it would be efficient and more quicker to do it in STATA.

    Is it possible at all?

    Thank you for your help


  • #2
    Several threads on this. Please search the archive. Or Google site:www.statalist.org SUMIFS

    By the way, many of the most active members here make little or no use of MS Excel. So, in a question of the same kind, please explain with examples what the Excel function (or whatever it is) does.

    See also http://www.statalist.org/forums/help#spelling
    Last edited by Nick Cox; 01 May 2016, 08:38.

    Comment


    • #3
      Michael:
      I'm not very clear with what you're after (posting an excerpt of your data via -dataex- or a .dta file can shed some light on your problem).
      That said, I would start off with taking a look at -help summarize- and -help if-.

      PS: Crossed in the cyberspace with Nick's reply.
      Kind regards,
      Carlo
      (Stata 19.0)

      Comment


      • #4
        Nick, of course I did search the archive before, but didn't find a solution what happens when one of the criterions is "not equal to".
        Sorry for not explaining my problem clearly.
        The function SUMIFS in MS Excel adds all numbers in a range of cells, based on a single or multiple criteria.
        In my example, the formula written in cell F2 summed up all the market shares in column E, based on multiple criteria (product code, year and month should be equal to those in row 2, and the last condition is that the region must be not equal to the region in row 2.

        The calculation in this case is: 0.006+0.006=0.012
        For rows 3-4 it will be: 0.005+0.006=0.011
        Click image for larger version

Name:	Picture2.png
Views:	1
Size:	114.1 KB
ID:	1338433


        Nick, I was trying:





        Code:
        egen sum_ms2=total(marketshare), by (productcode year month)
        but I don't know what should be written to sum up only the market shares of the other regions.

        I also attached a sample of my database if it helps to clarify my problem.

        Thank you

        Attached Files
        Last edited by Michael Dean; 01 May 2016, 10:14.

        Comment


        • #5
          A trick that has sometime proven useful to me in similar circumstances would be to follow your egen command with
          Code:
          replace sum_ms2 = sum_ms2 - marketshare
          to remove the marketshare for the current region from the total for all regions.

          This requires that for any combination of productcode year and month each region appears exactly once, which is not the case in the sample data you provided: the observation for product 1004 in March 2013 in region 3 is duplicated.

          Comment


          • #6
            William is right. In fact the general strategies

            sum of others = sum of all - sum of this

            mean of others = (sum of all - sum of this) / (count of all - 1)

            get you a long way. Much more at

            http://www.stata.com/support/faqs/da...ng-properties/

            http://www.stata-journal.com/article...article=dm0075

            http://www.stata-journal.com/article...article=dm0055

            But please do read http://www.statalist.org/forums/help#stata

            We can't copy and paste an image into Stata, nor is posting .dta files recommended.

            Comment


            • #7
              Thanks Williams,
              Your solution sounds great, I'll try it out.
              That observation shouldn't be duplicated. It's my mistake.

              Thanks a lot

              Comment


              • #8
                Nick, thanks a lot for the help, and I'll definitely read it.

                Comment


                • #9
                  The 2014 paper (second citation) is a bit smarter than the FAQ (first citation), but currently behind a pay wall.

                  Comment


                  • #10
                    To determine if there were duplicates, I used
                    Code:
                    bysort productcode year month region: assert _N==1
                    and when that assertion failed, to find the duplicate observations I used
                    Code:
                    bysort productcode year month region: generate toprint = _N>1
                    list if toprint

                    Comment


                    • #11

                      Code:
                      duplicates list  productcode year month region
                      is another way to start.

                      Comment

                      Working...
                      X