MS Excel's SUMIFS equivalent in STATA

Michael Dean

Join Date: Mar 2016

Posts: 7
#1

MS Excel's SUMIFS equivalent in STATA

01 May 2016, 08:32

Hello,
I'm looking for an equivalent in STATA to MS Excel's SUMIFS function.

I have a database which includes prices, quantities and market shares of many products which were sold in different regions, months and years.

For each product I'd like to calculate the total sum of market shares in the other regions (in the same month and year).
I managed to find the correct formula in MS Excel (by using SUMIFS function) but unfortunately it takes a lot of processing time.

I thought it would be efficient and more quicker to do it in STATA.
Is it possible at all?

Thank you for your help
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35755
#2

01 May 2016, 08:35

Several threads on this. Please search the archive. Or Google site:www.statalist.org SUMIFS

By the way, many of the most active members here make little or no use of MS Excel. So, in a question of the same kind, please explain with examples what the Excel function (or whatever it is) does.

See also http://www.statalist.org/forums/help#spelling

Last edited by Nick Cox; 01 May 2016, 08:38.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17726
#3

01 May 2016, 08:35

Michael:
I'm not very clear with what you're after (posting an excerpt of your data via -dataex- or a .dta file can shed some light on your problem).
That said, I would start off with taking a look at -help summarize- and -help if-.

PS: Crossed in the cyberspace with Nick's reply.

Kind regards,
Carlo
(Stata 19.0)
Comment
Michael Dean

Join Date: Mar 2016

Posts: 7
#4

01 May 2016, 10:03

Nick, of course I did search the archive before, but didn't find a solution what happens when one of the criterions is "not equal to".
Sorry for not explaining my problem clearly.
The function SUMIFS in MS Excel adds all numbers in a range of cells, based on a single or multiple criteria.

In my example, the formula written in cell F2 summed up all the market shares in column E, based on multiple criteria (product code, year and month should be equal to those in row 2, and the last condition is that the region must be not equal to the region in row 2.

The calculation in this case is: 0.006+0.006=0.012
For rows 3-4 it will be: 0.005+0.006=0.011

Nick, I was trying:

Code:

egen sum_ms2=total(marketshare), by (productcode year month)

but I don't know what should be written to sum up only the market shares of the other regions.

I also attached a sample of my database if it helps to clarify my problem.

Thank you

Attached Files

marketshares.dta (2.0 KB, 1 view)

Last edited by Michael Dean; 01 May 2016, 10:14.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

01 May 2016, 10:41

A trick that has sometime proven useful to me in similar circumstances would be to follow your egen command with

Code:

replace sum_ms2 = sum_ms2 - marketshare

to remove the marketshare for the current region from the total for all regions.

This requires that for any combination of productcode year and month each region appears exactly once, which is not the case in the sample data you provided: the observation for product 1004 in March 2013 in region 3 is duplicated.
1 like
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35755
#6

01 May 2016, 11:24

William is right. In fact the general strategies

sum of others = sum of all - sum of this

mean of others = (sum of all - sum of this) / (count of all - 1)

get you a long way. Much more at

http://www.stata.com/support/faqs/da...ng-properties/

http://www.stata-journal.com/article...article=dm0075

http://www.stata-journal.com/article...article=dm0055

But please do read http://www.statalist.org/forums/help#stata

We can't copy and paste an image into Stata, nor is posting .dta files recommended.
Comment
Michael Dean

Join Date: Mar 2016

Posts: 7
#7

01 May 2016, 11:26

Thanks Williams,
Your solution sounds great, I'll try it out.
That observation shouldn't be duplicated. It's my mistake.

Thanks a lot
Comment
Michael Dean

Join Date: Mar 2016

Posts: 7
#8

01 May 2016, 11:31

Nick, thanks a lot for the help, and I'll definitely read it.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35755
#9

01 May 2016, 11:47

The 2014 paper (second citation) is a bit smarter than the FAQ (first citation), but currently behind a pay wall.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#10

01 May 2016, 11:50

To determine if there were duplicates, I used

Code:

bysort productcode year month region: assert _N==1

and when that assertion failed, to find the duplicate observations I used

Code:

bysort productcode year month region: generate toprint = _N>1 list if toprint
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35755
#11

01 May 2016, 11:54

Code:

duplicates list productcode year month region

is another way to start.
1 like
Comment

Announcement

MS Excel's SUMIFS equivalent in STATA

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment