Calculating shares in Mata

Randy Chugh

Join Date: Jun 2025

Posts: 15
#1

26 Jun 2025, 07:25

Hello,

I would like to calculate shares in Mata. This is easy in Stata but hard in Mata, and I am wondering whether there is some trick or method that I am overlooking.

I have a set of, say, 3 stores that compete in 2 towns (not all stores compete in all towns). The sales data looks like this:
townid  storeid  sales
17      1        100
17      2        200
17      3        600
28      1        400
28      2        800

In Stata, I would type "bysort townid: egen denominator=sum(sales); gen share=sales/denominator", but there's nothing quite like egen in Mata. The closest I can come to this is "panelsum()". I can write "info=panelsetup(townid,1); panelsum(sales,info)". But then I'm left with the following data:
sales
900
1200

In order to use this result as a denominator, I need to expand this result back to being a 5x1 vector, so that it looks like this:
denominator
900
900
900
1200
1200

Any thoughts on how to achieve this last step?

Thanks!
Randy

Last edited by Randy Chugh; 26 Jun 2025, 07:27.

John Mullahy

Join Date: Dec 2016
Posts: 751

26 Jun 2025, 08:40

Here's a somewhat inelegant approach. The fourth column of zaug contains the shares.

Code:

mata

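// columns: townid, storeid, sales
z=(17,1,100)\(17,2,200)\(17,3,600)\(28,1,400)\(28,2,800)

tid=uniqrows(z[.,1])   // distinct town ids
zaug=J(0,4,.)          // grows by one town's rows per iteration
for (j=1;j<=rows(tid);j++) {
 ztemp=select(z,z[.,1]:==tid[j])                 // rows for town tid[j]
 zaug=zaug\(ztemp,ztemp[.,3]:/sum(ztemp[.,3]))   // append sales share as column 4
}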

zaug

end

Result:

Code:

:
: zaug
                 1             2             3             4
    +---------------------------------------------------------+
  1 |           17             1           100   .1111111111  |
  2 |           17             2           200   .2222222222  |
  3 |           17             3           600   .6666666667  |
  4 |           28             1           400   .3333333333  |
  5 |           28             2           800   .6666666667  |
    +---------------------------------------------------------+

Randy Chugh

Join Date: Jun 2025

Posts: 15
#3

26 Jun 2025, 08:50

Thanks, John. This will work, but I'm trying to avoid for loops because of speed concerns (imagine there are thousands of towns and thousands of stores). Note that this would be simple if "townid" took the values 1 and 2 instead of 17 and 28. In that case, I would keep the townid vector and run "denominator=panelsum(sales,info); denominator=denominator[townid]", which would expand the denominator vector back to the size I need (from 2x1 to 5x1, in the correct order). So one way to address this would be a Mata equivalent of Stata's "encode" command, or any other way to convert the (17\17\17\28\28) vector into a (1\1\1\2\2) vector. Any thoughts on how to do that?
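
When the ids are already grouped together (which panelsetup() also needs), one loop-free possibility is to flag every row where townid changes and take a running sum of the flags. What follows is only a minimal sketch under that assumption: the names isfirst, newid, n and den are illustrative, and the range subscripts assume at least two rows.

```mata
mata
// columns: townid, storeid, sales (the example data from #1)
z      = (17,1,100)\(17,2,200)\(17,3,600)\(28,1,400)\(28,2,800)
townid = z[.,1]
sales  = z[.,3]
n      = rows(townid)

// 1 on the first row of each town, 0 elsewhere (assumes grouped ids, n >= 2)
isfirst = 1 \ (townid[|2 \ n|] :!= townid[|1 \ n-1|])

// running sum of the flags yields consecutive ids 1, 2, ...
newid = runningsum(isfirst)

info  = panelsetup(townid, 1)
den   = panelsum(sales, info)   // one total per town
den   = den[newid]              // expanded back to one row per observation
share = sales :/ den
share
end
```

On the example data newid comes out as (1\1\1\2\2), so den expands to (900\900\900\1200\1200) without building any matrix wider than the data.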

Last edited by Randy Chugh; 26 Jun 2025, 08:56.
Randy Chugh

Join Date: Jun 2025

Posts: 15
#4

26 Jun 2025, 09:31

Here's a solution with no for loops. I'm not sure if it's faster, since it requires replicating a vector n times using the J() function, but I think it will work. [Note: edited out and trying again below to post properly.]

Last edited by Randy Chugh; 26 Jun 2025, 09:37.

Randy Chugh

Join Date: Jun 2025
Posts: 15

26 Jun 2025, 09:37

Here it is:

Code:

: z=(17,1,100)\(17,2,200)\(17,3,600)\(28,1,400)\(28,2,800)

: z
         1     2     3
    +-------------------+
  1 |   17     1   100  |
  2 |   17     2   200  |
  3 |   17     3   600  |
  4 |   28     1   400  |
  5 |   28     2   800  |
    +-------------------+

: newtownid=(J(1,rows(uniqrows(z[.,1])),z[.,1]):==uniqrows(z[.,1])')*(1::rows(uniqrows(z[.,1])))

: newtownid
       1
    +-----+
  1 |  1  |
  2 |  1  |
  3 |  1  |
  4 |  2  |
  5 |  2  |
    +-----+

: info=panelsetup(z,1)

: den=panelsum(z[.,3],info)

: den
          1
    +--------+
  1 |   900  |
  2 |  1200  |
    +--------+

: den=den[newtownid]

: den
          1
    +--------+
  1 |   900  |
  2 |   900  |
  3 |   900  |
  4 |  1200  |
  5 |  1200  |
    +--------+

: share=z[.,3]:/den

: share
                 1
    +---------------+
  1 |  .1111111111  |
  2 |  .2222222222  |
  3 |  .6666666667  |
  4 |  .3333333333  |
  5 |  .6666666667  |
    +---------------+

Nick Cox

Join Date: Mar 2014
Posts: 35698

26 Jun 2025, 10:16

Going back to #1 I note this solution in Stata (since Stata 7 officially):

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(townid storeid) int sales
17 1 100
17 2 200
17 3 600
28 1 400
28 2 800
end

egen prop = pc(sales), by(townid) prop

l, sepby(townid)

     +-------------------------------------+
     | townid   storeid   sales       prop |
     |-------------------------------------|
  1. |     17         1     100   .1111111 |
  2. |     17         2     200   .2222222 |
  3. |     17         3     600   .6666667 |
     |-------------------------------------|
  4. |     28         1     400   .3333333 |
  5. |     28         2     800   .6666667 |
     +-------------------------------------+

Randy Chugh

Join Date: Jun 2025

Posts: 15
#7

26 Jun 2025, 10:42

Thanks, Nick. I'm aware that this is easy to do in Stata.

In Mata, is there a way to efficiently "index" an id vector? E.g., if some vector x takes values (3\3\5\5\5\8), how can I convert that into a vector with values (1\1\2\2\2\3)?

I can do it with for loops, but for loops can be slow. I can also do it with the following code: (J(1,rows(uniqrows(x)),x):==uniqrows(x)')*(1::rows(uniqrows(x)))
But that solution becomes impractical when there are thousands of id values (unique values of the x vector), because it builds a matrix with one column per unique id.

Basically, I'm looking for a Mata equivalent of Stata's "encode" command.
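
For ids that are not sorted, a possible loop-free route is to sort with order(), number the sorted values with a running sum of change flags, and then undo the sort with invorder(). This is a sketch, not a benchmarked solution: the vector x below is a made-up example, the names p, xs, ids and newid are illustrative, and the range subscripts assume at least two rows. The result resembles what egen's group() function produces in Stata, since encode itself is meant for string variables.

```mata
mata
x = (5\3\8\3\5\5)            // ids in arbitrary order (made-up example)
n = rows(x)

p  = order(x, 1)             // permutation that sorts x ascending
xs = x[p]                    // sorted copy of the ids

// consecutive ids on the sorted copy (assumes n >= 2)
ids = runningsum(1 \ (xs[|2 \ n|] :!= xs[|1 \ n-1|]))

// undo the sort so that newid lines up with the original rows of x
newid = ids[invorder(p)]
newid
end
```

Every object here has n rows, so memory use does not grow with the number of unique ids. For this x the result is (2\1\3\1\2\2).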

By the way, the reason I'm trying to do this in Mata is because I'm programming an optimize() routine to find parameters for a large and complicated multinomial choice model, so I need to calculate shares in many "towns" across many "stores" and I need to do so within the evaluator function, conditional on some current value of parameters. I can elaborate if helpful.

Thanks again,
Randy
Nick Cox

Join Date: Mar 2014

Posts: 35698
#8

26 Jun 2025, 10:47

I don't have a Mata solution for you. Sorry.
Randy Chugh

Join Date: Jun 2025

Posts: 15
#9

26 Jun 2025, 11:04

Understood, thanks.
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#10

26 Jun 2025, 20:08

Originally posted by Randy Chugh

. . . I'm programming an optimize() routine to find parameters for a large and complicated multinomial choice model, so I need to calculate shares in many "towns" across many "stores" and I need to do so within the evaluator function, conditional on some current value of parameters.

What exactly changes from iteration to iteration? The involved stores? Their representation among towns?

Depending upon what does and doesn't change, you might be able to perform some of the time-consuming looping (or the Mata equivalent of encode) over thousands of stores and their towns just once, before the optimize() iterations begin, in order to set up something like a scaffold matrix or to populate a struct vector that receives whatever does change at each update of the parameter estimates. It might still require some looping during fitting, but you could perhaps limit the per-iteration looping to something manageable.
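
As a rough illustration of that idea, assume the town and store layout stays fixed and only the values being shared change between iterations. The panel information and a consecutive town index could then be built once and reused. The helper townshares() and the names info and newid are hypothetical, and the evaluator itself is not shown; the precomputed pieces could be handed to it with, for example, optimize_init_argument().

```mata
mata
// Hypothetical helper: within-town shares of v, reusing precomputed pieces
real colvector townshares(real colvector v, real matrix info,
                          real colvector newid)
{
    real colvector tot

    tot = panelsum(v, info)     // one total per town
    return(v :/ tot[newid])     // expand totals to rows, then divide
}

// One-time setup, done before the iterations start
// (assumes rows are grouped by town and there are at least two rows)
z      = (17,1,100)\(17,2,200)\(17,3,600)\(28,1,400)\(28,2,800)
townid = z[.,1]
n      = rows(townid)
info   = panelsetup(townid, 1)
newid  = runningsum(1 \ (townid[|2 \ n|] :!= townid[|1 \ n-1|]))

// Per-iteration work is then one panel sum and one division, with no loop
townshares(z[.,3], info, newid)
end
```

Here the sales column stands in for whatever quantity the model produces at the current parameter values.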