
  • Calculating shares in Mata

    Hello,

    I would like to calculate shares in Mata. This is easy in Stata but hard in Mata, and I am wondering whether there is some trick or method that I am overlooking.

    I have a set of, say, 3 stores that compete in 2 towns (not all stores compete in all towns). The sales data looks like this:
    townid  storeid  sales
        17        1    100
        17        2    200
        17        3    600
        28        1    400
        28        2    800
    In Stata, I would type "bysort townid: egen denominator = sum(sales)" followed by "gen share = sales/denominator", but there is nothing quite like egen in Mata. The closest I can come is panelsum(): I can write "info = panelsetup(townid, 1); panelsum(sales, info)". But that leaves me with one total per town:
    sales
    900
    1200
    To use this result as a denominator, I need to expand it back into a 5x1 vector, so that it looks like this:
    denominator
    900
    900
    900
    1200
    1200
    Any thoughts on how to achieve this last step?

    Thanks!
    Randy


    Last edited by Randy Chugh; 26 Jun 2025, 07:27.
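A loop-free sketch of that last step, assuming the rows are sorted by townid (which panelsetup() requires anyway): the first column of info holds the starting row of each town, so putting a 1 on those rows and taking a running sum gives every row its town's position (1, 2, and so on). That position can then index the vector of town totals. The names g, den, and share are only illustrative, and this has not been timed on large data.

```mata
mata:
// Example data from #1: townid, storeid, sales, sorted by townid
z = (17,1,100) \ (17,2,200) \ (17,3,600) \ (28,1,400) \ (28,2,800)

info = panelsetup(z, 1)              // start and end row of each town
den  = panelsum(z[., 3], info)       // one total per town: 900 \ 1200

// 1 on the first row of each town, 0 elsewhere; the running sum
// turns that into the town's position (1, 2, ...) on every row
g = J(rows(z), 1, 0)
g[info[., 1]] = J(rows(info), 1, 1)
g = runningsum(g)                    // 1 \ 1 \ 1 \ 2 \ 2

share = z[., 3] :/ den[g]            // den[g] is 900 \ 900 \ 900 \ 1200 \ 1200
share
end
```

Because g depends only on which rows belong to which town, it can be computed once and reused whenever the sales column changes.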

  • #2
    Here's a somewhat inelegant approach. The fourth column of zaug contains the shares.
    Code:
    mata
    
    z=(17,1,100)\(17,2,200)\(17,3,600)\(28,1,400)\(28,2,800)
    
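    tid=uniqrows(z[.,1])              // distinct town ids
    zaug=J(0,4,.)                     // empty 0x4 matrix to append to
    for (j=1;j<=rows(tid);j++) {
     ztemp=select(z,z[.,1]:==tid[j])                  // rows for town j
     zaug=zaug\(ztemp,ztemp[.,3]:/sum(ztemp[.,3]))    // append them plus their shares
    }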
    
    zaug
    
    end
    Result:
    Code:
    :
    : zaug
                     1             2             3             4
        +---------------------------------------------------------+
      1 |           17             1           100   .1111111111  |
      2 |           17             2           200   .2222222222  |
      3 |           17             3           600   .6666666667  |
      4 |           28             1           400   .3333333333  |
      5 |           28             2           800   .6666666667  |
        +---------------------------------------------------------+
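If a loop is acceptable, a variant of the loop above that may scale better is to loop over the rows of the panelsetup() output instead of calling select() once per town: select() scans all n rows for every town, whereas panelsubmatrix() pulls out only that town's rows, and writing into a preallocated vector avoids growing zaug by repeated row-joins. This is a sketch assuming z is sorted by townid; share and s are illustrative names.

```mata
mata:
z = (17,1,100) \ (17,2,200) \ (17,3,600) \ (28,1,400) \ (28,2,800)

info  = panelsetup(z, 1)             // requires z sorted by column 1 (townid)
share = J(rows(z), 1, .)             // preallocated result, one row per store

for (i = 1; i <= rows(info); i++) {
    s = panelsubmatrix(z[., 3], i, info)                    // sales of town i
    share[|info[i, 1], 1 \ info[i, 2], 1|] = s :/ sum(s)    // fill that town's rows
}

(z, share)
end
```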

    • #3
      Thanks, John. This will work, but I'm trying to avoid for loops because of speed (imagine there are thousands of towns and thousands of stores). Note that this would be simple if townid took the values 1 and 2 instead of 17 and 28. In that case, I would keep the townid vector and then type "denominator = panelsum(sales, info); denominator = denominator[townid]", which would expand the denominator vector back to the size I need (from 2x1 to 5x1, in the correct order). So one way to address this would be a Mata equivalent of Stata's encode command, or any other way to convert the (17\17\17\28\28) vector into a (1\1\1\2\2) vector. Any thoughts on how to do that?
      Last edited by Randy Chugh; 26 Jun 2025, 08:56.
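For converting (17\17\17\28\28) into (1\1\1\2\2), uniqrows() is not needed when the ids are sorted (as they must be for panelsetup()): the code goes up by one each time the id differs from the row above, so a running sum of a "changed" indicator gives it directly. group_index() is a hypothetical helper name; this is a sketch, not an official Mata function.

```mata
mata:
// Hypothetical helper: consecutive codes 1, 2, ... for a SORTED id vector
real colvector group_index(real colvector id)
{
    real scalar n

    n = rows(id)
    if (n < 2) return(J(n, 1, 1))    // 0 or 1 rows: nothing to compare

    // 1 for the first row, then 1 wherever the id changes from the row above
    return(runningsum(1 \ (id[|2, 1 \ n, 1|] :!= id[|1, 1 \ n - 1, 1|])))
}

townid    = (17 \ 17 \ 17 \ 28 \ 28)
newtownid = group_index(townid)      // 1 \ 1 \ 1 \ 2 \ 2
newtownid
end
```

With newtownid in hand, the denominator[townid] step works as written with newtownid in place of townid.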

      • #4
        Here's a solution with no for loops. I'm not sure if it's faster, since it requires replicating a vector n times using the J() function, but I think it will work. [Note: edited out and trying again below to post properly.]
        Last edited by Randy Chugh; 26 Jun 2025, 09:37.

        • #5
          Here it is:
          Code:
          : z=(17,1,100)\(17,2,200)\(17,3,600)\(28,1,400)\(28,2,800)
          
          : z
                   1     2     3
              +-------------------+
            1 |   17     1   100  |
            2 |   17     2   200  |
            3 |   17     3   600  |
            4 |   28     1   400  |
            5 |   28     2   800  |
              +-------------------+
          
          : newtownid=(J(1,rows(uniqrows(z[.,1])),z[.,1]):==uniqrows(z[.,1])')*(1::rows(uniqrows(z[.,1])))
          
          : newtownid
                 1
              +-----+
            1 |  1  |
            2 |  1  |
            3 |  1  |
            4 |  2  |
            5 |  2  |
              +-----+
          
          : info=panelsetup(z,1)
          
          : den=panelsum(z[.,3],info)
          
          : den
                    1
              +--------+
            1 |   900  |
            2 |  1200  |
              +--------+
          
          : den=den[newtownid]
          
          : den
                    1
              +--------+
            1 |   900  |
            2 |   900  |
            3 |   900  |
            4 |  1200  |
            5 |  1200  |
              +--------+
          
          : share=z[.,3]:/den
          
          : share
                           1
              +---------------+
            1 |  .1111111111  |
            2 |  .2222222222  |
            3 |  .6666666667  |
            4 |  .3333333333  |
            5 |  .6666666667  |
              +---------------+

          • #6
            Going back to #1, I note this solution in Stata (official since Stata 7):

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input byte(townid storeid) int sales
            17 1 100
            17 2 200
            17 3 600
            28 1 400
            28 2 800
            end
            
            egen prop = pc(sales), by(townid) prop
            
            l, sepby(townid)
            
                 +-------------------------------------+
                 | townid   storeid   sales       prop |
                 |-------------------------------------|
              1. |     17         1     100   .1111111 |
              2. |     17         2     200   .2222222 |
              3. |     17         3     600   .6666667 |
                 |-------------------------------------|
              4. |     28         1     400   .3333333 |
              5. |     28         2     800   .6666667 |
                 +-------------------------------------+

            • #7
              Thanks, Nick. I'm aware that this is easy to do in Stata.

              In Mata, is there a way to efficiently "index" an id vector? E.g., if some vector x takes values (3\3\5\5\5\8), how can I convert that into a vector with values (1\1\2\2\2\3)?

              I can do it with for loops, but for loops can be slow. I can also do it with the following code: (J(1,rows(uniqrows(x)),x):==uniqrows(x)')*(1::rows(uniqrows(x)))
              But that solution does not scale to thousands of id values (unique values of the x vector), because it builds an n x K indicator matrix, where n is the number of rows of x and K is the number of unique values.

              Basically, I'm looking for a Mata equivalent of Stata's "encode" command.

              By the way, the reason I'm trying to do this in Mata is that I'm programming an optimize() routine to find parameters for a large and complicated multinomial choice model, so I need to calculate shares in many "towns" across many "stores", and I need to do so within the evaluator function, conditional on the current parameter values. I can elaborate if helpful.

              Thanks again,
              Randy
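If the ids are not already sorted, one way to get an encode-like result without an n x K indicator matrix is to sort first with order(), number the sorted values by the same "changed from the row above" running sum, and then put the codes back in the original row order with invorder(). The cost is dominated by the sort. encode_ids() is a hypothetical name; unlike Stata's encode, this assigns codes in ascending numeric order of x and attaches no value labels.

```mata
mata:
// Hypothetical helper: encode-like codes 1, 2, ... for an UNSORTED id vector
real colvector encode_ids(real colvector x)
{
    real colvector p, xs, gs
    real scalar    n

    n = rows(x)
    if (n < 2) return(J(n, 1, 1))

    p  = order(x, 1)                 // permutation that sorts x ascending
    xs = x[p]
    // codes in sorted order: 1 on the first row and wherever the value changes
    gs = runningsum(1 \ (xs[|2, 1 \ n, 1|] :!= xs[|1, 1 \ n - 1, 1|]))
    return(gs[invorder(p)])          // back to the original row order
}

encode_ids((3 \ 3 \ 5 \ 5 \ 5 \ 8))   // 1 \ 1 \ 2 \ 2 \ 2 \ 3
encode_ids((8 \ 3 \ 5 \ 3 \ 5 \ 5))   // 3 \ 1 \ 2 \ 1 \ 2 \ 2
end
```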

              • #8
                I don't have a Mata solution for you. Sorry.

                • #9
                  Understood, thanks.

                  • #10
                    Originally posted by Randy Chugh
                    . . . I'm programming an optimize() routine to find parameters for a large and complicated multinomial choice model, so I need to calculate shares in many "towns" across many "stores" and I need to do so within the evaluator function, conditional on some current value of parameters.
                    What exactly changes from iteration to iteration? The involved stores? Their representation among towns?

                    Depending on what does and doesn't change, you might be able to perform some of the time-consuming looping (or a Mata equivalent of encode) over thousands of stores and their towns just once, before the optimize() iterations begin, in order to set up something like a scaffold matrix or populate a struct vector that receives whatever does change at each update of the parameter estimates. It might still require some looping during fitting, but you could perhaps limit the per-iteration looping to something manageable.
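Along those lines, if the assignment of stores to towns stays fixed across iterations, the panelsetup() info and the row-to-town index can be built once, before optimize() starts, and passed to the evaluator (for example with optimize_init_argument()). Each evaluation then costs one panelsum() and one index lookup, with no loops. The helper names below are hypothetical, the data are assumed sorted by townid, and v stands for whatever positive quantity the model shares out within a town (in a logit-type model, perhaps exp(xb) at the current parameters); here it is just the sales column from #1.

```mata
mata:
// One-time setup (hypothetical helper): town position for every row,
// given panelsetup() output for data sorted by town
real colvector town_index(real matrix info, real scalar n)
{
    real colvector g

    g = J(n, 1, 0)
    g[info[., 1]] = J(rows(info), 1, 1)   // 1 on each town's first row
    return(runningsum(g))
}

// Per-evaluation work (hypothetical helper): within-town shares of v
real colvector within_shares(real colvector v, real matrix info, real colvector g)
{
    real colvector den

    den = panelsum(v, info)              // one total per town
    return(v :/ den[g])                  // each row divided by its town total
}

townid = (17 \ 17 \ 17 \ 28 \ 28)
info   = panelsetup(townid, 1)           // once, before optimize()
g      = town_index(info, rows(townid))  // once, before optimize()

v = (100 \ 200 \ 600 \ 400 \ 800)        // stand-in for the model's quantity
within_shares(v, info, g)
end
```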
