  • Is there a function similar to "bysort" in Mata?

    Dear Stata users,

    Is there a function similar to "bysort" in Mata?
    For example:
    Code:
    clear
    input double(x y)
    1 28
    1 13
    1  4
    2 25
    2 23
    2 22
    2 30
    3 15
    3  3
    3 11
    3 24
    end
    
    putmata x=x y=y
    The code in Stata is:
    Code:
    bysort x : egen z=sd(y)
    Can the above code (i.e. generate variable 'z') be implemented without using a loop in Mata? Or is there efficient loop code? Because there may be many categories in the categorical variable "x".

    Any suggestions are highly appreciated.
    Thanks.
    Last edited by Dejin Xie; 21 Jan 2026, 07:09.

  • #2
    Originally posted by Dejin Xie
    Can the above code (i.e. generate variable 'z') be implemented without using a loop in Mata?
    I think that it cannot be implemented in any language without a loop somewhere, even if only somewhere deep, invisible to the end user.

    Or is there efficient loop code? Because there may be many categories in the categorical variable "x".

    Any suggestions are highly appreciated.
    The suite of Mata panel*() functions are designed to make this kind of task more efficient. I illustrate one possibility below. (Core code is in blue.)
    Code:
    version 19
    
    clear *
    
    input double(x y)
    1 28
    1 13
    1  4
    2 25
    2 23
    2 22
    2 30
    3 15
    3  3
    3 11
    3 24
    end
    
    mata:
    
    stata("sort x") // <= for panelsetup(), the matrix or view must be sorted on the idcol
    
    st_view(Data=(.), ., .)
    
    Info = panelsetup(Data, 1)
    k = rows(Info)
    
    SD = J(k, 1, .)
    
    for (i=1; i<=k; i++) {
        SD[i] = sqrt(variance(panelsubmatrix(Data[., 2], i, Info)))
    }
    
    end
    
    mata: SD
    
    exit
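    To get back a per-observation variable, as bysort x: egen z = sd(y) produces, one possibility is to spread each group's value over the rows recorded in Info. This is only a sketch under the same assumptions as above (data sorted on x, variables named x and y); the variable name z and the temporaries n and s are chosen here for illustration.
    Code:
    mata:
    
    stata("sort x") // panelsetup() needs the data sorted on the ID column
    
    st_view(Data=(.), ., ("x", "y"))
    Info = panelsetup(Data, 1) // one row per group: first and last observation
    
    z = J(rows(Data), 1, .)
    
    for (i=1; i<=rows(Info); i++) {
        n = Info[i, 2] - Info[i, 1] + 1 // number of observations in group i
        s = sqrt(variance(panelsubmatrix(Data[., 2], i, Info)))
        z[|Info[i, 1] \ Info[i, 2]|] = J(n, 1, s) // repeat the group's SD over its rows
    }
    
    st_store(., st_addvar("double", "z"), z) // write the result back as variable z
    
    end
    The loop still runs once per group rather than once per observation, so it should remain reasonably cheap even when x has many categories.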


    • #3
      Thank you very much, dear @Joseph Coveney . Your code is wonderful !


      • #4
        Excuse me, dear @Joseph Coveney .
        I have another question: How can I incorporate the index "i" from the "for" loop into the matrix names?
        For instance, generating five empty matrices (i.e. J(8,1,.)) named Mat1, ... Mat5, where the numbers 1 to 5 in the names correspond to "i".
        Thank you!


        • #5
          Originally posted by Dejin Xie
          I have another question: How can I incorporate the index "i" from the "for" loop into the matrix names?
          For instance, generating five empty matrices (i.e. J(8,1,.)) named Mat1, ... Mat5, where the numbers 1 to 5 in the names correspond to "i".
          That approach is more of a Stata-ish than a Mata-ish programming style. There are several approaches that are more idiomatically Mata that accomplish the same kind of objective.

          In your specific example, your "five empty matrices" are each just equal-length column vectors, so the most straightforward approach in that case would be to create an empty eight-row, five-column matrix and address its columns one at a time during each pass through the loop.
          Code:
          M = J(8, 5, .)
          for (i=1; i<=cols(M); i++) M[., i] = runiform(rows(M), 1, 0, 1)
          Other approaches involve different data structures that are available in Mata, including structs, pointer vectors and associative arrays, but these are more involved and probably unnecessary if what you want are five column vectors.
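          For completeness, here is a minimal sketch of the associative-array route, in case the matrices really do need to be looked up by a name built from "i". The array name A and the keys "Mat1", ..., "Mat5" are only illustrative; asarray_create() with no arguments uses string keys.
          Code:
          mata:
          
          A = asarray_create() // associative array with string keys
          
          for (i=1; i<=5; i++) {
              asarray(A, "Mat" + strofreal(i), J(8, 1, .)) // store an 8 x 1 matrix under "Mat1", ..., "Mat5"
          }
          
          asarray(A, "Mat3") // retrieve the matrix stored under "Mat3"
          
          end
          This keeps the name-based lookup without creating separate Mata variables, at the cost of going through asarray() for every read and write.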

          ["suite . . . is designed"]


          • #6
            Dear @Joseph Coveney, thank you very much again!
            Your opinion is very pertinent!


            • #7
              Hi, you can use the function mm_collapse() or mm_collapse2() from moremata (type ssc install moremata). Example:

              Code:
              clear all
              input double(x y)
              1 28
              1 13
              1  4
              2 25
              2 23
              2 22
              2 30
              3 15
              3  3
              3 11
              3 24
              end
              putmata x=x y=y
              Code:
              . mata:
              :     res = mm_collapse(y, 1, x, &variance())
              
              :     res[,2] = sqrt(res[,2])
              
              :     res
                               1             2
                  +-----------------------------+
                1 |            1   12.12435565  |
                2 |            2   3.559026084  |
                3 |            3   8.732124598  |
                  +-----------------------------+
              
              : end
              To get the standard deviation directly, define a new function:

              Code:
              . mata:
              :     function sd(X, w) return(sqrt(variance(X, w)))
              
              :     mm_collapse(y, 1, x, &sd())
                               1             2
                  +-----------------------------+
                1 |            1   12.12435565  |
                2 |            2   3.559026084  |
                3 |            3   8.732124598  |
                  +-----------------------------+
              
              : end
              Use mm_collapse2() if you want to generate a vector containing the aggregate values for each observation:

              Code:
              . mata:
              :     x, mm_collapse2(y, 1, x, &sd())
                                1             2
                   +-----------------------------+
                 1 |            1   12.12435565  |
                 2 |            1   12.12435565  |
                 3 |            1   12.12435565  |
                 4 |            2   3.559026084  |
                 5 |            2   3.559026084  |
                 6 |            2   3.559026084  |
                 7 |            2   3.559026084  |
                 8 |            3   8.732124598  |
                 9 |            3   8.732124598  |
                10 |            3   8.732124598  |
                11 |            3   8.732124598  |
                   +-----------------------------+
              
              : end
              There is no need to pre-sort the data by the grouping variable:

              Code:
              . sort y // jumble data
              
              . list, clean
              
                     x    y  
                1.   3    3  
                2.   1    4  
                3.   3   11  
                4.   1   13  
                5.   3   15  
                6.   2   22  
                7.   2   23  
                8.   3   24  
                9.   2   25  
               10.   1   28  
               11.   2   30  
              
              . putmata x=x y=y, replace
              (2 vectors posted)
              
              . mata: mm_collapse(y, 1, x, &sd())
                               1             2
                  +-----------------------------+
                1 |            1   12.12435565  |
                2 |            2   3.559026084  |
                3 |            3   8.732124598  |
                  +-----------------------------+
              
              . mata: x, mm_collapse2(y, 1, x, &sd())
                                1             2
                   +-----------------------------+
                 1 |            3   8.732124598  |
                 2 |            1   12.12435565  |
                 3 |            3   8.732124598  |
                 4 |            1   12.12435565  |
                 5 |            3   8.732124598  |
                 6 |            2   3.559026084  |
                 7 |            2   3.559026084  |
                 8 |            3   8.732124598  |
                 9 |            2   3.559026084  |
                10 |            1   12.12435565  |
                11 |            2   3.559026084  |
                   +-----------------------------+
              ben



              • #8
                Dear @Ben Jann, Thank you very much !
                Your idea and code are excellent !


                • #9
                  Overview

                  When working with grouped data, Stata and Mata offer fundamentally different approaches. Stata provides the bysort command, which allows for straightforward, automatic grouping. In contrast, Mata lacks a direct equivalent to Stata's bysort, requiring users to implement grouped operations explicitly.

                  Stata's bysort Command

                  Stata is a data-step language designed for data analysis and manipulation. Its bysort command enables automatic grouping, allowing users to apply commands to data subsets in a single line. Aggregation tasks are handled by built-in commands such as egen, streamlining data cleaning and preparation.

                  Mata's Approach

                  Mata, on the other hand, is a matrix programming language. Grouped operations must be implemented manually: the data are first sorted on the grouping variable, and the group boundaries are identified, for example with panelsetup(). Computations within each group are then performed programmatically, typically through loops. Aggregation is not handled automatically by Mata; users must implement their own aggregation logic. While this approach requires added steps, it can provide greater speed and customization.

                  Side-by-Side Comparison

                  Stata (bysort)               Mata
                  Data-step language           Matrix programming language
                  Automatic grouping           Manual grouping
                  One-line syntax              Multi-step logic
                  bysort group: command        order() + panelsetup() + loop
                  egen handles aggregation     User writes aggregation
                  Best for data cleaning       Best for speed & customization


                  • #10
                    Dear @Khaleda Abdullah, Thank you very much !
                    Your summary is very insightful !
