  • A more efficient way to estimate clustered standard errors

    Dear all,
    I'm currently working on code to estimate clustered standard errors, which I'm trying to do in Mata.
    My code is similar to the one shared by David Drukker on the Stata Blog, found here

    The code that I'm trying to optimize is basically like the following:
    Code:
     
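        else if (vcetype == "cluster") {
            // n, k, X, e, and XpXi are defined earlier in the function
            cvar = st_data(., clustervar, touse)   // cluster identifier (copied)
            info = panelsetup(cvar, 1)             // first/last row of each cluster
            nc   = rows(info)                      // number of clusters
            M    = J(k, k, 0)                      // accumulator for the middle term
            dfr  = nc - 1
            for(i=1; i<=nc; i++) {
                xi = panelsubmatrix(X,i,info)
                ei = panelsubmatrix(e,i,info)
                M  = M + (xi'*ei)*(ei'*xi)         // add cluster i's outer product
            }
            // sandwich estimator with the finite-sample adjustment
            V    = ((n-1)/(n-k))*(nc/(nc-1))*XpXi*M*XpXi
        }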
    This piece of code seems to take the longest time to produce what I want. It is not terribly bad, but it is much slower than, say, using "regress, cluster()".

    Does anyone have any suggestions on making this code run faster?
    Thank you in advance.
    Fernando
    Cross-posted in the general forum

  • #2
    I would use st_view() instead of st_data() and then panelsubview() instead of panelsubmatrix(). Using views instead of making copies of the data should both be faster and consume less memory.

    I would also replace rows(info) by panelstats(info)[1], although that probably does not really matter.

    Within the loop, you can speed the calculation up a bit by computing the product xi'*ei only once instead of twice:
    Code:
    xiei = cross(xi, ei)
    M    = M + xiei * xiei'
    If xi and ei are views, then cross(xi, ei) is faster than xi'*ei.
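
    Putting these suggestions together, the loop could be sketched as below. This is only an illustration under some assumptions: the function name cluster_meat_loop() and its argument list are hypothetical, the rows of X and e are assumed to be sorted by cluster, and info is assumed to come from panelsetup() as in the original post.

    ```mata
    mata:
    // Hypothetical helper: returns M, the sum over clusters of (xi'ei)(xi'ei)'.
    // Assumes the rows of X and e are sorted by cluster and info = panelsetup(cvar, 1).
    real matrix cluster_meat_loop(real matrix X, real colvector e, real matrix info)
    {
        real scalar    i, nc, k
        real matrix    M, xi
        real colvector ei, xiei

        nc = panelstats(info)[1]             // number of clusters
        k  = cols(X)
        M  = J(k, k, 0)
        for (i=1; i<=nc; i++) {
            panelsubview(xi, X, i, info)     // rows of cluster i, no copy if X is a view
            panelsubview(ei, e, i, info)
            xiei = cross(xi, ei)             // xi'ei, computed once
            M    = M + xiei*xiei'
        }
        return(M)
    }
    end
    ```

    To benefit from views, X would be loaded with st_view() rather than st_data() before calling the function; the returned M then enters the formula for V unchanged.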
    https://twitter.com/Kripfganz


    • #3
      Dear Sebastian
      Thank you for your suggestion! I'll try it out when I update one of my programs.
      Best regards


      • #4
        Maybe too late to be useful, but the undocumented Mata function panelsum(.) probably means you don't have to loop. From the help file: "real matrix = panelsum(X, [weights,] info)" and "panelsum() computes within-panel sums of the columns of X according to the panel information in info."
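
        A minimal sketch of how the loop might be replaced, assuming panelsum() is available in your Stata version, that the rows of X and e are sorted by cluster, and that info comes from panelsetup(). The function name cluster_meat_panelsum() is hypothetical. Multiplying each row of X by its residual and summing within clusters gives one row per cluster holding (xi'ei)', so the cross product of that matrix with itself equals the M accumulated in the loop.

        ```mata
        mata:
        // Hypothetical helper: loop-free version of the middle term M.
        // Assumes the rows of X and e are sorted by cluster and info = panelsetup(cvar, 1).
        real matrix cluster_meat_panelsum(real matrix X, real colvector e, real matrix info)
        {
            real matrix S

            S = panelsum(X :* e, info)   // nc x k; row i is (xi'ei)'
            return(cross(S, S))          // S'S = sum over clusters of (xi'ei)(xi'ei)'
        }
        end
        ```

        The result can be used in place of M in the original formula for V. Note that X :* e creates a full copy of the data, so this version trades some memory for avoiding the loop.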


        • #5
          Hi Mark
          Thank you for your input. I'll try your suggestion and see if it gives me better results than Sebastian's suggestion.
          Best Regards
