
  • Why is sort so slow? Sorting data in Mata vs. R

    Mata is fast for many things, but sorting a vector is not one of them.

    Below I generate a random vector containing 1 million observations and sort it 100 times. This takes Mata a little over 75 seconds; the same exercise in R takes around 17 seconds. That is a considerable difference, and it only grows with the number of observations.
    [Screenshots of the timing output: R.JPG and mata.JPG]


    If I could put one item on my wishlist, it would be for StataCorp to update its sort, both in Mata and in Stata. Please make the update backward compatible, not only available in Stata 15.

  • #2
    You can speed Mata up a bit by using sort(y,1) instead of y[order(y,1),.]:

    Code:
    mata
    y=rnormal(1e6,1,10,5)      // 1 million draws, mean 10, sd 5
    timer_clear()
    for(i=1;i<=100;i++) {
        timer_on(1)
        x1=sort(y,1)           // timer 1: sort()
        timer_off(1)
        timer_on(2)
        x2=y[order(y,1),.]     // timer 2: subscripting with order()
        timer_off(2)
    }
    timer()
    end
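    A quick way to check that the two expressions timed above return the same result. This is a minimal sketch; the seed passed to rseed() is arbitrary and only there to make the draw reproducible:

    Code:
    mata
    rseed(12345)               // arbitrary seed, for reproducibility only
    y=rnormal(1e6,1,10,5)
    x1=sort(y,1)               // sorted copy via sort()
    x2=y[order(y,1),.]         // same rows, reordered via order()
    assert(x1==x2)             // aborts with an error if the results differ
    end

    Since y is a single column, ties (if any) cannot make the two results differ, so the check only fails if the two approaches genuinely disagree.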
    In general, Stata is very good at forward compatibility. For example, beta4, part of the betafit package on SSC, was last touched 18 years ago and still works in the current version of Stata. I think that is pretty impressive. But StataCorp is a commercial company, which understandably makes it reluctant to backport improvements to older versions. They too have bills (and Bills) to pay.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Thanks for the reply, Maarten.

      Mata's sort() function uses order(), so there should be no real difference between the two. If you run the code below, a slightly modified version of yours, you will see that the difference is minimal. Either way, the gap relative to other software is large (~70 seconds vs. ~18 seconds) and leaves Stata at a considerable disadvantage.

      Code:
      mata
      y=rnormal(1e6,1,10,5)
      timer_clear()
      // timer 1: 100 calls to sort()
      timer_on(1)
      for(i=1;i<=100;i++) {
          x1=sort(y,1)
      }
      timer_off(1)

      // timer 2: 100 subscripts with order()
      timer_on(2)
      for(i=1;i<=100;i++) {
          x2=y[order(y,1),.]
      }
      timer_off(2)

      timer()
      end
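      For comparison, Mata also provides _sort(), which sorts a matrix in place instead of returning a sorted copy. The sketch below times it under the same conditions. Because _sort() overwrites its argument, each iteration starts from a fresh copy of y, and that copy is made outside the timer. Whether this is measurably faster than sort() is something to test on your own machine rather than assume:

      Code:
      mata
      y=rnormal(1e6,1,10,5)
      timer_clear()
      for(i=1;i<=100;i++) {
          x3=y                   // fresh unsorted copy (not timed)
          timer_on(3)
          _sort(x3,1)            // sorts x3 in place; returns nothing
          timer_off(3)
      }
      timer()
      end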
