  • Mata timing mystery

    I've stumbled across an odd behavior in the run time of some Mata commands. This example distills what I found. I generate a large random matrix u, whose elements are plus or minus 0.5, with equal probability. Then I multiply it by another matrix. I slightly vary how u is constructed, in a way that should seemingly make no difference. Yet this affects the speed of the subsequent multiplication. It appears to do so more when I am using multiple processors.

    Code:
    cap mata mata drop test()
    
    mata
    mata clear
    mata set matastrict on
    mata set mataoptimize on
    mata set matalnum off
    
    void test() {
        real matrix u, X; real scalar i
        X = runiform(100,10000)
    
        timer_clear()
        // Method 1: build u in two separate statements
        rseed(1)
        u = runiform(10000, 10000) :>= .5; u = u :- .5
        timer_on(1)
        for (i=10;i;i--)
            (void) X*u
        timer_off(1)
        
        // Method 2: build u in a single expression (same seed, so same values)
        rseed(1)
        u = (runiform(10000, 10000) :>= .5) :- .5
        timer_on(2)
        for (i=10;i;i--)
            (void) X*u
        timer_off(2)
        timer()
    }
    test()
    end
    If I do "set processors 1" before running this, I get:
    Code:
    --------------------------------------------------------------------------------------------------------
    timer report
      1.       77.9 /        1 =    77.948
      2.       80.7 /        1 =    80.733
    --------------------------------------------------------------------------------------------------------
    More dramatically, if I do "set processors 4" (on my quad-core CPU, in Stata/MP), I get:
    Code:
    --------------------------------------------------------------------------------------------------------
    timer report
      1.       37.3 /        1 =    37.259
      2.       48.4 /        1 =    48.448
    --------------------------------------------------------------------------------------------------------
    Does anyone understand why the first method results in faster multiplication? It must have something to do with how or where u is stored.

    I'm doing this on an Intel i7-8650, which is a new, quad-core chip for laptops. Hyperthreading is enabled. I have 16GB, which is plenty for these operations.
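
    One way to rule out a difference in the values themselves is to check that the two constructions produce identical matrices. This is only a minimal sketch, separate from the timing test above: the names n, u1, and u2 and the reduced 1000 x 1000 size are assumptions chosen so the check runs quickly.

    Code:
    mata
    // Sketch only: n, u1, u2 are illustrative names, not part of test()
    n = 1000
    
    // Method 1: two separate statements
    rseed(1)
    u1 = runiform(n, n) :>= .5
    u1 = u1 :- .5
    
    // Method 2: one expression, same seed
    rseed(1)
    u2 = (runiform(n, n) :>= .5) :- .5
    
    u1 == u2                 // displays 1 if every element matches
    mreldif(u1, u2)          // displays 0 if the values are identical
    eltype(u1), eltype(u2)   // element type reported for each matrix
    end
    If the matrices compare as equal, any timing gap would have to come from something other than the contents of u, such as where it sits in memory.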

  • #2
    For what it's worth, on my machine (Stata 15 MP 2, older i7 processor, Win 7), both of your versions run in about 90 sec on two cores and about 97 sec on one core.


    • #3
      Thanks, Mike. Yes, it definitely seems to depend on some trait of the processor. I also get the same run time for both versions on a laptop with an i7-3520M (5 generations older). But I get the effect documented above on even older workstation CPUs, a pair of hex-core Xeon X5675s.


      • #4
        FWIW, I tested on Stata 14.2 for Linux (using 1 and then 6 cores) and both versions ran at the same speed. That said, I have often seen weird performance results when writing Mata code on my Windows laptop (I thought I had found an optimization for something, but when I tested it on the Linux server both variants ran at the same speed).
