Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Why is eigensystemselect() slower than eigensystem()?

    Dear Statalist members,

    I'm using Stata 19MP on Windows and am looking for efficient routines to accelerate the computation of eigenvectors and eigenvalues for a symmetric matrix with all diagonal entries equal to 1. According to help mf_eigensystemselect##intro,
    These routines [i.e., eigensystemselect()] compute subsets of the available eigenvectors. This computation can be much faster than computing all the eigenvectors [by eigensystem()].
    Contrary to expectations, eigensystemselect() appears to be significantly slower than eigensystem(). I am wondering whether I may have used eigensystemselect() incorrectly.

    As shown in the following examples, eigensystemselect() is approximately twice as slow as eigensystem() for a 1000-by-1000 symmetric matrix, regardless of whether one or two processors are used, whether eigensystemselectr() or eigensystemselecti() is called, and whether eigenvalues are supplied to eigensystemselect() or not.

    Code:
    clear all
    mata:
    rseed(1)
    n =1000
    M =runiform(n,1,-5,5)
    M =makesymmetric(exp(-(rowsum(M:^2) :-2*M*M' :+rowsum(M:^2)')))
    e =sqrt(epsilon(1))
    end
    
    set processors 2
    mata:
    timer_clear()
    timer_on(1); symeigensystem(M, V0=., L0=.)
    i =selectindex(L0:>=e);
    L1 =L0[ i]
    V1 =V0[,i]
    timer_off(1)
    
    timer_on(2); symeigensystemselectr(M, (e,.)    , V2=., L2=. ); timer_off(2)
    timer_on(3); symeigensystemselecti(M, minmax(i), V3=., L3=. ); timer_off(3)
    timer_on(4); symeigensystemselectr(M, (e,.)    , V4=., L4=L0); timer_off(4) //eigenvalues are supplied
    timer_on(5); symeigensystemselecti(M, minmax(i), V5=., L5=L0); timer_off(5) //eigenvalues are supplied
    timer()
    end
    ---------------------------------------------------------------------------
    timer report
      1.       .155 /        1 =      .155
      2.       .423 /        1 =      .423
      3.       .419 /        1 =      .419
      4.       .421 /        1 =      .421
      5.       .418 /        1 =      .418
    ---------------------------------------------------------------------------
    
    set processors 1
    mata:
    timer_clear()
    timer_on(1); symeigensystem(M, V0=., L0=.)
    i =selectindex(L0:>=e);
    L1 =L0[ i]
    V1 =V0[,i]
    timer_off(1)
    
    timer_on(2); symeigensystemselectr(M, (e,.)    , V2=., L2=. ); timer_off(2)
    timer_on(3); symeigensystemselecti(M, minmax(i), V3=., L3=. ); timer_off(3)
    timer_on(4); symeigensystemselectr(M, (e,.)    , V4=., L4=L0); timer_off(4) //eigenvalues are supplied
    timer_on(5); symeigensystemselecti(M, minmax(i), V5=., L5=L0); timer_off(5) //eigenvalues are supplied
    timer()
    end
    ---------------------------------------------------------------------------
    timer report
      1.       .196 /        1 =      .196
      2.       .415 /        1 =      .415
      3.       .409 /        1 =      .409
      4.       .418 /        1 =      .418
      5.       .409 /        1 =      .409
    ---------------------------------------------------------------------------


    I also ran this example by Stata 17SE on the same machine, and the results were mixed. On one hand, eigensystemselect() is indeed faster than eigensystem(); on the other hand, eigensystemselect() runs faster on Stata 17SE than Stata 19MP. Additionally, it seems that eigensystem() was improved by Stata 19, whereas eigensystemselect() has gotten worse compared to Stata 17SE, which makes me even more uncertain whether I am using eigensystemselect() correctly.

    Code:
    clear all
    mata:
    rseed(1)
    n =1000
    M =runiform(n,1,-5,5)
    M =makesymmetric(exp(-(rowsum(M:^2) :-2*M*M' :+rowsum(M:^2)')))
    e =sqrt(epsilon(1))
    end
    
    mata:
    timer_clear()
    timer_on(1); symeigensystem(M, V0=., L0=.)
    i =selectindex(L0:>=e);
    L1 =L0[ i]
    V1 =V0[,i]
    timer_off(1)
    
    timer_on(2); symeigensystemselectr(M, (e,.)    , V2=., L2=. ); timer_off(2)
    timer_on(3); symeigensystemselecti(M, minmax(i), V3=., L3=. ); timer_off(3)
    timer_on(4); symeigensystemselectr(M, (e,.)    , V4=., L4=L0); timer_off(4) //eigenvalues are supplied
    timer_on(5); symeigensystemselecti(M, minmax(i), V5=., L5=L0); timer_off(5) //eigenvalues are supplied
    timer()
    end
    ---------------------------------------------------------------------------
    timer report
      1.       .338 /        1 =      .338
      2.       .305 /        1 =      .305
      3.       .302 /        1 =      .302
      4.       .307 /        1 =      .307
      5.       .304 /        1 =      .304
    ---------------------------------------------------------------------------


    Lastly, besides eigensystem() and eigensystemselect(), are there any faster methods to compute eigenvalues that are larger than sqrt(epsilon(1)) and their corresponding eigenvectors for a symmetric matrix with all diagonal entries equal to 1?

    Thank you very much!
    Last edited by Chi-lin Tsai; 14 Oct 2025, 16:23.

  • #2
    Originally posted by Chi-lin Tsai View Post
    Contrary to expectations, eigensystemselect() appears to be significantly slower than eigensystem(). I am wondering whether I may have used eigensystemselect() incorrectly.
    I don't see anything that you're doing incorrectly. It seems to be a large matrix. Are these from long time series or something?

    I played around with both functions with a modified version of your code and it seems as if the following happen: (i) at any time one can be faster than the other and vice versa and (ii) both run faster after one of them has been executed once. Their behavior is as if (i) they're using the same back end to BLAS / LAPACK and (ii) the first time through, the first one called is caching components of the runtime library or instantiating some object behind the scenes that is used by both.

    In your case, you're taking all positive eigenvalues and, if you have correlation matrices with zero or one nonpositive eigenvalue, then you might not be saving much time by using symeigensystemselectr().

    I don't know the answer to your last question other than to suggest that, if you're on a machine that uses Intel processors, then you might be able to take advantage of the MKL library that Intel optimized for them. In that case you can specify to use it over the default Fortran 90 library from Netlib by setting set lapack_mkl beforehand.
    Last edited by Joseph Coveney; 15 Oct 2025, 18:43.

    Comment


    • #3
      Joseph, thank you very much for your reply. I'm working with the radial-basis-function (rbf) kernel — an n-by-n matrix, where n is the number of observations — so yes, it's typically quite large.

      In the example from my previous post, the 1000-by-1000 matrix has only about 35 eigenvalues greater than sqrt(epsilon(1)), so I expected symeigensystemselect() to run significantly faster than symeigensystem(), as suggested in mf_eigensystemselect##intro, but unfortunately, the results showed the opposite.

      Could you kindly share your modified code with me for further experimentation? On my machine (AMD 3700X 8-core 3.59GHz CPU, 32GB RAM, Windows 10), after several additional attempts, my code consistently produces the following results:
      (1) symeigensystem() runs faster than symeigensystemselect(), regardless of which one is executed first,

      (2) symeigensystemselect() in Stata 19MP runs slower than it does in Stata 17SE, regardless of whether one or two processor cores are used by Stata 19MP.

      (3) these results were obtained using the MKL library (set lapack_mkl on by default). However, I found that set lapack_mkl off significantly slows down symeigensystem() but has almost no effect on symeigensystemselect(), which seems to suggest that symeigensystemselect() doesn't take advantage of the MKL library when set lapack_mkl on, while symeigensystem() does. Even so, that still can't explain point (2).

      I also ran the same tests on another machine (Intel i5-8500 6-core 3.00GHz CPU, 8GB RAM, Windows 10), and the results align with the three points mentioned above.
      Last edited by Chi-lin Tsai; 16 Oct 2025, 10:11.

      Comment


      • #4
        Originally posted by Chi-lin Tsai View Post
        Could you kindly share your modified code with me for further experimentation?
        Sure, I've attached a do-file that contains the code and two log files with some example results showing some of what I described above. Usage note: I did the following at the command line:

        Code:
        run se3
        log using <whatever>.smcl, nomsg name(lo)
        <either> grrg() <or> grrg(0)
        <Page Up and Carriage Return twice>
        log close lo
        exiting Stata and restarting Stata between examples.

        grrg(0) executes symeigensystemselectr() first; otherwise, symeigensystem() and associated code is executed first.

        A few notes on the differences from your case:

        1. Instead of rbf kernels, I created smaller (neighborhood of 50 × 50) correlation matrices with much smaller expectations of nonpositive eigenvalues.

        2. I have an antique laptop and used the default Netlib not MKL. With your machines, you might not even see any elapsed time with such small matrices. (You can set the matrix dimensions with an optional argument to grrg(). You can also swap in your rbf kernel-generating code for what I have in the setUp() function and then run with that.)

        3. Some of the timing results might have been affected (randomly jittered) by housekeeping activities or other things that the operating system happened to be doing contemporaneously.

        I don't recall seeing any announcement from StataCorp that could explain the unexpected differences that you see between Releases 17 SE and 19 MP.
        Attached Files

        Comment


        • #5
          Joseph, thank you for sharing your code. I'll take a closer look and post an update here if I find anything new. Thank you!

          Comment


          • #6
            According to Stata Technical Support,

            1. eigensystemselect() runs slower than eigensystem(), because former uses the old version of MKL, whereas the latter uses a newer, optimised version.

            2. eigensystemselect() runs slower in Stata 19 than in Stata 17, because the compiler on Windows is upgraded from Visual C++ 2017 to 2019.

            Stata Technical Support is unable to provide a timeline for when eigensystemselect() will be upgraded to the new MKL version. However, the documentation will be updated to reflect the differences outlined above.

            Comment


            • #7
              Thank you for the follow-up on this, Chi-lin.

              Comment

              Working...
              X