
  • #16
    Some updated numbers for those who wonder how Stata runs on the brand-new Apple Silicon.

    M1 Mac-mini, base model (8GB ram)
    Stata 16.1 SE. Not sure whether it's running through Apple's Rosetta translation, but it's running smoothly.
        variable |       p50      mean        sd       min       max
    -------------+--------------------------------------------------
         replace |      .012     .0123  .0008233      .011      .014
         regress |     .0825     .0827  .0012517      .081      .084
         predict |      .027     .0271  .0012867      .026       .03
          correl |      .061     .0646  .0118152      .059      .098
       bootstrap |   12.5705   12.6407   .217314    12.482     13.21
          mvtest |     .1675     .1688  .0084696      .153      .187
           xtile |      .751     .7464  .0284066      .683      .781
     expand_drop |      .053     .0541  .0026437      .052      .061
          arfima |   87.2375   88.9892  3.781558    85.517     94.98
          eigenv |      .229     .2299  .0041753      .225      .239
    ----------------------------------------------------------------

    All commands are more or less okay, at least compared to the OP's figures, but arfima is terribly slow. I don't know why.



    • #17
      I should have attached this table on my previous posting.
      For comparison...

      MacBook Pro 16" (2019) Base Model (i7 2.6Ghz 6-core, 16GB RAM)
      Stata 16.1 SE
          variable |       p50      mean        sd       min       max
      -------------+--------------------------------------------------
           replace |     .0135     .0143  .0024518      .013      .021
           regress |     .0835     .0836  .0012649      .082      .086
           predict |     .0285     .0287  .0014181      .027      .031
            correl |      .061      .066  .0165261       .06      .113
         bootstrap |   11.1815   11.2029  .0619255    11.141    11.344
            mvtest |     .2035      .208  .0150555      .196      .238
             xtile |     .6905     .7192  .0767736      .622       .88
       expand_drop |     .0685      .069  .0033665      .065      .074
            arfima |   138.294  134.2177  5.631907   127.072   138.865
            eigenv |     .3055     .3342  .0930971        .3      .599
      ----------------------------------------------------------------
      - The M1 Mac mini is marginally better than the Intel MacBook Pro. MBPr price = 3.5x Mac mini price and Stata is based on Intel binaries. Well...
      - arfima performance is even worse on the MacBook Pro. It might be a Mac problem rather than an M1 one.

      It would be interesting to see how these numbers differ in a multicore environment... but I do not have an MP license.
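To make the side-by-side comparison easier to read, here is a small Python sketch that divides the M1 medians by the MacBook Pro medians (p50 columns copied from the two tables above). The function name `relative_time` is only for illustration; ratios below 1 mean the M1 Mac mini was faster.

```python
# Median (p50) run times in seconds, copied from the two tables above.
m1_mini = {
    "replace": .012, "regress": .0825, "predict": .027, "correl": .061,
    "bootstrap": 12.5705, "mvtest": .1675, "xtile": .751,
    "expand_drop": .053, "arfima": 87.2375, "eigenv": .229,
}
mbp_2019 = {
    "replace": .0135, "regress": .0835, "predict": .0285, "correl": .061,
    "bootstrap": 11.1815, "mvtest": .2035, "xtile": .6905,
    "expand_drop": .0685, "arfima": 138.294, "eigenv": .3055,
}


def relative_time(a, b):
    """Return a[cmd] / b[cmd] for every command; below 1 means `a` is faster."""
    return {cmd: a[cmd] / b[cmd] for cmd in a}


ratios = relative_time(m1_mini, mbp_2019)

# Print the commands from "M1 most ahead" to "M1 most behind".
for cmd, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{cmd:12s} {ratio:5.2f}")
```

On these medians the M1 is ahead on most commands, ties on correl, and is behind on bootstrap and xtile.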
      Last edited by Sunham Kim; 24 Nov 2020, 14:10.



      • #18
        Okay, I know this thread is a bit dated, so it may be an unfair comparison, but here are the results from my Alienware R15 (Intel i9-13900KF, 24 cores, 64 GB RAM):

        Stata license: Single-user 24-core , expiring 20 Apr 2024

        Variable | p50 Mean SD Min Max
        -------------+--------------------------------------------------
        replace | .001 .0013 .000483 .001 .002
        regress | .011 .011 .0008165 .01 .012
        predict | .0025 .0036 .0036878 .002 .014
        correl | .005 .0062 .0034577 .005 .016
        bootstrap | 2.7235 2.7603 .0992438 2.678 3.001
        mvtest | .049 .0503 .0027909 .047 .055
        xtile | .232 .2313 .0127284 .214 .257
        expand_drop | .0355 .0363 .0042439 .028 .042
        arfima | 2.187 2.2132 .0847726 2.139 2.442
        eigenv | .184 .1847 .0042177 .179 .192
        ----------------------------------------------------------------


        W00t!!!



        • #19
          Interesting Matthew,

          On my system:
          13th Gen Intel(R) Core(TM) i9-13900K 3.00 GHz
          RAM 64.0 GB
          Windows 11 Pro (installed on SSD)
          Stata/MP 18.0 : Single-user 12-core

          I get nearly the same results:
          Code:
              Variable |       p50      Mean        SD       Min       Max
          -------------+--------------------------------------------------
               replace |      .001     .0013   .000483      .001      .002
               regress |      .015     .0155  .0015092      .014      .018
               predict |      .003     .0032  .0013984      .002      .007
                correl |      .006     .0067  .0020028      .005      .012
             bootstrap |    2.8745    2.8973  .0467239     2.847      2.98
                mvtest |     .0475     .0488  .0037653      .045      .056
                 xtile |      .219     .2206  .0124383      .199      .247
           expand_drop |     .0415     .0409  .0025144      .037      .044
                arfima |    3.0005    3.0169  .0876755     2.892     3.208
                eigenv |     .1845     .1854  .0030258      .181      .191
          ----------------------------------------------------------------
          http://publicationslist.org/eric.melse



          • #20
            Dang, looks like I need more cores on both my CPU and my Stata. The 12-core setup ran the bootstrap in 40% of the time my 4-core Stata 18 took (even though my Ryzen 7 2700x has a higher clock speed).



            • #21
              My computer: i7-1260P (12 cores/16 logical processors), 32 GB DDR4 RAM, Stata 18 MP/4.

              Code:
                  Variable |       p50      Mean        SD       Min       Max
              -------------+--------------------------------------------------
                   replace |      .003     .0032  .0004216      .003      .004
                   regress |      .025     .0279  .0095621      .024      .055
                   predict |      .006     .0084  .0042479      .005      .018
                    correl |     .0135     .0141  .0018529      .013      .019
                 bootstrap |    4.6275    4.7257  .3023861     4.552     5.552
                    mvtest |     .0895     .0882   .004638      .077      .093
                     xtile |     .3595     .3521  .0185739      .305      .365
               expand_drop |      .058     .0568  .0045412      .044      .059
                    arfima |    3.1775    3.1712  .0551338      3.05      3.26
                    eigenv |      .237     .2367  .0012517      .234      .238
              ----------------------------------------------------------------



              • #22
                My results on a:
                CPU: 13700k, running at 5 GHz
                RAM: DDR4, 128 GB at 3200 MHz, dual-channel setup.
                Storage: Gen 4 NVMe (4 GB/s+ read and write)
                Stata 18, revision 30 Aug 2023, MP4.
                    Variable |       p50      Mean        SD       Min       Max
                -------------+--------------------------------------------------
                     replace |      .002     .0024  .0005164      .002      .003
                     regress |     .0175     .0179  .0020248      .016      .023
                     predict |     .0055     .0056  .0016465      .004      .008
                      correl |      .013     .0137  .0043218       .01      .025
                   bootstrap |    3.2785    3.2828  .0422263     3.226     3.362
                      mvtest |      .064     .0637  .0047152      .053      .071
                       xtile |      .249     .2468  .0078429      .234      .259
                 expand_drop |      .035     .0356  .0030258      .031      .043
                      arfima |    2.2175    2.2363   .059498     2.184     2.345
                      eigenv |      .187     .1877  .0023594      .186      .194
                ----------------------------------------------------------------
                Stata SE (set processors 1); the CPU speed automatically increases to 5.2 GHz:
                    Variable |       p50      Mean        SD       Min       Max
                -------------+--------------------------------------------------
                     replace |      .008     .0085  .0007071      .008       .01
                     regress |     .0615     .0609  .0024244      .058      .064
                     predict |      .016     .0168  .0014757      .015       .02
                      correl |     .0465     .0469  .0011005      .046      .049
                   bootstrap |    5.1195    5.1298  .0573543     5.049     5.212
                      mvtest |     .1275     .1285   .004223      .122      .136
                       xtile |      .293     .2917   .010133      .267      .303
                 expand_drop |     .0375     .0372  .0030478      .032      .042
                      arfima |    2.7505    2.7471  .0865094     2.638     2.912
                      eigenv |     .1895     .1903  .0025841      .188      .196
                ----------------------------------------------------------------

                Similar configuration with a Ryzen 3900x at 4 GHz, MP4:
                    Variable |       p50      Mean        SD       Min       Max
                -------------+--------------------------------------------------
                     replace |      .003     .0031  .0007379      .002      .005
                     regress |     .0275     .0286  .0037476      .026      .039
                     predict |      .012     .0127  .0040838      .007      .021
                      correl |      .016      .018  .0051854      .015      .031
                   bootstrap |     5.487    5.5256  .1562919     5.404     5.943
                      mvtest |     .1065     .1061  .0027264        .1      .111
                       xtile |      .426     .4233  .0128327      .392       .44
                 expand_drop |     .0595     .0598  .0028597      .054      .065
                      arfima |     4.985    5.0383  .1636324     4.964     5.501
                      eigenv |     .2865     .2875  .0028771      .284      .293
                ----------------------------------------------------------------
                My conclusion: better to invest first in a fresh CPU (350 USD, lifetime) than in an expensive MP license beyond 4 cores.
                CPU vintage and clock speeds are king.
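As a rough check on that conclusion, the following Python sketch computes the per-command speedup of MP4 over a single processor on the 13700k, using the p50 columns of the first two tables in this post. The variable names are made up for illustration, and the comparison is slightly tilted toward the single-core run because the CPU boosted to 5.2 GHz there instead of 5 GHz.

```python
# Median (p50) run times in seconds on the 13700k, copied from the tables above.
mp4 = {  # Stata 18 MP4
    "replace": .002, "regress": .0175, "predict": .0055, "correl": .013,
    "bootstrap": 3.2785, "mvtest": .064, "xtile": .249,
    "expand_drop": .035, "arfima": 2.2175, "eigenv": .187,
}
single = {  # same machine with `set processors 1`
    "replace": .008, "regress": .0615, "predict": .016, "correl": .0465,
    "bootstrap": 5.1195, "mvtest": .1275, "xtile": .293,
    "expand_drop": .0375, "arfima": 2.7505, "eigenv": .1895,
}

# Speedup: how many times faster the 4-core run is than the 1-core run.
speedup = {cmd: single[cmd] / mp4[cmd] for cmd in mp4}

# Parallel efficiency: speedup divided by the number of cores used (4).
efficiency = {cmd: s / 4 for cmd, s in speedup.items()}

for cmd in sorted(speedup, key=speedup.get, reverse=True):
    print(f"{cmd:12s} speedup {speedup[cmd]:4.2f}  efficiency {efficiency[cmd]:4.0%}")
```

On these medians replace gets close to the ideal 4x, while eigenv and expand_drop barely change, which fits the point that extra cores only pay off for some commands.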
                Last edited by alejoforero; 01 Sep 2023, 18:40.



                • #23
                  Hmm. The 13700k smoked the 3900x, but has a much higher clock speed. From what I've read, it seems like the clock speed is the most important factor.



                  • #24
                    Clock speed is certainly key, but not everything: I reran the Intel 13700k with the clock speed limited to 4 GHz, comparable to the AMD Ryzen (everything else, RAM, etc., held constant), and these are the results:
                        Variable |       p50      Mean        SD       Min       Max
                    -------------+--------------------------------------------------
                         replace |      .003     .0032  .0004216      .003      .004
                         regress |      .023     .0233  .0010594      .022      .026
                         predict |     .0075     .0069   .001792      .005       .01
                          correl |      .016     .0151  .0015239      .012      .017
                       bootstrap |       4.1    4.1079  .0278786     4.084     4.168
                          mvtest |      .073      .075  .0048762      .072      .088
                           xtile |      .282     .2832  .0057696      .277      .297
                     expand_drop |      .042     .0427  .0020575       .04      .048
                          arfima |    2.5775    2.5911  .0671126     2.528     2.773
                          eigenv |     .2545      .255  .0023094      .252      .259
                    ----------------------------------------------------------------
                    So the Intel is faster than the AMD even at the same clock speeds. There are two possible explanations:

                    1. Vintage: the 13700k (and 13900k, etc.) is a 2022 CPU whereas the 3900x is a 2019 CPU. This affects optimization, instructions per clock, etc. Both CPUs have AVX2, so that's not the reason.

                    2. Intel optimization: Stata uses the Intel Math Kernel Library (MKL) https://www.stata.com/stata17/math-kernel-library-mkl/, which runs on AMD but is better optimized for Intel CPUs.

                    The lesson so far seems to be: get the latest, fastest (clock speed), Intel CPU you can. Then get the highest core count license you can afford.
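To separate the clock effect from the Intel-versus-AMD gap, one option is to summarize each comparison with a geometric mean of the per-command ratios. The Python sketch below does that with the p50 values from post #22 and from the table above; the helper name `slowdown` is hypothetical, and a geometric mean is just one reasonable way to average ratios.

```python
from statistics import geometric_mean

# Median (p50) run times in seconds, all with Stata MP4.
intel_5ghz = {  # 13700k at 5 GHz (post #22)
    "replace": .002, "regress": .0175, "predict": .0055, "correl": .013,
    "bootstrap": 3.2785, "mvtest": .064, "xtile": .249,
    "expand_drop": .035, "arfima": 2.2175, "eigenv": .187,
}
intel_4ghz = {  # 13700k limited to 4 GHz (this post)
    "replace": .003, "regress": .023, "predict": .0075, "correl": .016,
    "bootstrap": 4.1, "mvtest": .073, "xtile": .282,
    "expand_drop": .042, "arfima": 2.5775, "eigenv": .2545,
}
ryzen_4ghz = {  # Ryzen 3900x at 4 GHz (post #22)
    "replace": .003, "regress": .0275, "predict": .012, "correl": .016,
    "bootstrap": 5.487, "mvtest": .1065, "xtile": .426,
    "expand_drop": .0595, "arfima": 4.985, "eigenv": .2865,
}


def slowdown(slow, fast):
    """Geometric mean of slow[cmd] / fast[cmd] across all commands."""
    return geometric_mean([slow[cmd] / fast[cmd] for cmd in fast])


clock_effect = slowdown(intel_4ghz, intel_5ghz)  # same chip, 4 GHz vs 5 GHz
vendor_gap = slowdown(ryzen_4ghz, intel_4ghz)    # Ryzen vs Intel, both at 4 GHz

print(f"Intel at 4 GHz vs 5 GHz: {clock_effect:.2f}x slower on average")
print(f"Ryzen vs Intel at 4 GHz: {vendor_gap:.2f}x slower on average")
```

On these medians the first figure comes out at about 1.26, close to the 5/4 clock ratio, and the second at about 1.33, which is the part the clock speed does not explain.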


                    One missing variable in this analysis is the number of observations. It is entirely possible that Stata's ability to scale across many cores depends on the number of observations. All these benchmarks have just 500k observations, which is rather small, so the savings from parallelization might be outweighed by its overhead. In the Stata MP report, N varies by command but can be 30 million or higher.
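The intuition can be illustrated with a toy cost model in Python. This is not how Stata/MP works internally and none of the constants are measured: `serial_per_obs`, `parallel_per_obs`, and `overhead_per_core` are invented values chosen only to show the shape of the effect, with run time split into a serial part, a part that divides across cores, and a fixed coordination cost per extra core.

```python
def runtime(n_obs, cores,
            serial_per_obs=2e-8,      # hypothetical: seconds per obs that cannot be parallelized
            parallel_per_obs=1e-7,    # hypothetical: seconds per obs that splits across cores
            overhead_per_core=5e-3):  # hypothetical: fixed cost for each extra core
    """Toy run-time model: serial work + parallel work / cores + overhead."""
    serial = n_obs * serial_per_obs
    parallel = n_obs * parallel_per_obs / cores
    overhead = overhead_per_core * (cores - 1)
    return serial + parallel + overhead


def speedup(n_obs, cores):
    """Speedup of `cores` processors relative to one processor in the toy model."""
    return runtime(n_obs, 1) / runtime(n_obs, cores)


for n_obs in (500_000, 30_000_000):
    for cores in (4, 8):
        print(f"N = {n_obs:>10,}  cores = {cores}  speedup = {speedup(n_obs, cores):.2f}")
```

With these made-up constants, 8 cores do worse than 4 at N = 500k but better at N = 30 million, which is the pattern described above; whether real Stata commands behave this way would need benchmarks at larger N.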
                    Last edited by alejoforero; 03 Sep 2023, 20:53.



                    • #25
                      Interesting. Thanks for the additional work. Looks like I'll be moving to an Intel chip next round.



                      • #26
                        Let me chime in here. I tested an i9-13900H notebook and an Apple M3 Pro, partly because I wanted to know whether the best Apple notebooks can keep up with the best Intel ones. Three things to learn:

                        First, it seems that for Intel, setting the power mode to "Maximum Energy" ("beste Leistung" in German; probably "Best performance" in the English Windows settings) is important (see the first and second tests).

                        Second, the M3 is almost always faster than the i9, despite the i9 reaching 5.4 GHz and the M3 only 4 GHz. But there is one weird exception: arfima is about 20 (!) times slower on the Mac! (This seems to be a general Apple problem; the users above who ran on Apple hardware had similarly slow arfima times.) This suggests that Stata still has to optimize some commands for Apple, even though Apple is pretty fast in Stata overall.

                        Third, since the M3 Pro only has 6 performance cores (and 6 efficiency cores), using 6 rather than 8 cores can actually speed up the calculations: the 8-core setup makes Stata use 6 fast cores plus 2 slow cores, which (given diminishing returns with each additional core) slows the calculations down rather than speeding them up. So for M3 users, I would suggest setting the number of cores to the number of performance cores, leaving the efficiency cores to handle the OS.


                        See the difference:

                        i9 Balanced 8 cores
                        Variable | p50 Mean SD Min Max
                        -------------+--------------------------------------------------
                        replace | .002 .0022 .0004216 .002 .003
                        regress | .02 .0206 .0011738 .019 .023
                        predict | .005 .0054 .0012649 .004 .007
                        correl | .01 .0103 .0014181 .009 .014
                        bootstrap | 4.0085 4.0014 .0341344 3.935 4.04
                        mvtest | .0685 .0689 .0031073 .065 .075
                        xtile | .3195 .3153 .0166203 .28 .334
                        expand_drop | .04 .0412 .0024404 .039 .045
                        arfima | 2.579 2.7908 .6079641 2.518 4.509
                        eigenv | .1645 .1648 .0042895 .156 .171
                        ----------------------------------------------------------------

                        i9 Maximum energy 8 cores
                        Variable | p50 Mean SD Min Max
                        -------------+--------------------------------------------------
                        replace | .002 .002 .0004714 .001 .003
                        regress | .018 .0188 .002974 .017 .027
                        predict | .004 .0042 .0004216 .004 .005
                        correl | .008 .0082 .0004216 .008 .009
                        bootstrap | 3.4995 3.5048 .0378294 3.452 3.577
                        mvtest | .067 .0667 .0022632 .062 .07
                        xtile | .269 .2665 .0153858 .237 .286
                        expand_drop | .04 .0392 .0034254 .032 .044
                        arfima | 2.4985 2.4619 .0907346 2.3 2.552
                        eigenv | .142 .1411 .0026013 .137 .144
                        ----------------------------------------------------------------

                        M3 pro 8 cores
                        Variable | p50 Mean SD Min Max
                        -------------+--------------------------------------------------
                        replace | .002 .0018 .0004216 .001 .002
                        regress | .013 .0129 .0003162 .012 .013
                        predict | .004 .0037 .0006749 .003 .005
                        correl | .007 .0069 .0003162 .006 .007
                        bootstrap | 2.9405 2.9445 .0200181 2.922 2.989
                        mvtest | .043 .0427 .0018288 .038 .045
                        xtile | .266 .2615 .0135831 .223 .267
                        expand_drop | .032 .0323 .0006749 .031 .033
                        arfima | 61.8575 62.078 .8496164 61.307 64.351
                        eigenv | .15 .1499 .0017288 .146 .152
                        ----------------------------------------------------------------


                        M3 pro 6 cores
                        Variable | p50 Mean SD Min Max
                        -------------+--------------------------------------------------
                        replace | .0015 .0015 .000527 .001 .002
                        regress | .0125 .0124 .0006992 .011 .013
                        predict | .003 .0036 .0009661 .003 .006
                        correl | .007 .0068 .0006325 .006 .008
                        bootstrap | 2.85 2.8473 .0164725 2.818 2.867
                        mvtest | .042 .0437 .003401 .041 .05
                        xtile | .2605 .2578 .0150466 .217 .272
                        expand_drop | .0325 .0324 .0006992 .031 .033
                        arfima | 69.6715 68.7322 2.183425 63.51 70.001
                        eigenv | .1495 .1498 .0022509 .146 .153
                        ----------------------------------------------------------------
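For anyone who wants to check the second and third points against the tables, here is a short Python sketch using the p50 columns of the "i9 Maximum energy", "M3 pro 8 cores", and "M3 pro 6 cores" tables above. The variable names are only illustrative.

```python
# Median (p50) run times in seconds, copied from the tables above.
i9_max = {
    "replace": .002, "regress": .018, "predict": .004, "correl": .008,
    "bootstrap": 3.4995, "mvtest": .067, "xtile": .269,
    "expand_drop": .04, "arfima": 2.4985, "eigenv": .142,
}
m3_8 = {
    "replace": .002, "regress": .013, "predict": .004, "correl": .007,
    "bootstrap": 2.9405, "mvtest": .043, "xtile": .266,
    "expand_drop": .032, "arfima": 61.8575, "eigenv": .15,
}
m3_6 = {
    "replace": .0015, "regress": .0125, "predict": .003, "correl": .007,
    "bootstrap": 2.85, "mvtest": .042, "xtile": .2605,
    "expand_drop": .0325, "arfima": 69.6715, "eigenv": .1495,
}

# How many times slower arfima is on the M3 (8 cores) than on the i9.
arfima_gap = m3_8["arfima"] / i9_max["arfima"]

# Commands where the M3 with 8 cores has a strictly higher median than the i9.
slower_on_m3 = sorted(cmd for cmd in m3_8 if m3_8[cmd] > i9_max[cmd])

# Commands where dropping from 8 to 6 cores lowered the median on the M3.
faster_with_6 = sorted(cmd for cmd in m3_8 if m3_6[cmd] < m3_8[cmd])

print(f"arfima, M3 vs i9: {arfima_gap:.1f}x slower")
print("slower on M3 than on i9:", slower_on_m3)
print("faster with 6 cores than with 8:", faster_with_6)
```

On these medians the arfima gap is closer to 25x than 20x, and arfima is also one of the commands that got slower rather than faster when going from 8 to 6 cores.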




