
  • #16
    This is not the fastest solution (both Sergio's and Daniel's are faster).

    My approach is to handle the measurements one by one.
    I assume that the measurements are few relative to the number of rows.

    So my suggestion becomes:
    Code:
    mata:
        // X is assumed already in memory; `nreps' is a local macro set beforehand
        R = rows(X)        // number of rows (observations)
        C = cols(X)        // number of columns (measurements)
        A2 = J(R, R, 0)    // R x R accumulator of pairwise agreement counts
        timer_clear(1)
        timer_on(1)
        for (reps = 1; reps <= `nreps' ; reps++) { // outer loop just for timing
            for (c = 1; c <= C; c++) A2 = A2 + (J(R, 1, X[,c]') :== X[,c])
        }
        timer_off(1)
        timer()
    end
    With these settings it is 22% faster than the code in #1, and with fewer measurements it becomes faster still.

    What you get is a full, but symmetric matrix similar to the requested answer.
    Kind regards

    nhb



    • #17
      I didn't mean to create the thread that would not die ... .

      My general comment here is that the faster solutions are ones I understand more poorly <grin>, which I suppose indicates use of features of Mata not found in other languages I know.

      As it happens, for the applications I'm interested in, the rows (N) are persons, and the columns (K) are a list of questionnaire responses, so K << N is a good assumption. But Niels points to another issue here, i.e., that perhaps N >> K should not necessarily be assumed. If there were to be a contribution of a new community-contributed Mata function here, perhaps it should check the size of N vs. K and choose the algorithm accordingly. When I compared speeds, I only examined situations where N >> K.
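The shape-based dispatch suggested above can be sketched as follows. This is an illustrative Python/NumPy translation, not anyone's posted Mata code: a column loop (K interpreter-level iterations, each an R x R vectorized comparison, cheap when K << N) versus a row loop (R iterations, each a K-wide comparison against all rows), with a wrapper choosing between them by shape. All function names here are made up for the sketch.

```python
import numpy as np

def agreements_by_column(X):
    """K interpreter-level iterations, each an R x R vectorized
    equality comparison: the better shape when K << N."""
    R, K = X.shape
    A = np.zeros((R, R), dtype=int)
    for c in range(K):
        col = X[:, c]
        A += (col[:, None] == col[None, :])  # R x R outer equality
    return A

def agreements_by_row(X):
    """R interpreter-level iterations, each comparing one row's K
    values against all rows: relatively cheaper when N is small."""
    R, K = X.shape
    A = np.empty((R, R), dtype=int)
    for i in range(R):
        A[i, :] = (X == X[i, :]).sum(axis=1)  # agreements with row i
    return A

def agreements(X):
    """Dispatch on the N vs. K comparison, as suggested in the thread."""
    R, K = X.shape
    return agreements_by_column(X) if K <= R else agreements_by_row(X)
```

Both paths return the same full, symmetric matrix (diagonal = K); only the number of interpreter-level iterations differs, which is exactly the quantity the timings in this thread are sensitive to.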



      • #18
        Just for curiosity, and to round out this thread, I did some timing experiments to compare all the codes/algorithms suggested here.
        N = # observations (10, 100, 500, 1000); K = # variables (10, 20, 100, 1000)

        I implemented each person's code as a Mata function. Timings were based on enough repetitions (different for different settings of N and K) in order to get a total time of a few seconds, so I've reported times per repetition. I've listed the results below in tabular form.

        It's hard to say "this is the best algorithm," as that varies somewhat based on the particular combination of N and K. Daniel Klein's quite short code was the winner up until very large values of N and K, when Sergio Correia's became faster. There are a few odd spots where a generally slower code (e.g., my own) is slightly faster than some of the others. Timings are missing for several N, K values with John Mullahy's code, as those large settings caused my machine to freeze (memory issue??), so I had to skip them. (They didn't seem to bother John's machine.)

        The timings for a given algorithm do *not* seem to go up as N², which is strange to me, since the problem inherently involves N(N-1)/2 comparisons. And, they don't seem to go up linearly with K. Perhaps there's a general lesson here, namely that a mathematical analysis of the size of a problem may well not reveal how Mata timings will work. I'd presume this is because time spent interpreting the byte code must in some cases be more costly than the comparisons or assignments being made. Other comments or explanations welcome.
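One way to put a number on the scaling claim: from the msrep column one can compute the exponent implied by each pair of successive N values. A quick back-of-the-envelope sketch (in Python, just as a calculator), using the figures transcribed from the table below for Daniel Klein's code at K = 10:

```python
import math

# msrep for Klein's code at K = 10, transcribed from the table below
timings = {10: 0.04, 100: 0.85, 500: 19.69, 1000: 67.75}

def implied_exponents(t):
    """Empirical scaling exponent between successive problem sizes:
    if time ~ N^p, then p = log(t2/t1) / log(N2/N1)."""
    ns = sorted(t)
    return [math.log(t[b] / t[a]) / math.log(b / a)
            for a, b in zip(ns, ns[1:])]

exps = implied_exponents(timings)  # roughly 1.3, then close to 2
```

So, at least for this column of the table, growth is well below N² at small N and approaches N² only for the larger problems, consistent with a fixed per-iteration overhead that dominates when the matrices are small.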

        Code:
        name = name of person contributing the algorithm
        msrep = milliseconds per repetition
        (Sorted by increasing msrep within N, K)
        
             +----------------------------------+
             |    name      N      K      msrep |
             |----------------------------------|
          1. |   Klein     10     10       0.04 |
          2. | Mullahy     10     10       0.05 |
          3. |   Bruun     10     10       0.05 |
          4. |    Lacy     10     10       0.05 |
          5. | Correia     10     10       0.29 |
             |----------------------------------|
          6. |   Klein     10     20       0.06 |
          7. | Mullahy     10     20       0.06 |
          8. |   Bruun     10     20       0.07 |
          9. |    Lacy     10     20       0.10 |
         10. | Correia     10     20       0.41 |
             |----------------------------------|
         11. |   Klein     10    100       0.09 |
         12. |    Lacy     10    100       0.11 |
         13. | Mullahy     10    100       0.15 |
         14. |   Bruun     10    100       0.24 |
         15. | Correia     10    100       1.30 |
             |----------------------------------|
         16. |   Klein     10   1000       0.47 |
         17. |    Lacy     10   1000       0.54 |
         18. |   Bruun     10   1000       2.09 |
         19. | Mullahy     10   1000       2.17 |
         20. | Correia     10   1000      11.46 |
             |----------------------------------|
         21. |   Klein    100     10       0.85 |
         22. |   Bruun    100     10       0.93 |
         23. | Correia    100     10       1.08 |
         24. | Mullahy    100     10       2.38 |
         25. |    Lacy    100     10       3.09 |
             |----------------------------------|
         26. |   Klein    100     20       1.58 |
         27. |   Bruun    100     20       1.80 |
         28. | Correia    100     20       1.91 |
         29. | Mullahy    100     20       6.79 |
         30. |    Lacy    100     20       6.88 |
             |----------------------------------|
         31. |   Klein    100    100       4.78 |
         32. |    Lacy    100    100       8.00 |
         33. | Correia    100    100       8.15 |
         34. |   Bruun    100    100       8.80 |
         35. | Mullahy    100    100      24.45 |
             |----------------------------------|
         36. |   Klein    100   1000      34.06 |
         37. |    Lacy    100   1000      56.25 |
         38. | Correia    100   1000      80.31 |
         39. |   Bruun    100   1000      87.97 |
         40. | Mullahy    100   1000     215.63 |
             |----------------------------------|
         41. | Correia    500     10      11.08 |
         42. |   Klein    500     10      19.69 |
         43. |   Bruun    500     10      29.92 |
         44. |    Lacy    500     10      77.69 |
         45. | Mullahy    500     10          . |
             |----------------------------------|
         46. | Correia    500     20      18.11 |
         47. |   Klein    500     20      32.61 |
         48. |   Bruun    500     20      58.33 |
         49. |    Lacy    500     20     171.67 |
         50. | Mullahy    500     20          . |
             |----------------------------------|
         51. | Correia    500    100      75.25 |
         52. |   Klein    500    100      97.50 |
         53. |    Lacy    500    100     202.50 |
         54. |   Bruun    500    100     286.25 |
         55. | Mullahy    500    100          . |
             |----------------------------------|
         56. | Correia    500   1000     732.50 |
         57. |    Lacy    500   1000    1425.00 |
         58. |   Klein    500   1000    1442.50 |
         59. |   Bruun    500   1000    2875.00 |
         60. | Mullahy    500   1000          . |
             |----------------------------------|
         61. | Correia   1000     10      44.25 |
         62. |   Klein   1000     10      67.75 |
         63. |   Bruun   1000     10     131.25 |
         64. |    Lacy   1000     10     310.00 |
         65. | Mullahy   1000     10          . |
             |----------------------------------|
         66. | Correia   1000     20      76.50 |
         67. |   Klein   1000     20     116.33 |
         68. |   Bruun   1000     20     258.33 |
         69. |    Lacy   1000     20     683.33 |
         70. | Mullahy   1000     20          . |
             |----------------------------------|
         71. | Correia   1000    100     343.50 |
         72. |   Klein   1000    100     373.00 |
         73. |    Lacy   1000    100     810.00 |
         74. |   Bruun   1000    100    1265.00 |
         75. | Mullahy   1000    100          . |
             |----------------------------------|
         76. | Correia   1000   1000    3250.00 |
         77. |    Lacy   1000   1000    5750.00 |
         78. |   Klein   1000   1000    6000.00 |
         79. |   Bruun   1000   1000   12800.00 |
         80. | Mullahy   1000   1000          . |
             +----------------------------------+



        • #19
          The timings for a given algorithm do *not* seem to go up as N², which is strange to me, since the problem inherently involves N(N-1)/2 comparisons. And, they don't seem to go up linearly with K. Perhaps there's a general lesson here, namely that a mathematical analysis of the size of a problem may well not reveal how Mata timings will work. I'd presume this is because time spent interpreting the byte code must in some cases be more costly than the comparisons or assignments being made. Other comments or explanations welcome.
          There are quite a few possibilities. Even with compiled languages (C/C++), things might not scale linearly due to, e.g., how modern processors work. For instance, larger matrices might benefit from how data gets loaded from memory (in chunks or "pages"), from parallelization within the same CPU, etc.

          Whenever I find a surprising way of speeding up Mata, I tend to write it down here (mostly so I can re-read it a few months later, the next time I have to code in Mata). Some things are easy to understand (example 1), but others are a bit arcane. For instance, the three functions below are all equivalent to computing sum(x), but take different amounts of time. The simple approach f_for takes 0.45s on my side, the second approach f_while takes 0.36s, and the third approach, which does something called "loop unrolling", takes only 0.30s:

          Code:
          clear all
          cls
          
          mata:
          
          real f_for(x) {
              y = 0
              n = rows(x)
              for (i=1;i<=n;i++) {
                  y = y + x[i]
              }
              return(y)
          }
          
          real f_while(x) {
              y = 0
              i = rows(x) + 1
              while (--i) {
                  y = y + x[i]
              }
              return(y)
          }
          
          real f_unroll(x) {
              y = 0
              i = rows(x) + 1
              while (i > 4) {              // unrolled: four elements per pass
                  y = y + x[--i] + x[--i] + x[--i] + x[--i]
              }
              while (--i) y = y + x[i]     // pick up the remaining 0-3 elements
              return(y)
          }
          
          end
          
          mata: x = runiform(5e6, 1)
          set rmsg on
          mata: f_for(x)
          mata: f_while(x)
          mata: f_unroll(x)
          set rmsg off
          (Interestingly, note that sum(x) takes only 0.01 seconds, so the overhead of Mata vs. a carefully built C function is 30x!)



          • #20
            Thanks, Sergio; this is very enlightening. Without having any knowledge of compiler design, the differences in performance among your several Mata functions strike me as something of a deficit in the Mata compiler. I think a compiler should make an efficient translation when the user chooses the most straightforward looping construct in the language. I'd say that some optimizing is in order, but that's just my opinion as an ordinary user. Now, I'd admit that loop overhead is often not the limiting factor in performance, but what we have seen in this thread is a meaningful example where it matters a lot.

            Getting a comparison of some Python code here would be interesting, I'd say.
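As a rough illustration of that kind of comparison (hedged: this is CPython, not Mata, so the absolute numbers will differ, but the interpreter-overhead story is analogous), one could pit a plain loop and a 4-way unrolled loop against the built-in sum on the same data:

```python
import random
import time

def sum_for(xs):
    """Straightforward interpreted loop, one element per iteration."""
    y = 0.0
    for v in xs:
        y += v
    return y

def sum_unrolled(xs):
    """4-way unrolling: fewer loop-control steps per element,
    plus a cleanup loop for the leftover 0-3 elements."""
    y = 0.0
    n = len(xs)
    i = 0
    while i + 4 <= n:
        y += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    while i < n:
        y += xs[i]
        i += 1
    return y

xs = [random.random() for _ in range(10**5)]
for f in (sum_for, sum_unrolled, sum):
    t0 = time.perf_counter()
    total = f(xs)
    print(f.__name__, round(time.perf_counter() - t0, 4), total)
```

On a typical CPython build the built-in sum (a C loop) is far ahead of both interpreted versions, echoing the Mata sum(x) vs. hand-written loop gap noted in #19; the unrolling gain, if any, is much smaller than in Mata.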
