
  • #16
    Code:
    sort t (id)
    is not the same as

    Code:
    sort id (t)
    The latter resembles the proposed Mata functions, the former does not.
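
    To make the difference concrete, here is a minimal sketch on a toy dataset. It assumes the shorthand above stands for the by-prefix forms `by t (id), sort:` and `by id (t), sort:`; the variable names x, tot_by_t and tot_by_id are made up for illustration.

    ```stata
    clear
    input id t x
    1 1 10
    1 2 20
    2 1 30
    2 2 40
    end

    * groups are time periods; within each period the data are ordered by id
    by t (id), sort: egen double tot_by_t = total(x)

    * groups are panel units; within each unit the data are ordered by t
    by id (t), sort: egen double tot_by_id = total(x)

    * totals per period: 10+30 and 20+40
    assert tot_by_t == 40 if t == 1
    assert tot_by_t == 60 if t == 2

    * totals per panel unit: 10+20 and 30+40
    assert tot_by_id == 30 if id == 1
    assert tot_by_id == 70 if id == 2
    ```

    Only the second form produces one total per id, which is what a Mata function built on panelsetup() over id computes.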

    I recommend adding


    Code:
    assert bigtot1 == bigtot2
    assert bigtot1 == bigtot3
    [...]
    to the simulations, to make sure you are comparing the speed of commands/functions that actually give you the same results.
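
    As a rough sketch of such a check on a toy dataset (the names tot_a and tot_b and the tolerance 1e-12 are illustrative assumptions, not part of the original simulation):

    ```stata
    clear
    input id x
    1 10
    1 20
    2 30
    2 40
    end

    * two ways of computing the within-id total
    by id, sort: egen double tot_a = total(x)
    by id, sort: gen double tot_b = sum(x)
    by id, sort: replace tot_b = tot_b[_N]

    * exact comparison: stops with an error if any observation differs
    assert tot_a == tot_b

    * looser alternative if two methods add the values in a different order
    assert reldif(tot_a, tot_b) < 1e-12
    ```

    The exact comparison is the stricter test. The reldif() version is only worth considering if exact equality fails because of floating-point rounding rather than a genuine difference in results.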



    • #17
      Thanks, daniel klein, for pointing out the error! I had my own code in mind, where I calculate a mean for each time period. The corrected code is below.

      My second error was assuming that the speed difference is due to the sort. It is not. In the update of 28 April 2020, Stata revised the egen functions:

      1. egen functions count(), kurt(), max(), mdev(), mean(), min(), pc(), sd(), skew(), and total() are faster in all cases. They are significantly faster when used with by varlist: and when there are missing values or an if expression. An extra sort is avoided in this case, and the time saved is equal to the time of this sort.
      This is, I believe, the reason for the speed improvement compared with older versions. I ran the code in Stata 15.1, and there egen is much slower than gen.

      Code:
      clear all
      
      mata:
          void coltot2() {
              id = st_data(.,"id")
              big = st_data(.,"var1")
              V = panelsetup(id, 1)
              bigtot = J(rows(id),1,.)
              for (i=rows(V); i; i--) {
                  X1 = panelsubmatrix(big, i, V)
                  bigtot[|V[i,1],.\V[i,2],.|]=J(rows(X1),1,colsum(X1))    
              }
              st_addvar("double", "bigtot3")
              st_store(., "bigtot3", bigtot)
          }
      
          void coltot3() {
              ID = st_data(.,"id") // assumes id takes the consecutive values 1, 2, 3, ... with no gaps, because Xt[ID,] below uses id as a row index
              big = st_data(.,"var1")
              V = panelsetup(ID, 1)
              Xt = panelsum(big,V) //the undocumented panelsum function
              bigtot = Xt[ID,]
              st_addvar("double", "bigtot4")
              st_store(., "bigtot4", bigtot)
          }
      end
      
      * each row appended below has 7 columns: i, t, timers 1-4, and timer 99
      mata results = J(0,7,.)
      
      qui{
          foreach i in 100 200 500 1000 10000  {
              foreach t in 100 200 500 1000 10000  {
                  clear
                  ** set seed in loop so drawn numbers are the same
                  set seed 123
                  timer clear
                  timer on 99
                  set obs `i'
                  gen id = _n
                  expand `t'
                  by id, sort: gen t = _n
      
                  drawnorm var1
                              
                  sort id t
                  
                  timer on 1
                  by id, sort: egen double bigtot1 = total(var1)
                  timer off 1
      
                  timer on 2
                  by id, sort: gen double bigtot2=sum(var1)
                  by id, sort: replace bigtot2=bigtot2[_N]
                  timer off 2
                  
                  sort id t
                  
                  timer on 3
                  mata: coltot2()
                  timer off 3
                  
                  timer on 4
                  mata: coltot3()
                  timer off 4
                  
                  timer off 99
                  
                  
                  noi assert bigtot1 == bigtot2
                  noi assert bigtot1 == bigtot3
                  noi assert bigtot1 == bigtot4
                          
                  
                  noi disp "Results for `i', `t'"
                  noi timer list
                  mata results = results \ (`i',`t',`r(t1)',`r(t2)',`r(t3)',`r(t4)',`r(t99)')
              }
          }
      }
      
      clear
      getmata (res*)= results
      
      rename res1 id
      rename res2 t
      rename res3 egen
      rename res4 gen
      rename res5 coltot2
      rename res6 coltot3
      
      qui{
          foreach type in  egen gen coltot2 coltot3  {
              preserve
                  keep id t `type'
                  rename `type'* t_*
                  noi disp "`type'"
                  reshape wide t_ , i(id) j(t)
                  putmata res = (id t_*), replace
                  
                  mata res2 = ("N\T","100","200","500","1000","10000") \ (strofreal(res[.,1]) , strofreal(res[.,2..6],"%9.3f"))
                  
                  noi mata res2
              
              restore
              
          }
      }
