  • #16
    Originally posted by charlie wong:

    Thanks Jesse. I used your code and get:

    . timer list
    1: 0.14 / 1 = 0.1400
    2: 7.03 / 1 = 7.0310
    3: 0.07 / 1 = 0.0680

    Sorry I am not quite following you - is it that the 7.03 vs 0.07 here sheds light on the time required for calculating skew and mean?
    Your computer is considerably faster than mine; that's one conclusion :D. But I realise now the comparison is not entirely correct. It should be:
    Code:
    clear all
    set obs 10000000
    gen x = rnormal()
    * timeit is a user-written prefix; the same timings can be obtained by
    * wrapping each command in -timer on #- / -timer off #- and then -timer list-
    timeit 1: gen sum = sum(x)
    timeit 2: sum x, d
    timeit 3: sum x, meanonly
    timer list
    . timer list
    1: 0.52 / 1 = 0.5220
    2: 14.73 / 1 = 14.7260
    3: 0.11 / 1 = 0.1050

    If we compare 1 and 2, we see that sum x, d takes far longer than sum(x); these are, respectively, the commands used by skew() and by mean/sd(), which explains why performance for mean/sd was still fine while skew was not. Furthermore, comparing 1 and 3, using sum x instead of sum(x) could potentially speed up mean/sd by a further factor of 5. Of course, this ignores some details, but the contrast might actually be starker still ...
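    As a sketch of the faster route hinted at above (variable names are illustrative, not the original setup): after -summarize, meanonly-, the mean is returned in r(mean), so no extra running-sum variable is needed.

    Code:
    * Sketch: fetch the mean via the fast -meanonly- path
    clear
    set obs 1000000
    gen x = rnormal()
    summarize x, meanonly
    display r(mean)
    * r(N), r(sum), r(min), r(max) are also returned; for r(sd),
    * use plain -summarize x- (still much cheaper than -summarize, detail-)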

    Some might say I'm obsessed with speed tests (I may or may not have a folder on my PC with a bunch of different speed comparisons...). Did you know that egen tag = tag(<varlist>) followed by drop if tag == 0 is considerably faster (roughly 20-50% in my stylised test) than duplicates drop, force? Or that a single drop <varlist> (multiple variables at once) is massively faster than a succession of drop <varname> commands (one variable at a time)? Well, now you do.
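    A minimal sketch of that duplicates comparison, assuming made-up key variables id and grp (timings will of course vary by machine):

    Code:
    clear all
    set obs 1000000
    gen id  = floor(runiform()*100000)
    gen grp = floor(runiform()*10)
    timer clear
    * method 1: duplicates drop
    preserve
    timer on 1
    duplicates drop id grp, force
    timer off 1
    restore
    * method 2: tag one observation per group, drop the rest
    timer on 2
    egen byte tagged = tag(id grp)
    drop if tagged == 0
    drop tagged
    timer off 2
    timer list

    Both methods keep one observation per (id, grp) group; the preserve/restore ensures each method runs on the same data.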



    • #17
      Originally posted by Jesse Wursten in #16 (quoted in full above).
      Speed is king... now that I'm dealing with a huge dataset. Thanks so much for sharing these tips on speeding things up!



      • #18
        Riffs on speed tests in other problems aside, as his co-author I would like to underline Robert Picard's point in #14 that rangestat already provides faster code.

        Anyone following the forum closely may wonder why I didn't make this point myself in #8. The answer lies in an otherwise uninteresting cautionary tale. I tried some speed tests with rangestat and was surprised not to see a massive speed-up, so left the point on one side. Only later did it become clear that there were other problems on my machine and/or local network which were responsible for the slowdown. So, as everyone tells you, apparent speeds depend on your computer and what else it is doing or not doing.
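        For reference, a minimal sketch of the rangestat route (rangestat is a community-contributed command from SSC; the variable names and window below are illustrative, not the original poster's setup):

        Code:
        * ssc install rangestat
        * Rolling 5-period mean and sd of x over a running time variable t
        clear
        set obs 1000
        gen t = _n
        gen x = rnormal()
        rangestat (mean) x (sd) x, interval(t -4 0)
        * results arrive as new variables x_mean and x_sd;
        * skewness is also among rangestat's supported statistics

        Because rangestat computes all requested statistics in a single pass over each window, it avoids the repeated -summarize, detail- calls that made the earlier approach slow.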



        • #19
          Thank you, Robert.
