  • Speed with in or if condition

    If I run
    Code:
    set maxvar 11000
    use https://www.stata-press.com/data/r16/nlswork
    xtreg ln_wage age c.age#i.idcode in 1/100, fe
    it takes several minutes. When I cut the sample first:

    Code:
    set maxvar 11000
    use https://www.stata-press.com/data/r16/nlswork
    keep in 1/100
    xtreg ln_wage age c.age#i.idcode, fe
    it runs very quickly. Does anyone know why?

    Phil

  • #2
    I explored what was happening by tracing with a tracedepth of 2, running the code profiler, and reading the source of -xtreg-. Much of the preparatory work in -xtreg- appears not to account for the in/if conditions. The largest slowdown appeared to occur when -_rmcoll- expands all variables to identify those that can be omitted because of collinearity. In your example dataset there are >4700 such variables to create, expand, and check, because of the number of panels present. That is computationally very wasteful, since only a small fraction of them will be used given your if/in condition.
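
    The panel count and the profiling step can be reproduced roughly as follows. This is a minimal sketch rather than the exact commands used for the diagnosis: it assumes the same nlswork data, the variable names tag_all and tag_100 are made up for illustration, and the profiled call will itself take several minutes.

    ```stata
    set maxvar 11000
    use https://www.stata-press.com/data/r16/nlswork, clear

    * Tag one observation per distinct idcode in the full data
    * (tag_all is a hypothetical helper variable)
    egen byte tag_all = tag(idcode)
    count if tag_all == 1

    * Do the same using only the first 100 observations
    * (tag_100 is a hypothetical helper variable)
    egen byte tag_100 = tag(idcode) if _n <= 100
    count if tag_100 == 1
    drop tag_all tag_100

    * Profile the slow call to see where the time is spent
    profiler clear
    profiler on
    xtreg ln_wage age c.age#i.idcode in 1/100, fe
    profiler off
    profiler report
    ```

    Comparing the two counts shows how many idcode levels exist in the full data relative to the few that fall inside in 1/100.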

    One way to see this is to give every observation outside your in range the same panel ID:

    Code:
    replace idcode = idcode[100]+1 if _n > 100
    After that, these two estimations run about equally quickly.

    Code:
    xtreg ln_wage age c.age#i.idcode in 1/100, fe
    xtreg ln_wage age c.age#i.idcode, fe
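
    To put numbers on the difference without permanently changing the data, the two approaches from post #1 can be timed side by side. This is only a sketch under assumptions: it reloads the original nlswork data, uses -preserve- and -restore- so that the subsetting is temporary, and the timer numbers 1 and 2 are arbitrary choices.

    ```stata
    set maxvar 11000
    use https://www.stata-press.com/data/r16/nlswork, clear
    timer clear

    * Timer 1: restrict the sample with an in qualifier (the slow route)
    timer on 1
    xtreg ln_wage age c.age#i.idcode in 1/100, fe
    timer off 1

    * Timer 2: subset the data first, then estimate (the fast route)
    preserve
    keep in 1/100
    timer on 2
    xtreg ln_wage age c.age#i.idcode, fe
    timer off 2
    restore

    * Show the elapsed time recorded by each timer
    timer list
    ```

    Because -restore- brings back the full dataset afterwards, this route avoids overwriting idcode.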



    • #3
      Excellent. Thank you!
      Phil
