Speed with in or if condition

Phil Bromiley

09 Jul 2019, 12:11

If I run

Code:

set maxvar 11000
use https://www.stata-press.com/data/r16/nlswork
xtreg ln_wage age c.age#i.idcode in 1/100, fe

it takes several minutes. When I cut the sample first

Code:

set maxvar 11000
use https://www.stata-press.com/data/r16/nlswork
keep in 1/100
xtreg ln_wage age c.age#i.idcode, fe

it runs very quickly. Does anyone know why?

Phil


Leonardo Guizzetti

#2

09 Jul 2019, 13:18

I looked into this by setting tracedepth to 2, running the code profiler, and reading the source of -xtreg-. It seems that much of the preparatory work in -xtreg- does not account for if/in conditions. The largest slowdown appeared to occur when -_rmcoll- expands all the variables to identify those that can be omitted because of collinearity. In your example dataset there are more than 4,700 such variables to create, expand, and check because of the number of panels present. That is computationally very wasteful, since only a small fraction of them are actually used once your if/in condition is applied.

One way to see this is to set every panel ID beyond your in range to the same value:

Code:

replace idcode = idcode[100]+1 if _n > 100

After that, these two estimations run about equally fast:

Code:

xtreg ln_wage age c.age#i.idcode in 1/100, fe
xtreg ln_wage age c.age#i.idcode, fe
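
To check the difference on your own machine, here is a minimal sketch using Stata's built-in -timer- command. It assumes the -replace- step above has already been run. The timer slots 1 and 2 are arbitrary choices, and no timings are shown because they will vary by machine and Stata version.

Code:

```stata
* Sketch: time the two estimations (timer slots 1 and 2 are arbitrary)
timer clear

timer on 1
quietly xtreg ln_wage age c.age#i.idcode in 1/100, fe
timer off 1

timer on 2
quietly xtreg ln_wage age c.age#i.idcode, fe
timer off 2

* Report elapsed seconds for each timer slot
timer list
```

Running the same timing on a fresh copy of the data (without the -replace-) should show how much of the run time comes from the panels outside the in range.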
Phil Bromiley

#3

10 Jul 2019, 11:33

Excellent. Thank you!
Phil