Speed up ivregress

Jeff Feder

Join Date: Jun 2021

Posts: 5
#1

Speed up ivregress

03 Jun 2021, 20:10

Hi Statalist,

I am running an IV regression that interacts the endogenous variable with year and industry fixed effects. With about 3 million observations, it takes 5 to 6 hours or even longer. I would like to know how to speed this up because I need to bootstrap the regression to obtain standard errors.

The Stata command is as follows: ivregress 2sls y i.year i.industry some_controls (x c.x#i.industry#i.year = z c.z#i.industry#i.year)

There are about 300 industries and 15 years, so the model may take a long time to run simply because there are so many parameters to estimate. I tried running this in Stata/MP, but for some reason there is not much speed gain. I am not sure what else I can do.

Thank you,
Jeff
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

04 Jun 2021, 01:30

The way you have posed the problem, I do not see anything you can do to speed this up. What you found by trying Stata/MP (no speed gains) just reflects the fact that, as posed, the problem is not parallelisable.

You might want to try and explain what you are trying to do there.
John Mullahy

Join Date: Dec 2016

Posts: 752
#3

04 Jun 2021, 08:24

If you do this:

Code:

contract y year industry allxvars allzvars

does it reduce the sample size very much?

If so, then after -contract- you could try this:

Code:

ivregress 2sls y i.year i.industry some_controls (x c.x#i.industry#i.year = z c.z#i.industry#i.year) [fw=_freq]

My guess is that it's the number of parameters rather than the number of observations that's the issue, but this may be worth a try just in case not.
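A quick way to convince yourself that -contract- plus frequency weights changes nothing in the estimates is a check on toy data. This is only a sketch: the simulated variables, the seed, and the scalar name b_full are made up for illustration and are not from the model in this thread.

```stata
* Toy data with discrete values, so that -contract- can collapse duplicates
clear
set seed 2021
set obs 2000
generate z = ceil(4*runiform())
generate x = z + ceil(3*runiform())
generate y = x + ceil(3*runiform())

* 2SLS on the full data; store the coefficient on x
ivregress 2sls y (x = z)
scalar b_full = _b[x]

* Collapse to unique (y, x, z) combinations; -contract- creates _freq
contract y x z

* Same model on the collapsed data, weighting each row by its frequency
ivregress 2sls y (x = z) [fw=_freq]
display "full: " b_full "   contracted: " _b[x]
```

The gain depends entirely on how many duplicate rows there are. With a continuous x or z, most rows are unique and -contract- will barely shrink the data.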
Jeff Feder

Join Date: Jun 2021

Posts: 5
#4

07 Jun 2021, 20:25

Thank you for the suggestion.

Contract did not help in my case, but I was not aware of it before. Clever idea, thank you.
Jeff Feder

Join Date: Jun 2021

Posts: 5
#5

07 Jun 2021, 20:29

Originally posted by Joro Kolev

The way you have posed the problem, I do not see anything you can do to speed this up. What you found by trying Stata/MP (no speed gains) just reflects the fact that, as posed, the problem is not parallelisable.

You might want to try and explain what you are trying to do there.

Thank you. I guess that's true. Looking at the MP report, I thought ivregress could be parallelized effectively, but maybe the issue is my particular problem.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2188
#6

08 Jun 2021, 08:56

Jeff, what is the nature of the x variable? Is it continuous? Binary? Something else? In the continuous and binary cases I can suggest a control function approach, which will differ (hopefully not by a lot) from 2SLS but will run much more quickly. It will be two OLS regressions rather than the many, many first stages implicit in ivregress.
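For a continuous x, the two-step idea might look roughly like the sketch below. This is one possible reading of the suggestion, not code from the thread: the toy data, the seed, and the residual name vhat are hypothetical, and built-in -regress- is used for both steps.

```stata
* Hypothetical toy data standing in for the real panel
clear
set seed 12345
set obs 3000
generate industry = ceil(5*runiform())
generate year = 2000 + ceil(3*runiform())
generate z = rnormal()
generate v = rnormal()
generate u = 0.5*v + rnormal()    // u is correlated with v, so x is endogenous
generate x = 1 + z + v
generate y = 2 + 1.5*x + u

* Step 1: a single first-stage OLS for x; keep the residual
regress x z c.z#i.industry#i.year i.year i.industry
predict double vhat, residuals

* Step 2: OLS of y on x, its interactions, and the first-stage residual
regress y x c.x#i.industry#i.year vhat i.year i.industry
```

The standard errors reported in step 2 treat vhat as data rather than as an estimate, so both steps would need to sit inside the bootstrap.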
Daniel Feenberg

Join Date: Oct 2014

Posts: 328
#7

09 Jun 2021, 05:17

Is -reghdfe- from SSC (https://www.stata.com/meeting/chicag...16_correia.pdf) not suitable here?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2188
#8

09 Jun 2021, 07:19

Originally posted by Daniel Feenberg

Is -reghdfe- from SSC (https://www.stata.com/meeting/chicag...16_correia.pdf) not suitable here?

My initial thought was that this would solve it, but interacting the endogenous variable with lots of fixed effects makes the problem harder. You can't simply absorb those interaction terms the way you could if the fixed effects were only additive.

Last edited by Jeff Wooldridge; 09 Jun 2021, 07:21.
Jeff Feder

Join Date: Jun 2021

Posts: 5
#9

09 Jun 2021, 14:57

Thank you Jeff, x is continuous. Yes, a control function (CF) approach makes a lot of sense, and I will try that. It should save me weeks of waiting.

I tried ivregress instead of ivreghdfe because I was hoping to use Stata/MP. I am not sure whether ivreghdfe can be parallelized as much.
Andrew Musau

Join Date: Oct 2014

Posts: 10254
#10

09 Jun 2021, 15:31

ivregress 2sls y i.year i.industry some_controls (x c.x#i.industry#i.year = z c.z#i.industry#i.year)

With ivreghdfe from SSC, you can absorb the i.year and i.industry indicators, along with any controls that you are not explicitly interested in, and perhaps gain some efficiency.

Code:

ivreghdfe y some_controls (x c.x#i.industry#i.year = z c.z#i.industry#i.year), absorb(industry year)

e.g.,

Code:

webuse nlswork, clear
ivreghdfe ln_w i.year age c.age#c.age not_smsa (c.tenure#i.year = c.union#i.year c.south#i.year), abs(idcode)
ivreghdfe ln_w age c.age#c.age (c.tenure#i.year = c.union#i.year c.south#i.year), abs(idcode i.year not_smsa)
Jeff Feder

Join Date: Jun 2021

Posts: 5
#11

09 Jun 2021, 18:24

Originally posted by Andrew Musau

With ivreghdfe from SSC, you can absorb the i.year and i.industry indicators, along with any controls that you are not explicitly interested in, and perhaps gain some efficiency.

Code:

ivreghdfe y some_controls (x c.x#i.industry#i.year = z c.z#i.industry#i.year), absorb(industry year)

e.g.,

Code:

webuse nlswork, clear
ivreghdfe ln_w i.year age c.age#c.age not_smsa (c.tenure#i.year = c.union#i.year c.south#i.year), abs(idcode)
ivreghdfe ln_w age c.age#c.age (c.tenure#i.year = c.union#i.year c.south#i.year), abs(idcode i.year not_smsa)

Thank you for the suggestion. I tried that and the code is running. However, I think the gains will be minimal, because the problem is that there are so many first stages.