
  • Speed up ivregress

    Hi Statalist,

    I am running an IV regression with about 3 million observations in which the endogenous variable is interacted with year and industry fixed effects, and it takes 5 to 6 hours or even longer. I would like to know how to speed this up because I need to bootstrap the regression to obtain a standard error.

    The regression as a Stata command is as follows: ivregress 2sls y i.year i.industry some_controls (x c.x#i.industry#i.year = z c.z#i.industry#i.year)

    There are about 300 industries and 15 years. So maybe because there are so many parameters to estimate, the model takes a long time to run. I tried to run this in Stata-MP but for some reason, there is not much speed gain. I am not sure what else I can do.

    Thank you,
    Jeff

  • #2
    The way you have posed the problem, I do not see anything you can do to speed this up. What you found by trying Stata MP (no speed gains) just reflects the fact that, as posed, the problem is not parallelisable.

    You might want to try and explain what you are trying to do there.


    • #3
      If you do this:
      Code:
      contract y year industry allxvars allzvars
      does it reduce the sample size very much?

      If so then after –contract– you could try this:
      Code:
      ivregress 2sls y i.year i.industry some_controls (x c.x#i.industry#i.year = z c.z#i.industry#i.year) [fw=_freq]
      My guess is that it's the number of parameters rather than the number of observations that's the issue, but this may be worth a try just in case not.
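
      A minimal sketch of how one might check the reduction before committing to the –contract– route. The variable list is an assumption: x, z and some_controls stand in for the allxvars and allzvars placeholders above, and the data are restored afterwards so nothing is changed.

      ```stata
      * Sketch only: x, z and some_controls are placeholders taken from the thread's command.
      preserve
      count
      local n_before = r(N)
      * contract keeps one row per distinct combination and stores the count in _freq
      contract y year industry x z some_controls
      count
      display "rows kept: " r(N) " of `n_before'"
      restore
      ```

      If the second count is close to the first, as is likely with a continuous x, frequency weights will not help.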


      • #4
        Thank you for the suggestion.

        Contract did not work for me, but I was not aware of this before, clever idea. Thank you.


        • #5
          Originally posted by Joro Kolev View Post
          The way you have posed the problem, I do not see anything you can do to speed this up. What you found by trying Stata MP (no speed gains) just reflects the fact that, as posed, the problem is not parallelisable.

          You might want to try and explain what you are trying to do there.
          Thank you. I guess that's true. Looking at the MP report, I thought ivregress could be parallelized effectively, but maybe the issue is my particular problem.


          • #6
            Jeff, what is the nature of the x variable? Is it continuous? Binary? Something else? In the continuous and binary cases I can suggest a control function approach, which will differ (hopefully not by a lot) from 2SLS but will run much more quickly. It will be two OLS regressions rather than the many, many first stages implicit in ivregress.
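
            A minimal sketch of what a two-step control function could look like for a continuous x, using the variable names from the original command. This is an illustration under stated assumptions, not necessarily the exact specification meant above: it assumes the first-stage residual enters the second stage linearly and without interactions, and vhat is a hypothetical variable name.

            ```stata
            * Sketch only: y, x, z, industry, year and some_controls follow the thread's command.
            * Assumption: the first-stage residual enters the second stage linearly, uninteracted.

            * Step 1: a single first-stage OLS of x on the instruments and exogenous regressors
            regress x z c.z#i.industry#i.year i.year i.industry some_controls
            predict double vhat, residuals

            * Step 2: OLS of y on x, its interactions, the controls, and the residual
            regress y x c.x#i.industry#i.year i.year i.industry some_controls vhat
            ```

            The standard errors reported by the second regression do not account for vhat being estimated, so both steps would need to be bootstrapped together.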


            • #7
              Is -reghdfe- from SSC (https://www.stata.com/meeting/chicag...16_correia.pdf) not suitable here?


              • #8
                Originally posted by [email protected] View Post
                Is -reghdfe- from SSC (https://www.stata.com/meeting/chicag...16_correia.pdf) not suitable here?
                My initial thought is that this would solve it, but interacting the endogenous variable with lots of fixed effects makes the problem harder. You can't simply absorb those interaction terms the way you can if the fixed effects were only additive.
                Last edited by Jeff Wooldridge; 09 Jun 2021, 07:21.


                • #9
                  Thank you Jeff, x is continuous. Yes, CF makes a lot of sense, will try that. It should save me weeks of waiting.

                  I tried ivregress instead of ivreghdfe because I was hoping to use Stata-MP; I am not sure whether ivreghdfe can be parallelized as much.


                  • #10
                    ivregress 2sls y i.year i.industry some_controls (x c.x#i.industry#i.year = z c.z#i.industry#i.year)
                    With ivreghdfe from SSC, you can absorb the indicators (i.year and i.industry in the command above) and any controls that you are not explicitly interested in, and perhaps gain some efficiency.


                    Code:
                    ivreghdfe y some_controls (x c.x#i.industry#i.year = z c.z#i.industry#i.year), absorb(industry year)
                    e.g.,

                    Code:
                    webuse nlswork, clear
                    ivreghdfe ln_w i.year age c.age#c.age not_smsa  (c.tenure#i.year = c.union#i.year c.south#i.year), abs(idcode)
                    ivreghdfe ln_w age c.age#c.age (c.tenure#i.year = c.union#i.year c.south#i.year), abs(idcode i.year not_smsa)


                    • #11
                      Thank you for the suggestion. I tried that and the code is running. However, I think there will be minimal gains. The problem is that there are so many first stages.
