
  • New command -reghdfe- available on SSC

    Dear all,

    A new package, reghdfe, is now available for download from SSC.

    It performs linear and instrumental-variable regressions while absorbing any number of fixed effects. reghdfe builds heavily on the packages reg2hdfe by Paulo Guimaraes and a2reg by Amine Ouazad. Details and examples are included in the help file, but key features include:
    • Much faster than the alternatives (reg2hdfe, a2reg, ivreg2hdfe, felsdvreg, etc) in most scenarios. It's built in Mata and avoids some of the usual bottlenecks such as sorting the data every iteration or large memory consumption.
    • Allows more than two sets of high-dimensional fixed effects (HDFE), using the same absorb() syntax as areg.
    • Allows interactions of fixed effects: absorb(industry#year)
    • Allows absorbing interactions of fixed effects with continuous variables. For instance, absorb(i.industry##c.t) will include industry fixed effects and a different time trend for each industry.
    • Can run IV/2SLS regressions using either -ivregress- or -ivreg2- (if available).
    • Allows factor variables and time-series operators in the varlists.
    • In OLS regressions, it also reports F-statistics for the FEs (see option -nested-) as well as the correlation between the fixed effects and xb.
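As a rough illustration of the absorb() syntax described in the list above, here is a minimal sketch. The dataset (nlswork, fetched with webuse) and the choice of variables are assumptions made for the example; they do not come from the announcement.

```stata
* Assumes reghdfe is installed (ssc install reghdfe) and that
* webuse can reach the Stata example datasets online.
webuse nlswork, clear

* Two sets of fixed effects: worker and year
reghdfe ln_wage tenure ttl_exp, absorb(idcode year)

* An interaction of two categorical variables as one set of fixed effects
reghdfe ln_wage tenure ttl_exp, absorb(idcode ind_code#year)

* Industry fixed effects plus a separate linear trend in year for each industry
reghdfe ln_wage tenure ttl_exp, absorb(idcode i.ind_code##c.year)
```

The help file remains the reference for which absorb() forms a given version accepts.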
    Many thanks to Kit Baum for his SSC help, and to both Paulo Guimaraes and Amine Ouazad for their invaluable feedback. Comments are welcome!

    Best,
    Sergio


  • #2
    Are you faster (probably at the cost of giving up some flexibility) than -xtreg- and/or -areg- for a single FE? E.g. for reasons similar to older lessons learnt: http://www.nber.org/stata/efficient/fixed-effects.html



    • #3
      Hi László,

      Originally posted by László
      Are you faster (probably at the cost of giving up some flexibility) than -xtreg- and/or -areg- for a single FE? E.g. for reasons similar to older lessons learnt: http://www.nber.org/stata/efficient/fixed-effects.html
      It's actually the opposite. In my experience, nothing beats -areg- for a single FE (not even -xtreg-). Where -reghdfe- excels is at anything above one FE, and at being more general (e.g. you can do absorb(mpg#rep) or absorb(mpg##c.price) ). That said, the speed is not too bad compared to areg, given the extra work involved.
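One way to check the single-FE comparison on your own data is to time both commands with Stata's timer. This is only a sketch: the dataset and variables are assumptions, and timings on a small example dataset say little about behavior on large data.

```stata
* Assumes reghdfe is installed; nlswork is used purely for illustration.
webuse nlswork, clear
timer clear

* Timer 1: areg with a single absorbed fixed effect
timer on 1
areg ln_wage tenure ttl_exp, absorb(idcode)
timer off 1

* Timer 2: reghdfe with the same single fixed effect
timer on 2
reghdfe ln_wage tenure ttl_exp, absorb(idcode)
timer off 2

* Show elapsed seconds for both timers
timer list
```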

      There is also a developer version that fixes a few minor bugs (see net from http://people.duke.edu/~sac45/stata ) and adds speedups such as support for using multiple cores (thanks to George G. Vega Yon). It's quite stable but not yet on SSC.

      Best,
      Sergio



      • #4
        Thanks, Sergio!

        Actually, -_regress, absorb()- can be twice as fast as -areg-, though surely there is a downside to being less generic, less flexible.

        The MP support sounds very useful, though I am a bit disheartened if Mata does not support it automatically; I thought that was one of their selling points. In any case, see also my post on another thread.



        • #5
          Originally posted by László
          Actually, -_regress, absorb()- can be twice as fast as -areg-, though surely there is a downside to being less generic, less flexible.
          Yep, that's true, I recall -areg- actually calls _regress, but then it also does some useful work before that.

          Originally posted by László
          The MP support sounds very useful, though I am a bit disheartened if Mata does not support it automatically; I thought that was one of their selling points. In any case, see also my post on another thread.
          The Frisch-Waugh transformation that -reghdfe- does is easy to parallelize because computations are independent for each variable. That said, I do dislike having to open multiple instances of Stata, and parallel support would be great if built directly into Mata (however, dealing with multithreading, semaphores, and shared memory is *really* hard IMHO).
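To make the Frisch-Waugh point concrete, here is a minimal sketch for the single-FE case: each variable is demeaned within groups on its own, which is why the work splits naturally across variables. The dataset and variable names are assumptions for illustration, and this is not reghdfe's actual implementation.

```stata
sysuse auto, clear
drop if missing(rep78)

* Demean each variable within rep78 groups; every pass through
* the loop is independent of the others.
foreach v of varlist price weight length {
    bysort rep78: egen double mean_`v' = mean(`v')
    generate double dm_`v' = `v' - mean_`v'
}

* OLS on the demeaned variables
regress dm_price dm_weight dm_length

* Same slope coefficients as absorbing rep78 directly
areg price weight length, absorb(rep78)
```

The slopes agree, but the standard errors from the plain regress call differ because it does not count the absorbed group means as estimated parameters.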

          Cheers,
          S



          • #6
            You know what you are doing. But I never assumed you call multiple instances. Wow. So the MP tools built into Mata and Stata don't kick in otherwise. Shame.

            Whatever -areg- does in parsing fvvarlists or tsvarlists, or a generic vce option (also allowing for a bootstrap wrapper etc.), or posting more in ereturn, I still don't understand why that would justify an O(n) slowdown, not o(n). Meaning that the performance hit is substantial, if not more, on big data, even on a plain vanilla call as in my test posted in the other thread. In any case, lesson learnt, call _regress when you can, even with absorb().



            • #7
              Just a quick follow-up. I ran a quick benchmark, and the difference between areg and _regress seems to arise mostly because areg calls _regress and then calls -regress- in order to get the R2 of the model without absvars (and thus estimate the F-stat of the absorbed variables). If there is only one indepvar, the time difference is trivial, but it increases with the number of variables on the RHS.

              All in all, the slowdown of areg seems justified, but a -fast- option that skips reporting that F-stat would be useful (that's what I did in -reghdfe-, as it sometimes made the difference between a 2hr regression and a 1hr regression).



              • #8
                Hi,

                I know reghdfe is already fast but I was wondering if there was a way to speed up reghdfe by combining the reghdfe command with the parallel command (e.g. parallel : reghdfe ...). Is that possible?

                Thanks

                Olivier



                • #9
                  Hi Olivier,

                  There is actually some parallelization going on:

                  - The old version of reghdfe (see "help reghdfe_old") had a "cores(#)" option that opened multiple instances of Stata (and that requires parallel.ado from ssc). However, I had some difficulties with this approach (hard to implement on a Linux server, plus a high memory overhead), so I replaced it in the last SSC update.

                  - The new version of reghdfe takes a different approach and demeans multiple variables at the same time. It takes advantage of some built-in parallel Mata functions, but not everything is parallel. You can tweak these settings with the "poolsize(#)" option, which is explained in more detail in the help file.
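A minimal sketch of passing the poolsize(#) option mentioned above. The option name is taken from the post; the dataset, the variables, and the particular values are assumptions, and whether the option is available and what it defaults to depends on the installed version (check the help file).

```stata
* Assumes a reghdfe version that supports poolsize(#).
webuse nlswork, clear

* Demean variables one at a time (smaller pools should need less memory)
reghdfe ln_wage tenure ttl_exp hours, absorb(idcode year) poolsize(1)

* Demean several variables together in one pass
reghdfe ln_wage tenure ttl_exp hours, absorb(idcode year) poolsize(4)
```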


                  Best,
                  Sergio



                  • #10
                    Thanks a lot Sergio

                    Best

                    Olivier



                    • #11
                      Hi László, Sergio and Olivier.

                      Since you're familiar with ways to speed up fixed effects regressions in Stata, I was wondering if you could provide some useful comments on this post. I'm basically trying to figure out the best way to go when you have big data, fixed effects, clustering and weights.

                      Regards, Jorge Pérez.
                      Jorge Eduardo Pérez Pérez
                      www.jorgeperezperez.com



                      • #12
                        Hi,

                        Thanks for making reghdfe! This command is amazing! I'm having trouble using reghdfe to output multiple forms of the regression. For example, when I run

                        reghdfe price (mpg = rep78), absorb(foreign) stages(first reduced ols)

                        I see all four regressions displayed. I would like to save all of these estimates, but I can't.

                        I am able to use

                        estimates replay reghdfe_first1

                        to show the first stage regression, but whenever I try to use eststo or estimates store to save it, it shows me the IV coefficients.

                        For example,

                        eststo reghdfe_first1
                        esttab reghdfe_first1

                        shows me the IV output, not the first stage. Do you know how to save the output for all four regressions?
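A possible explanation, offered as a sketch rather than a confirmed answer: eststo and estimates store save whichever results are currently active under the name you give, and after the command finishes the active results are the IV ones. So eststo reghdfe_first1 would overwrite the saved first stage with the IV results. The code below assumes the stage results are already held in memory under names such as reghdfe_first1 (the name used above); the other stage names are not known here, so estimates dir is used to list them. The name first_stage is a hypothetical label.

```stata
sysuse auto, clear
reghdfe price (mpg = rep78), absorb(foreign) stages(first reduced ols)

* List the stored results to see the actual names of each stage
estimates dir

* Refer to a stored stage by name without re-storing anything
estimates table reghdfe_first1

* To keep a copy under your own name, make that stage active first
estimates restore reghdfe_first1
estimates store first_stage
```

After this, esttab first_stage (or esttab reghdfe_first1 directly, with no prior eststo) should tabulate the first stage.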
