
  • xthdidregress is very slow on large data sets

    I generated a panel data set with N = 1,000,000 and T = 30, with a staggered intervention starting at t = 28 and two control variables. Without the unit and year fixed effects, there are 56 regressors. This is exactly the setting that xthdidregress twfe was designed to handle (based on my work on staggered interventions). When I tried various estimation commands, xtreg with the fe option ran in 6.4 minutes, the user-written command reghdfe took 89.2 minutes, and I had to stop xthdidregress twfe after 5.5 hours.

    These large discrepancies surprise me, as all of these commands are essentially computing the within estimator with time dummies and 56 explanatory variables. Clearly 6.4 minutes is acceptable for a data set this large, but the run time of xthdidregress twfe makes it almost unusable for large problems.

    I'm using Stata 18 SE.
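    For anyone who wants to try the timing comparison themselves, here is a minimal sketch of a data set like the one described above. The treatment share, coefficients, and seed are illustrative choices, not the values from the original run:

    ```stata
    * Illustrative benchmark setup: N = 1,000,000 units, T = 30,
    * staggered treatment starting at t = 28, two controls.
    clear all
    set seed 12345
    set obs 1000000
    generate id = _n
    * Assign (arbitrarily) half the units to be treated from t = 28 on.
    generate first_treat = cond(runiform() < 0.5, 28, .)
    expand 30
    bysort id: generate year = _n
    generate treated = year >= first_treat & !missing(first_treat)
    generate x1 = rnormal()
    generate x2 = rnormal()
    * Outcome with an arbitrary treatment effect of 0.5.
    generate y = 0.5*treated + 0.2*x1 - 0.1*x2 + rnormal()
    xtset id year

    timer clear
    timer on 1
    xtreg y treated x1 x2 i.year, fe
    timer off 1
    timer on 2
    xthdidregress twfe (y x1 x2) (treated), group(id)
    timer off 2
    timer list
    ```

    Note that the full data set has 30 million observations, so this requires a machine with enough memory to hold it.
    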

  • #2
    Hi Jeff,

    We will investigate this and report back.

    Thanks



    • #3
      Hey Enrique Pinzon (StataCorp), is this something that was ever explored further? I am working with a very large data set and I have had to stop the command after 10+ hours each time.



      • #4
        Hi Tim,

        We did. In the February 26, 2025, update we modified the computation. Previously, we used -regress- and then called -margins- to compute standard errors that accounted for variability in the covariates, i.e., vce(unconditional). Now we call -regress- and compute the standard errors directly, without going through -margins-. It might also simply be the dimensionality of your problem: under the hood we generate a set of interactions that can make the problem challenging at this scale.
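        For readers unfamiliar with the distinction between the two kinds of standard errors, here is a toy illustration of the user-facing options on a built-in data set. This only sketches the -margins- options themselves, not the internal xthdidregress code path:

        ```stata
        * Toy example: conditional vs. unconditional standard errors.
        * This is not StataCorp's internal xthdidregress code.
        sysuse auto, clear
        regress mpg i.foreign c.weight, vce(robust)

        * Conditional SEs: the covariates are treated as fixed.
        margins foreign

        * vce(unconditional) additionally accounts for sampling variability
        * in the covariates; this linearization step is the expensive part
        * on very large problems.
        margins foreign, vce(unconditional)
        ```
        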

        Best,

        Enrique
