  • Working with large Panels (regressions)

    Hi,

    I have a large wide panel dataset: t=804 months, N=45464 individuals. In long form, this equals 804*45464=36,553,056 observations. I am wondering which of the following procedures is, in general, faster:
    1. Transform the full wide dataset into one large long dataset. Run regressions on subsamples, perhaps dropping unneeded observations before each regression.
    2. Transform only a subsample into a long dataset. Run the analysis on that subsample. Then move on to a new subsample.
    Which is, in general, faster?

    Thanks so much!

  • #2
    The -reshape- command, although one of Stata's best features IMHO, is also one of its slowest commands. It has a lot of overhead verifying that the data are suitable for what is requested, and it also typically thrashes the disk a lot. So you probably only want to do it once, for the whole data set, rather than repeatedly.
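
    As a minimal sketch of the "reshape once" approach, the toy example below builds a small wide panel, converts it to long in a single pass, and saves the result so that later sessions can skip the reshape. The names id, ret, month and panel_long are illustrative assumptions, not taken from the original data, and the toy uses 5 individuals and 3 months in place of 45464 and 804.

    ```stata
    * Toy wide panel: one row per individual, one column per month
    * (id, ret1-ret3 and panel_long are assumed names for illustration)
    clear
    set seed 12345
    set obs 5
    generate id = _n
    forvalues t = 1/3 {
        generate ret`t' = rnormal()
    }

    * Reshape the whole dataset once; j(month) holds the numeric suffix
    reshape long ret, i(id) j(month)

    * Save the long version so later sessions can skip the reshape
    compress
    save panel_long, replace
    ```

    Later sessions can then start from -use panel_long, clear- instead of repeating the reshape.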

    Once you have transformed the entire data set into long, if your subsamples are mutually exclusive, as is often the case, and indicated (or could be indicated) by a variable or group of variables, then you can run the regressions fairly quickly after that. If you do not need to store regression outputs and just need the results in your analysis log, then the -by:- prefix will very quickly take you through all of them. If after each regression you need to store or do something else with the regression results, then you can wrap the regression and its post-processing into a little program and do it with -runby- (by Robert Picard and me, available from SSC).
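
    To make the two routes concrete, here is a rough sketch on simulated data. The variables group, x and y and the program name one_reg are made up for illustration, and it assumes -runby- has already been installed from SSC. With -runby-, whatever the program leaves in memory for each by-group is stacked into the final dataset, so the program reduces each group to a single row of results.

    ```stata
    * One-time install, if needed:  ssc install runby

    * Simulated long data with three mutually exclusive subsamples
    * (group, x, y are assumed names for illustration)
    clear
    set seed 12345
    set obs 300
    generate group = mod(_n, 3) + 1
    generate x = rnormal()
    generate y = group * x + rnormal()

    * Route 1: results are only needed in the log
    bysort group: regress y x

    * Route 2: keep the results as data, one row per subsample
    capture program drop one_reg
    program define one_reg
        regress y x
        generate b_x  = _b[x]
        generate se_x = _se[x]
        generate nobs = e(N)
        keep group b_x se_x nobs
        keep in 1
    end

    runby one_reg, by(group)
    list
    ```

    Note that -runby- replaces the data in memory with the accumulated results, so save the long dataset beforehand if you still need it.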

    • #3
      Thanks a lot. This is very helpful!
