Summary of problem:
I am attempting to run a dynamic panel estimation on a large dataset using xtabond2. The procedure completes without error on a 1/4 subsample of the data, but when I attempt a larger subsample, Stata exits (the program closes) without warning or error code partway through the run (after ~4 hours).
Background information on computer and dataset:
Computer: Windows Server 2019, StataMP 15.1, 2x 16-core Xeon CPUs, 768 GB RAM
Data: 12 years, ~3,000 companies per trading day (~252 trading days per year), 39 intraday observations per trading day, which gives me 7,640,370 firm/day panels, each with a 39-period time series
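For scale, the panel counts above translate into the following observation counts. This is only a back-of-envelope sketch using the figures quoted in this post (the ~2.2 million panels is the approximate 4-year subsample size); it says nothing about how xtabond2 lays out its matrices internally.

```stata
* Observation counts implied by the panel counts in this post
* full sample: 7,640,370 panels x 39 periods
display %15.0fc 7640370 * 39
* 4-year subsample: roughly 2.2 million panels x 39 periods
display %15.0fc 2200000 * 39
```

The first line comes to 297,974,430 observations and the second to about 85,800,000.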
The code and error:
I recognize that the size of the sample and the instrument matrix makes this analysis very memory intensive, and that I will most likely be unable to analyze the entire sample at once on this machine. However, I need to run it on as large a subsample as possible (ideally 6-year subsamples). Below are the relevant parts of the code I am running:
I have run this successfully on smaller subsamples (1, 2, and 3 years) many times while trying to identify appropriate lags for both the model and the instruments. However, when I use a 4-year subsample (~2.2 million panels of 39 periods each), it runs for ~4 hours before exiting without warning or error code.
Code:
set matsize 11000
mata: mata set matafavor speed
use sample.dta
xtset firmdateindex timeindex
xi: xtabond2 y l(1/4).y l(0/3).(x1 x2 x3) x4 i.timeindex ///
    gmm(l(1/4).y, lag(5 8) collapse) gmm(l(0/3).(x1 x2 x3), lag(4 5) collapse) ///
    iv(x4 i.timeindex, eq(level)) twostep robust
Attempted troubleshooting:
It does not appear to be a memory limit. I have logged memory usage, and the machine never exceeds 420 GB in use. I have tried several options to reduce memory usage (e.g. mata set matafavor space, principal components for the instruments), to no avail. I have tried manually setting max_memory above this usage threshold (instead of .), and that does not help. I was initially concerned it was a physical problem with the memory, but I can run two concurrent programs on 3-year subsamples using 600+ GB of memory without problem. I also ran memtest on the machine, and it returned no errors. I have turned trace on and gone through the log, but can find no reason for the failure (or, at least, I do not understand the Mata log output well enough to recognize what it is doing when it fails).
I have tried reinstalling Stata and verifying the installation with the Stata Installation Qualification Tool, and the problem persists. I have spent a lot of time searching Statalist and have not found any similar errors, other than ones that were already explained and fixed.
The only potentially limiting factor I can currently find is the maximum matsize for Stata 15.1. However, as xtabond2 relies on Mata, I don't think this should be a limiting factor. I am not opposed to upgrading to Stata 18 if this is something that has been addressed there (e.g. an increased matsize). However, my current license is 64-core MP, so upgrading is a costly solution.
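For reference, the settings described above correspond roughly to commands like the following. This is a sketch rather than a copy of the actual do-file: the log file name and the 500g value are placeholders chosen for illustration, and the tracedepth setting is an assumption about how one might keep the trace log manageable.

```stata
* Sketch only: illustrative settings, not the exact commands from the runs
* (xtabond2_debug.log is a hypothetical file name)
log using xtabond2_debug.log, text replace

* favor lower memory use over speed in Mata
mata: mata set matafavor space

* placeholder cap, set above the ~420 GB peak that was observed
set max_memory 500g

* show the current memory settings
query memory

* limit how deep trace follows nested programs, then trace the run
set tracedepth 2
set trace on
* ... the xtabond2 command goes here ...
set trace off

* memory reports from Stata and from Mata
memory
mata: mata memory

log close
```

Running the memory reports after a subsample that completes gives a baseline to compare against the usage logged at the operating-system level.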
I have used Stata for 12+ years and have never had it exit without warning or error code, let alone in such a predictable manner. If anyone has run into a similar problem, understands what is happening, or has suggestions for additional troubleshooting I would greatly appreciate any advice.