CMP mata error 3900

Melody Brown

Join Date: May 2022

Posts: 81
#1

08 Jul 2022, 14:08

Hi all,

I was trying to use Stata to run a CMP model with random coefficients, and I got the error below:

Fitting constant-only model for LR test of overall model fit.
J(): 3900 unable to allocate real <tmp>[57194029448,10]
cmp_model::cmp_init(): - function returned error
<istmt>: - function returned error
Mata run-time error

I ran the job through Slurm with 500 GB of memory, but it still failed. I think this is a memory issue, and I wonder whether anyone knows what I can do to get the model running.

Thanks a lot, and I look forward to your reply.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

08 Jul 2022, 16:44

The error message certainly suggests to me that the model tried to allocate a real matrix with 57,194,029,448 rows and 10 columns; that implies over a terabyte of memory is required.
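As a quick check of that arithmetic: each element of a Mata real matrix is an 8-byte double, so the matrix in the error message needs rows × columns × 8 bytes. The small Python sketch below just multiplies that out. The row and column counts are copied from the error message and the 500 GB figure is the Slurm allocation from the first post; the helper name matrix_bytes is made up for illustration.

```python
# Memory needed for the matrix named in the Mata error message.
# Assumption: every element is a Mata real, stored as an 8-byte double.
ROWS = 57_194_029_448
COLS = 10
BYTES_PER_REAL = 8

def matrix_bytes(rows: int, cols: int, bytes_per_elem: int = BYTES_PER_REAL) -> int:
    """Return the storage needed for a dense rows x cols matrix, in bytes."""
    return rows * cols * bytes_per_elem

needed = matrix_bytes(ROWS, COLS)
print(f"{needed:,} bytes")                     # 4,575,522,355,840 bytes
print(f"{needed / 1e12:.2f} TB (decimal)")     # 4.58 TB (decimal)
print(f"{needed / 2**40:.2f} TiB (binary)")    # 4.16 TiB (binary)
print(f"500 GB covers {500e9 / needed:.1%}")   # 500 GB covers 10.9%
```

So the 500 GB allocation covers only about a tenth of this single matrix.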
David Roodman

Join Date: Jul 2014

Posts: 479
#3

09 Jul 2022, 08:10

Exactly. More than 4 terabytes, because each entry in a Mata real matrix is stored in double precision, which takes 8 bytes. How many observations are in the regression? How many equations? Are you accepting the default of adaptive quadrature, i.e., not using the redraws() option? If so, are you using the intpoints() option to control the number of quadrature points? (You could go as low as, say, 7.)

Last edited by David Roodman; 09 Jul 2022, 08:17.
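A rough sketch of why intpoints() matters, under an assumption the thread does not confirm: if the integral over d random effects is approximated with a full tensor-product Gauss-Hermite grid, the number of nodes is points^d, so lowering the points per dimension shrinks the grid multiplicatively. Adaptive quadrature recenters and rescales the nodes for each group but does not change how many there are. The dimension count DIMS = 5 and the helper names below are hypothetical, chosen only for illustration; cmp's actual grid construction may differ.

```python
# Hypothetical illustration: size of a full tensor-product quadrature grid.
# Assumption: d random effects, same number of Gauss-Hermite points per
# dimension. This is not cmp's code; its real grid may be built differently.
import itertools

import numpy as np

def product_grid(points_per_dim: int, dims: int):
    """Nodes and weights of a d-dimensional Gauss-Hermite product rule."""
    x, w = np.polynomial.hermite.hermgauss(points_per_dim)
    nodes = np.array(list(itertools.product(x, repeat=dims)))
    weights = np.prod(np.array(list(itertools.product(w, repeat=dims))), axis=1)
    return nodes, weights

def grid_size(points_per_dim: int, dims: int) -> int:
    """Number of nodes in the product grid: points_per_dim ** dims."""
    return points_per_dim ** dims

DIMS = 5  # hypothetical: one random coefficient per equation
for pts in (12, 7):
    print(pts, "points per dimension ->", f"{grid_size(pts, DIMS):,}", "nodes")

# Sanity check on a small grid: the weights integrate exp(-|x|^2) over R^3,
# whose exact value is pi^(3/2).
nodes, weights = product_grid(7, 3)
print(nodes.shape, weights.sum(), np.pi ** 1.5)
```

Under this assumption, going from 12 to 7 points per dimension with five dimensions cuts the grid from 248,832 to 16,807 nodes, roughly a fifteenfold reduction.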
Melody Brown

Join Date: May 2022

Posts: 81
#4

09 Jul 2022, 15:45

Originally posted by William Lisowski (#2)

Hi William,

Thanks a lot for your suggestion. If this is the case, do you know whether it's still possible to run the regression? For instance, would it work with Stata/MP on 16 cores, or on a computer with 64 GB of RAM?

I have 5 simultaneous equations with around 20 variables. However, one categorical variable has more than 3,000 distinct values, and I have 40,000 observations.

Thank you, and I look forward to your reply.
Melody Brown

Join Date: May 2022

Posts: 81
#5

09 Jul 2022, 15:58

Originally posted by David Roodman (#3)

Hi David,

Thanks a lot for your reply. I have 40,000 observations, which I use to estimate a system of 5 simultaneous equations. I haven't specified any options yet, just the simultaneous equations plus random coefficients at the individual level (10,000 individuals). Could you please explain a little more about the intpoints() option and how it would address the problem?

Thanks, and I look forward to your reply.
Melody Brown

Join Date: May 2022

Posts: 81
#6

14 Jul 2022, 09:45

Originally posted by William Lisowski (#2)

Hi William,

Thanks for your earlier reply. Do you know whether it's still possible to run this in Stata, or should I switch to another language such as R or Python?

Thank you, and I look forward to your reply.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#7

14 Jul 2022, 10:20

I'm afraid I don't have any expertise in CMP models - just in guessing at the meaning of Mata error messages.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#8

15 Jul 2022, 03:06

For how to fiddle with -cmp-, read the help file.

If you try this in R or Python, do report on the success or lack of success thereof.

On the econometric side, "one categorical variable has more than 3000 values" sounds very dodgy. Are these values ordered? Do the differences between these values have a meaning?

To give you a benchmark: in labour economics, age and experience are treated as continuous variables, yet both typically lie between 16 and 90 in typical samples, which is roughly 75 unique values.

If there is a way to treat your "one categorical variable has more than 3000 values " as a continuous variable, this would probably solve your problem.
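To put a number on this suggestion, here is a back-of-the-envelope sketch. It assumes the 3,000-level variable enters the model as a full set of indicator variables (for example through i. factor-variable notation, which the thread does not confirm) with one base level omitted, and that each regressor is stored as a dense column of 8-byte doubles. The function names are hypothetical; the counts (40,000 observations, 3,000 levels) come from post #4.

```python
# Hypothetical comparison: one categorical regressor coded as dummies vs. as a
# single continuous column. Assumptions: 8-byte doubles, one dense column per
# regressor, and one omitted base level when coding dummies.
N_OBS = 40_000
N_LEVELS = 3_000
BYTES_PER_REAL = 8

def dummy_columns(levels: int) -> int:
    """Indicator columns for a categorical variable with the base level dropped."""
    return levels - 1

def design_bytes(n_obs: int, n_cols: int) -> int:
    """Bytes for a dense n_obs x n_cols block of doubles."""
    return n_obs * n_cols * BYTES_PER_REAL

as_dummies = design_bytes(N_OBS, dummy_columns(N_LEVELS))
as_continuous = design_bytes(N_OBS, 1)
print(f"dummies:    {as_dummies / 1e9:.2f} GB per copy")     # 0.96 GB per copy
print(f"continuous: {as_continuous / 1e6:.2f} MB per copy")  # 0.32 MB per copy
```

Each dummy also adds a coefficient, so the parameter vector and its Hessian grow with it; if the variable appears in several of the five equations, the cost multiplies accordingly.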
Melody Brown

Join Date: May 2022

Posts: 81
#9

20 Jul 2022, 08:31

Originally posted by Joro Kolev (#8)

Hi Joro,

Thank you so much for your kind reply. I'll definitely try treating the variable as continuous and see whether it works.
Melody Brown

Join Date: May 2022

Posts: 81
#10

20 Jul 2022, 18:29

Originally posted by William Lisowski (#7)

Hi William,

Got it! Thank you all the same.