  • CMP mata error 3900

    Hi all,

    I was trying to use Stata to run a CMP model with random coefficients, and I got the error below:

    Fitting constant-only model for LR test of overall model fit.
    J(): 3900 unable to allocate real <tmp>[57194029448,10]
    cmp_model::cmp_init(): - function returned error
    <istmt>: - function returned error
    Mata run-time error

    I ran it under Slurm with 500 GB of memory, but it still failed. I think this is a memory issue, and I wonder if anyone knows what I can do to get the model running.

    Thanks a lot and look forward to your reply.

  • #2
    The error message certainly suggests to me that the model required a real matrix with 57,194,029,448 rows and 10 columns - to me that suggests over a terabyte of memory is required.


    • #3
      Exactly. 4 terabytes, because each entry in a Mata matrix is double precision, which takes 8 bytes. How many observations are in the regression? How many equations? Is the default of adaptive quadrature being accepted by not using the redraws() option? If so, is the intpoints() option used to control the number of quadrature points? (You could go as low as, say, 7.)
      Last edited by David Roodman; 09 Jul 2022, 08:17.
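As a rough check on that figure, the arithmetic can be sketched in Python. The thread itself contains no code, so the names below are illustrative; the dimensions come from the error message, and the 500 GB figure is the Slurm allocation mentioned in the first post (decimal gigabytes are assumed).

```python
# Dimensions taken from the error message: real <tmp>[57194029448,10]
ROWS = 57_194_029_448
COLS = 10
BYTES_PER_DOUBLE = 8  # each entry of a real Mata matrix is double precision


def matrix_bytes(rows: int, cols: int, bytes_per_entry: int = BYTES_PER_DOUBLE) -> int:
    """Bytes needed to hold a dense rows x cols matrix of doubles."""
    return rows * cols * bytes_per_entry


needed = matrix_bytes(ROWS, COLS)
available = 500 * 10**9  # assumption: the 500 GB allocation, in decimal GB

print(f"needed: {needed:,} bytes")
print(f"      = {needed / 10**12:.2f} TB (decimal) or {needed / 2**40:.2f} TiB (binary)")
print(f"that is roughly {needed / available:.1f} times the 500 GB allocation")
```

On these numbers the single temporary matrix needs about 4.6 TB, consistent with the estimate above, so adding cores or moving to a 64 GB machine would not help unless the matrix itself becomes much smaller.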


      • #4
        Originally posted by William Lisowski
        The error message certainly suggests to me that the model required a real matrix with 57,194,029,448 rows and 10 columns - to me that suggests over a terabyte of memory is required.
        Hi William,

        Thanks a lot for your suggestion. If this is the case, do you know if it's still possible to run the regression? For instance, if I use Stata MP with 16 cores or a computer with 64 GB of RAM?

        I have 5 simultaneous equations with around 20 variables. But one categorical variable has more than 3000 values and I have 40,000 obs.

        Thank you and look forward to your reply.


        • #5
          Originally posted by David Roodman
          Exactly. 4 terabytes, because each entry in a Mata matrix is double precision, which takes 8 bytes. How many observations are in the regression? How many equations? Is the default of adaptive quadrature being accepted by not using the redraws() option? If so, is the intpoints() option used to control the number of quadrature points? (You could go as low as, say, 7.)
          Hi David,

          Thanks a lot for your reply. I have 40,000 obs, which I used to run a regression on 5 simultaneous equations. I didn't use any options for now, just a combination of simultaneous equations and random coefficients at the individual level (there are 10,000 individuals). Could you please elaborate a little on the intpoints() option and how it would tackle the problem?

          Thanks and look forward to your reply.


          • #6
            Originally posted by William Lisowski
            The error message certainly suggests to me that the model required a real matrix with 57,194,029,448 rows and 10 columns - to me that suggests over a terabyte of memory is required.
            Hi William,

            Thanks for your previous reply. Do you know whether it's still possible to run this in Stata, or should I use another language such as R or Python?

            Thank you and look forward to your reply.


            • #7
              I'm afraid I don't have any expertise in CMP models - just in guessing at the meaning of Mata error messages.


              • #8
                For how to fiddle with -cmp- read the help file.

                If you try this in R or Python, do report on the success or lack of success thereof.

                On the econometric side "one categorical variable has more than 3000 values " sounds very dodgy. Are these values ordered? Do the differences between these values have a meaning?

                To give you a benchmark, in labour economics age and experience are treated as continuous variables, yet both lie pretty much between 16 and 90 in typical samples, which is only about 75 unique values.

                If there is a way to treat your "one categorical variable has more than 3000 values " as a continuous variable, this would probably solve your problem.
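To make the contrast concrete, here is a small Python sketch of how many regressor columns the variable would contribute under each treatment. It assumes the categorical variable is entered as a set of indicator (dummy) variables with one base level omitted, and it uses 3,000 levels as a lower bound because the thread only says "more than 3000"; the names are illustrative. It does not by itself account for the size of the matrix in the error message.

```python
N_OBS = 40_000        # observations reported in the thread
LEVELS = 3_000        # lower bound: the thread says "more than 3000 values"
BYTES_PER_DOUBLE = 8  # double-precision storage


def indicator_columns(levels: int) -> int:
    """Columns needed to enter a categorical variable as indicators,
    omitting one base level (an assumption about how it is specified)."""
    return levels - 1


cols_categorical = indicator_columns(LEVELS)
cols_continuous = 1  # a continuous variable enters as a single column

for label, cols in [("categorical", cols_categorical), ("continuous", cols_continuous)]:
    size = N_OBS * cols * BYTES_PER_DOUBLE
    print(f"{label:>11}: {cols:>5} columns per equation, about {size / 10**6:,.1f} MB as doubles")
```

If the variable appears in several of the 5 equations, the column count applies to each of them, which is one reason a 3,000-level categorical variable is expensive compared with a single continuous regressor.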


                • #9
                  Originally posted by Joro Kolev
                  For how to fiddle with -cmp- read the help file.

                  If you try this in R or Python, do report on the success or lack of success thereof.

                  On the econometric side "one categorical variable has more than 3000 values " sounds very dodgy. Are these values ordered? Do the differences between these values have a meaning?

                  To give you a benchmark, in labour economics age and experience are treated as continuous variables, yet both lie pretty much between 16 and 90 in typical samples, which is only about 75 unique values.

                  If there is a way to treat your "one categorical variable has more than 3000 values " as a continuous variable, this would probably solve your problem.
                  Hi Joro,

                  Thank you so much for your kind reply. I'll definitely try the continuous variable method and see if it works.


                  • #10
                    Originally posted by William Lisowski
                    I'm afraid I don't have any expertise in CMP models - just in guessing at the meaning of Mata error messages.
                    Hi William,

                    Got it! Thank you all the same.
