  • macro length exceeded

    Hi,

    I am trying to use the synth command in Stata to create a synthetic control that matches my treatment group. Unfortunately, using the command:

    synth NXO t hr, trunit(4) trperiod(41641) counit(1 2 3 6 7 8 10 12 14 15 16 17 19 21)

    I got an error message: macro length exceeded.

    I have searched the forum and the internet for this error with the synth command, but I could not find anything. What should I do to solve this problem?

    About the data: I am using a panel dataset with more than 1,000,000 observations for 21 individuals over 9 years.

    Thanks in advance.

    Alex

  • #2
    Welcome to Statalist, Alex.

    I hate to be the bearer of bad news to a new member.

    Unfortunately, you are sunk by a combination of the way the synth command (user written, from SSC) is written and the large number of periods for each individual in your data.

    I don't see a straightforward way around the problem. It would be possible to edit synth.ado and remove or replace the offending code, which exists to check that the value you supplied for trperiod is found in your data, but my fear is that the program has (clearly) not been tested on panel data covering the vast number of periods yours has, and the program may fail elsewhere, or not run in a reasonable time.



    • #3
      Originally posted by William Lisowski
      Welcome to Statalist, Alex.

      I hate to be the bearer of bad news to a new member.

      Unfortunately, you are sunk by a combination of the way the synth command (user written, from SSC) is written and the large number of periods for each individual in your data.

      I don't see a straightforward way around the problem. It would be possible to edit synth.ado and remove or replace the offending code, which exists to check that the value you supplied for trperiod is found in your data, but my fear is that the program has (clearly) not been tested on panel data covering the vast number of periods yours has, and the program may fail elsewhere, or not run in a reasonable time.
      Thanks for your fast response, William. There is no problem with being "the bearer of bad news"; it is preferable to know that something is not currently possible than to spend days or weeks looking for a solution that does not exist.

      If I understand correctly, the problem is that such a large number of periods for each individual is not compatible with the way the synth command is written. In my case I have 78912 observations for each individual. Do you think that is too large for the synth command?

      Thanks in advance.



      • #4
        You can try to change the limits in Stata. From help limits:

        The maximum length of the contents of a macro are fixed in Stata/IC and settable in Stata/SE and Stata/MP. The currently set maximum length is recorded in c(macrolen); type display c(macrolen). The maximum length can be changed with set maxvar. If you set maxvar to a larger value, the maximum length increases; if you set maxvar to a smaller value, the maximum length decreases. The relationship between them is maximum_length = 129*maxvar + 200.
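
        As a rough sketch of how that relationship could be used, one can work backwards from a desired macro length to the smallest maxvar that provides it. The target of 1,000,000 characters below is a made-up figure for illustration, not something taken from this thread. Note that set maxvar is only available in Stata/SE and Stata/MP, and in the versions I am familiar with it refuses to run while data are in memory, hence the clear; save your data first.

        ```stata
        * Current limit on the contents of a macro, in characters
        display c(macrolen)

        * Smallest maxvar giving at least `target' characters, from
        * maximum_length = 129*maxvar + 200 (see -help limits-)
        local target = 1000000                    // hypothetical target length
        local needed = ceil((`target' - 200)/129)
        display `needed'

        * WARNING: -clear- drops the data in memory; save first if needed
        clear
        set maxvar `needed'
        display c(macrolen)
        ```

        The value requested must still lie within the range your flavor of Stata accepts for maxvar; help limits lists that range.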



        • #5
          Originally posted by Friedrich Huebler
          You can try to change the limits in Stata. From help limits:
          Thanks, Friedrich. I changed the limits as you described in your post. That got me past the "macro length exceeded" error, but another error appeared:

          invalid numlist has too many elements

          I hope I will be able to solve this one.



          • #6
            Friedrich's note suggests a way forward that I neglected to consider.

            I do remain concerned that you will run into other size limitations. Looking further at the ado file, the synth command does a lot with Stata matrices (outside of Mata), and these matrices are limited to 11,000 rows or columns, if I understand correctly. If the command is creating matrices with 1 row per observation, you seem likely to hit that limit.





            • #7
              Hi,
              I have been trying to work around the "macro length exceeded" error that I get when using the levelsof command on my large panel data. I tried set maxvar, but without success.

              I do the following:

              >display c(macrolen)
              645200

              However, when I try the command below, I get the following error:
              >set maxvar 999999
              no; data in memory would be lost

              (in fact, setting maxvar to any value throws the same error)

              Not sure what the problem is.

              Any help appreciated!

              Thanks.

              Last edited by Jason Cruso; 24 Oct 2017, 05:21.



              • #8
                Maximum macro length has nothing to do with the number of variables allowed. A glance at help limits will show that no current version of Stata allows more than 120000 variables in any case.

                You don't say why you want to use levelsof at all. I have no bias against levelsof, but whenever this limit bites, levelsof is just the wrong thing to use, and the resulting macro would slow you down even if it could be produced.

                Presumably you want some kind of loop over panels which have irregular identifiers. One way to do that is just a forvalues loop over the distinct identifiers mapped to integers 1 up.

                Code:
                * label option may be problematic for really big datasets 
                egen newid = group(id), label  
                summarize newid
                and then to loop over the values of this variable: the summarize results show you the minimum and maximum.
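
                A minimal sketch of such a loop follows. It uses the auto dataset shipped with Stata, with rep78 standing in for an irregular identifier; both are chosen here only for illustration and are not the poster's data. The count inside the loop is a placeholder for whatever per-panel command is actually needed.

                ```stata
                sysuse auto, clear

                * Map the distinct values of the identifier to integers 1, 2, ...
                * (observations with a missing identifier get a missing newid)
                egen newid = group(rep78)

                * meanonly is enough to leave the maximum in r(max)
                summarize newid, meanonly
                local nid = r(max)

                * Loop over the integer codes; no macro holding all levels is built
                forvalues i = 1/`nid' {
                    quietly count if newid == `i'
                    display "group `i': " r(N) " observations"
                }
                ```

                Because forvalues only needs the lower and upper bounds, the macro length limit never comes into play, however many panels there are.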

                Then again, not every panel operation requires a loop over panels, but we can't see what you intend.



                • #9
                  Originally posted by Nick Cox
                  Maximum macro length has nothing to do with the number of variables allowed.
                  According to help limits the maximum number of characters in a macro is related to the maximum number of variables.

                  The maximum length of the contents of a macro are fixed in Stata/IC and settable in Stata/SE and Stata/MP. The currently set maximum length is recorded in c(macrolen); type display c(macrolen). The maximum length can be changed with set maxvar. If you set maxvar to a larger value, the maximum length increases; if you set maxvar to a smaller value, the maximum length decreases. The relationship between them is maximum_length = 129*maxvar + 200.
                  Code:
                  . set maxvar 2048
                  
                  . di c(macrolen)
                  264392
                  
                  . set maxvar 32767
                  
                  . di c(macrolen)
                  4227143
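
                  As a quick worked check, the values reported by c(macrolen) agree with the formula quoted from help limits. The last line runs the formula in reverse for the 645200 shown in #7, which corresponds to maxvar = 5000 (as far as I know, the default in Stata/SE and Stata/MP).

                  ```stata
                  * maximum_length = 129*maxvar + 200
                  display 129*2048  + 200      // 264392
                  display 129*32767 + 200      // 4227143

                  * Inverse: the maxvar implied by c(macrolen) = 645200
                  display (645200 - 200)/129   // 5000
                  ```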



                  • #10
                    Thanks Nick!

                    I have yet to try that code out, and I am not sure it would solve my problem.

                    This is the structure of the data.

                    Year OrgId Person_id Person_Income
                    1 1 1 100
                    1 1 2 200
                    1 1 3 300
                    1 1 4 400
                    1 2 5 25
                    1 2 6 30
                    1 2 7 35
                    1 2 8 40
                    2 3 9 10
                    2 3 10 15
                    2 3 11 20
                    2 3 12 25



                    This is a very large panel dataset of about 18 years.


                    I need to find the inequality parameters (gini, ge(0), ge(-1)) WITHIN each OrgId. I use the ineqdeco command.

                    This is the code:
                    Code:
                    local    years 1999 2002......2015
                    foreach y of local years {
                    levelsof OrgId if year == 'y', local(firms)
                    foreach f of local firms {
                    ineqdeco Person_Income if OrgId == 'f' & Year == 'y'
                    ...
                    ...
                    ....
                    }
                    }



                    • #11
                      Following up on Nick's suggestion, the help file for ineqdeco suggests you can dispense with the looping using
                      Code:
                      ineqdeco Person_Income, bygroup(newid)



                      • #12
                        Thanks.
                        I remember trying this first.

                        But if I remember right, it does not work for large datasets!

                        It throws an error, something like "too many values".



                        • #13
                          Friedrich's correction in #9 is good. My error on that, although trying to set maxvar above the limits won't, I think, work.

                          The problem with #12 is that ineqdeco (from SSC, as you are asked to explain) relies internally on levelsof.

                          The code in #10 is buggy (the single quotes ' ' are illegal) but can be improved a bit:

                          Code:
                          egen newOrgId = group(OrgId), label
                          su newOrgId, meanonly
                          local nId = r(max)
                          forval y = 1999/2015 {
                              forval j = 1/`nId' {
                                  ineqdeco Person_Income if newOrgId == `j' & Year == `y'
                              }
                          }
                          That said, rangerun (SSC) would be a better framework, but I've not got time right now to look at that.
                          Last edited by Nick Cox; 24 Oct 2017, 07:55.



                          • #14
                            I know nothing of ineqdeco but there's a new program on SSC called runby that can be used to run commands on by-group subsamples. Here's a quick example:

                            Code:
                            clear all
                            
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input byte(year OrgId person_id) int Person_Income
                            1 1  1 100
                            1 1  2 200
                            1 1  3 300
                            1 1  4 400
                            1 2  5  25
                            1 2  6  30
                            1 2  7  35
                            1 2  8  40
                            2 3  9  10
                            2 3 10  15
                            2 3 11  20
                            2 3 12  25
                            end
                            
                            program do_it
                                ineqdeco Person_Income
                                gen gem1 = r(gem1)
                            end
                            
                            runby do_it, by(OrgId year)
                            and the results:
                            Code:
                            . list, sepby(OrgId year)
                            
                                 +-----------------------------------------------+
                                 | year   OrgId   person~d   Person~e       gem1 |
                                 |-----------------------------------------------|
                              1. |    1       1          1        100   .1510417 |
                              2. |    1       1          2        200   .1510417 |
                              3. |    1       1          3        300   .1510417 |
                              4. |    1       1          4        400   .1510417 |
                                 |-----------------------------------------------|
                              5. |    1       2          5         25   .0155506 |
                              6. |    1       2          6         30   .0155506 |
                              7. |    1       2          7         35   .0155506 |
                              8. |    1       2          8         40   .0155506 |
                                 |-----------------------------------------------|
                              9. |    2       3          9         10   .0614583 |
                             10. |    2       3         10         15   .0614583 |
                             11. |    2       3         11         20   .0614583 |
                             12. |    2       3         12         25   .0614583 |
                                 +-----------------------------------------------+
                            
                            .



                            • #15
                              Thanks everyone for all your comments!

                              Though I have been able to overcome the levelsof problem (by increasing maxvar), I am stuck with the ineqdeco command! It seems to take forever to get the inequality parameters for each year-OrgId combination. I tried this on a 10% sample, and the code still does not seem to finish computing the inequality measures.

                              Does anyone else face this problem?
                              Do I have to "modify" the ineqdeco.ado and probably rewrite my own ineqdeco?
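
                              For the Gini alone, one possibility is to compute it directly with by-group commands instead of calling ineqdeco once per cell. The following is only a sketch under stated assumptions: unweighted data, no missing or negative incomes, and the textbook rank-based formula, which I have not checked against the ineqdeco source. It does not cover ge(0) or ge(-1). The variable names rnk, sum_x, sum_rx and gini are invented for the example, and the data are the toy values from #14.

                              ```stata
                              * Toy data from #14
                              clear
                              input byte(year OrgId person_id) int Person_Income
                              1 1  1 100
                              1 1  2 200
                              1 1  3 300
                              1 1  4 400
                              1 2  5  25
                              1 2  6  30
                              1 2  7  35
                              1 2  8  40
                              2 3  9  10
                              2 3 10  15
                              2 3 11  20
                              2 3 12  25
                              end

                              * Rank incomes within each OrgId-year cell, smallest first
                              bysort OrgId year (Person_Income): gen double rnk = _n

                              * Cell totals of income and of rank-weighted income
                              bysort OrgId year: egen double sum_x  = total(Person_Income)
                              bysort OrgId year: egen double sum_rx = total(rnk * Person_Income)

                              * Gini = 2*sum(i*x_i)/(n*sum(x)) - (n + 1)/n, i = within-cell rank
                              bysort OrgId year: gen double gini = 2*sum_rx/(_N*sum_x) - (_N + 1)/_N

                              list OrgId year Person_Income gini, sepby(OrgId year)
                              ```

                              For OrgId 1 in the toy data (incomes 100, 200, 300, 400) this gives a Gini of 0.25. Since every step is a single pass over the sorted data, there is no loop over cells and no macro of levels to build.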



