macro length exceeded

Alex Sanz

Join Date: Sep 2017

Posts: 3
#1

06 Sep 2017, 09:50

Hi,

I am trying to use the synth command in Stata to create a synthetic control that matches my treatment group. Unfortunately, when I run the command:

synth NXO t hr, trunit(4) trperiod(41641) counit(1 2 3 6 7 8 10 12 14 15 16 17 19 21)

I get the error message: macro length exceeded.

I have searched this forum and the internet for this error in connection with the synth command, but I could not find anything. What should I do to solve this problem?

As for the data, I am using a panel dataset with more than 1,000,000 observations on 21 individuals over 9 years.

Thanks in advance.

Alex
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

06 Sep 2017, 11:04

Welcome to Statalist, Alex.

I hate to be the bearer of bad news to a new member.

Unfortunately, you are sunk by a combination of the way the synth command (user written, from SSC) is written and the large number of periods for each individual in your data.

I don't see a straightforward way around the problem. It would be possible to edit synth.ado and remove or replace the offending code, which exists to check that the value you supplied for trperiod is found in your data, but my fear is that the program has (clearly) not been tested on panel data covering the vast number of periods yours has, and the program may fail elsewhere, or not run in a reasonable time.
Alex Sanz

Join Date: Sep 2017

Posts: 3
#3

06 Sep 2017, 11:41

Originally posted by William Lisowski

Welcome to Statalist, Alex.

I hate to be the bearer of bad news to a new member.

Unfortunately, you are sunk by a combination of the way the synth command (user written, from SSC) is written and the large number of periods for each individual in your data.

I don't see a straightforward way around the problem. It would be possible to edit synth.ado and remove or replace the offending code, which exists to check that the value you supplied for trperiod is found in your data, but my fear is that the program has (clearly) not been tested on panel data covering the vast number of periods yours has, and the program may fail elsewhere, or not run in a reasonable time.

Thanks for your fast response, William. There is no problem with being "the bearer of bad news"; it is preferable to know that something is not (currently) possible than to spend days or weeks looking for a solution that does not exist.

If I understand correctly, the problem is that such a large number of periods per individual is not compatible with the way the synth command is written. In my case I have 78912 observations for each individual. Do you think that is too many for the synth command?

Thanks in advance.
Friedrich Huebler

Join Date: Apr 2014

Posts: 1053
#4

06 Sep 2017, 12:01

You can try to change the limits in Stata. From help limits:

The maximum length of the contents of a macro are fixed in Stata/IC and settable in Stata/SE and Stata/MP. The currently set maximum length is recorded in c(macrolen); type display c(macrolen). The maximum length can be changed with set maxvar. If you set maxvar to a larger value, the maximum length increases; if you set maxvar to a smaller value, the maximum length decreases. The relationship between them is maximum_length = 129*maxvar + 200.
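
As a rough illustration of the relationship quoted above, the formula can be inverted to estimate how large maxvar would need to be for a given macro length. This is only a sketch: the target of 1,000,000 characters and the local macro names are hypothetical, and the formula is the one quoted from help limits, which may differ in other Stata versions.

```stata
* Hypothetical target: room for a macro of 1,000,000 characters
local needed = 1000000

* Invert maximum_length = 129*maxvar + 200 and round up
local mv = ceil((`needed' - 200)/129)

display "smallest maxvar: `mv'"
display "resulting macro length: " 129*`mv' + 200
```

The computed value still has to lie within the range of maxvar that your flavor of Stata allows.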
Alex Sanz

Join Date: Sep 2017

Posts: 3
#5

06 Sep 2017, 12:44

Originally posted by Friedrich Huebler

You can try to change the limits in Stata. From help limits:

Thanks Friedrich. I was able to change the limits as you described in your post. Although that got me past the macro length exceeded error, another error appeared:

invalid numlist has too many elements

I hope I will be able to solve this one.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

06 Sep 2017, 13:02

Friedrich's note suggests a way forward that I neglected to consider.

I do remain concerned that you will run into other size limitations. Looking further at the ado file, the synth command does a lot with Stata matrices (outside of Mata), and these matrices are limited to 11,000 rows or columns, if I understand correctly. If the command is creating matrices with 1 row per observation, you seem likely to hit that limit.
Jason Cruso

Join Date: Sep 2017

Posts: 68
#7

24 Oct 2017, 05:14

Hi,
I have been trying to work around the "exceeded macro length" error that the levelsof command gives on my large panel dataset by using set maxvar, but without success.

I do the following:

>display c(macrolen)
645200

However, when I try the command below, I get the following error:
>set maxvar 999999
no; data in memory would be lost

(in fact, setting maxvar to any value throws the same error)

Not sure what the problem is.

Any help appreciated!

Thanks.

Last edited by Jason Cruso; 24 Oct 2017, 05:21.
Nick Cox

Join Date: Mar 2014

Posts: 35709
#8

24 Oct 2017, 05:38

Maximum macro length has nothing to do with the number of variables allowed. A glance at help limits will show that no current version of Stata allows more than 120000 variables in any case.

You don't say why you want to use levelsof at all. I have no bias against levelsof, but whenever this bites levelsof is just the wrong thing to use and the resulting macro would slow you down even if it could be produced.

Presumably you want some kind of loop over panels which have irregular identifiers. One way to do that is just a forvalues loop over the distinct identifiers mapped to integers 1 up.

Code:

* label option may be problematic for really big datasets
egen newid = group(id), label
summarize newid

and then to loop over the values of this variable: the summarize results show you the minimum and maximum.

Then again, not every panel operation requires a loop over panels, but we can't see what you intend.
Friedrich Huebler

Join Date: Apr 2014

Posts: 1053
#9

24 Oct 2017, 07:13

Originally posted by Nick Cox

Maximum macro length has nothing to do with the number of variables allowed.

According to help limits the maximum number of characters in a macro is related to the maximum number of variables.

The maximum length of the contents of a macro are fixed in Stata/IC and settable in Stata/SE and Stata/MP. The currently set maximum length is recorded in c(macrolen); type display c(macrolen). The maximum length can be changed with set maxvar. If you set maxvar to a larger value, the maximum length increases; if you set maxvar to a smaller value, the maximum length decreases. The relationship between them is maximum_length = 129*maxvar + 200.

Code:

. set maxvar 2048
. di c(macrolen)
264392
. set maxvar 32767
. di c(macrolen)
4227143
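
The error reported in #7 ("no; data in memory would be lost") arises because set maxvar cannot be changed while a dataset is in memory. A minimal sketch of one way around it follows, assuming Stata/SE or Stata/MP. The auto dataset merely stands in for your own data, the value 32767 is only illustrative, and the tempfile name is hypothetical.

```stata
* Stand-in for your own data (assumption for this sketch)
sysuse auto, clear

* Park the data in a temporary file so that memory can be cleared
tempfile holding
save `holding'
clear

* With no data in memory, maxvar can be changed
set maxvar 32767

* Bring the data back and check the resulting macro length
use `holding'
display c(macrolen)
```

Run this from a do-file (or in one go), since the temporary file is tied to the local macro that names it.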
Jason Cruso

Join Date: Sep 2017

Posts: 68
#10

24 Oct 2017, 07:19

Thanks Nick!

I have yet to try that code out, but I am not sure it would solve my problem.

This is the structure of the data.

Year OrgId Person_id Person_Income
1 1 1 100
1 1 2 200
1 1 3 300
1 1 4 400
1 2 5 25
1 2 6 30
1 2 7 35
1 2 8 40
2 3 9 10
2 3 10 15
2 3 11 20
2 3 12 25

This is a very large panel dataset covering about 18 years.

I need to find the inequality parameters (gini, ge(0), ge(-1)) WITHIN each OrgId. I use the ineqdeco command.

This is the code:

Code:

local years 1999 2002......2015
foreach y of local years {
    levelsof OrgId if year == 'y', local(firms)
    foreach f of local firms {
        ineqdeco Person_Income if OrgId == 'f' & Year == 'y'
        ...
        ...
        ....
    }
}
William Lisowski

Join Date: Dec 2014

Posts: 10150
#11

24 Oct 2017, 07:30

Following up on Nick's suggestion, the help file for ineqdeco suggests you can dispense with the looping using

Code:

ineqdeco Person_Income, bygroup(newid)
Jason Cruso

Join Date: Sep 2017

Posts: 68
#12

24 Oct 2017, 07:35

Thanks.
I remember trying this first.

But if I remember correctly, it does not work for large datasets!

It throws an error, something like "too many values".
Nick Cox

Join Date: Mar 2014

Posts: 35709
#13

24 Oct 2017, 07:51

Friedrich's correction is good in #9. My error on that, although trying to set maxvar above the limits won't I think work.

The problem with #12 is that ineqdeco (from SSC, as you are asked to explain) relies internally on levelsof.

The code in #10 is buggy (the single quotes ' ' are illegal) but can be improved a bit:

Code:

egen newOrgId = group(OrgId), label
su newOrgId, meanonly
local nId = r(max)
forval y = 1999/2015 {
    forval j = 1/`nId' {
        ineqdeco Person_Income if newOrgId == `j' & Year == `y'
    }
}

That said, rangerun (SSC) would be a better framework, but I've not got time right now to look at that.

Last edited by Nick Cox; 24 Oct 2017, 07:55.

Robert Picard

Join Date: Mar 2014
Posts: 1536

#14

24 Oct 2017, 13:44

I know nothing of ineqdeco but there's a new program on SSC called runby that can be used to run commands on by-group subsamples. Here's a quick example:

Code:

clear all

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(year OrgId person_id) int Person_Income
1 1  1 100
1 1  2 200
1 1  3 300
1 1  4 400
1 2  5  25
1 2  6  30
1 2  7  35
1 2  8  40
2 3  9  10
2 3 10  15
2 3 11  20
2 3 12  25
end

program do_it
    ineqdeco Person_Income
    gen gem1 = r(gem1)
end

runby do_it, by(OrgId year)

and the results:

Code:

. list, sepby(OrgId year)

     +-----------------------------------------------+
     | year   OrgId   person~d   Person~e       gem1 |
     |-----------------------------------------------|
  1. |    1       1          1        100   .1510417 |
  2. |    1       1          2        200   .1510417 |
  3. |    1       1          3        300   .1510417 |
  4. |    1       1          4        400   .1510417 |
     |-----------------------------------------------|
  5. |    1       2          5         25   .0155506 |
  6. |    1       2          6         30   .0155506 |
  7. |    1       2          7         35   .0155506 |
  8. |    1       2          8         40   .0155506 |
     |-----------------------------------------------|
  9. |    2       3          9         10   .0614583 |
 10. |    2       3         10         15   .0614583 |
 11. |    2       3         11         20   .0614583 |
 12. |    2       3         12         25   .0614583 |
     +-----------------------------------------------+

.
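
If only one row of results per group is wanted, a variation on the example above may help. This is a sketch, not tested on large data: it assumes that runby keeps whatever data is in memory when the program ends, and that ineqdeco leaves the Gini coefficient in r(gini) alongside r(gem1). The program name one_row is hypothetical, and both runby and ineqdeco must be installed from SSC.

```stata
clear all
input byte(year OrgId person_id) int Person_Income
1 1  1 100
1 1  2 200
1 1  3 300
1 1  4 400
1 2  5  25
1 2  6  30
1 2  7  35
1 2  8  40
2 3  9  10
2 3 10  15
2 3 11  20
2 3 12  25
end

* Hypothetical helper: reduce each by-group to a single row of results
program one_row
    quietly ineqdeco Person_Income
    gen gini = r(gini)
    gen gem1 = r(gem1)
    keep year OrgId gini gem1
    keep in 1
end

runby one_row, by(OrgId year)
list
```

The resulting dataset has one observation per OrgId-year combination, which can then be merged back onto the original data if needed.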


Jason Cruso

Join Date: Sep 2017

Posts: 68
#15

25 Oct 2017, 04:18

Thanks everyone for all your comments!

Though I have been able to overcome the levelsof problem (by increasing maxvar), I am stuck with the ineqdeco command! It seems to take forever to compute the inequality measures for each Year-OrgId combination. I tried it on a 10% sample, and the code still does not seem to finish.

Does anyone else face this problem?
Do I have to "modify" the ineqdeco.ado and probably rewrite my own ineqdeco?