Panel data regression model; characteristic contents too long

Tom Horst

Join Date: Aug 2016

Posts: 3
#1

10 Aug 2016, 03:39

Hello Statalist community,

I have a challenge with Stata for which I didn't find a solution in the forum or archives.

I am trying to run the following panel data regression (with cross-section fixed effects): "xi: reg depvars(1) indepvars(15) i.bank, vce(cluster bank)".

After a while it returns: "characteristic contents too long. The maximum value of the contents is 67,784."

I use Stata/MP and have already increased maxvar and matsize to what I believe are sufficient sizes. I think the problem is the variable "i.bank", which might create a characteristic for each dummy that also includes the names of all the previous dummies. It stops after around 8,000-9,000 dummies have been created (and I need to get to 17,000). Because I use an OLS regression model (no fe or re), removing the "xi:" didn't help either.

Does anyone have a suggestion on how to solve this?

Any help is appreciated!

Kind regards,
Tom
Tags: data, fixed effects, panel, panel data, regression
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#2

10 Aug 2016, 05:45

I think the issue is that a Stata command cannot be longer than 67,784 characters. With 17,000 dummies, their names would have to be shorter than 3 characters each to even have a shot at this. You might manage it if you create the dummies yourself using the alphabet as their names (aaa, aab, aac, ..., zzy, zzz). A better solution imo is to demean your data by bank (over time) or simply use the xtreg, fe command, where the "fe" takes care of the i.bank part.

PS: Note that normally the "xi:" prefix is obsolete unless you are running a super old Stata version.
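
To illustrate the demeaning suggestion, here is a minimal sketch on a made-up toy panel (50 hypothetical banks, 10 periods each). The variable names y, x, bank and year are assumptions for illustration only, not the poster's data. It compares the slope from -xtreg, fe- with the slope from plain OLS on data demeaned by bank.

```stata
* Toy panel, purely illustrative: 50 hypothetical banks, 10 periods each
clear
set seed 12345
set obs 500
generate long bank = ceil(_n/10)
bysort bank: generate int year = _n
generate double x = rnormal() + bank/10
generate double y = 2*x + bank/5 + rnormal()

* Option 1: let xtreg absorb the bank effects
xtset bank year
xtreg y x, fe vce(cluster bank)
scalar b_fe = _b[x]

* Option 2: demean y and x by bank, then run plain OLS without a constant
egen double x_mean = mean(x), by(bank)
egen double y_mean = mean(y), by(bank)
generate double x_dm = x - x_mean
generate double y_dm = y - y_mean
regress y_dm x_dm, noconstant
scalar b_dm = _b[x_dm]

* The two slopes should agree up to rounding
display "xtreg, fe slope:  " b_fe
display "demeaned slope:   " b_dm
```

The standard errors from the manual version are not adjusted for the bank means that were removed, so -xtreg, fe- is the more convenient route if inference matters.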
Tom Horst

Join Date: Aug 2016

Posts: 3
#3

10 Aug 2016, 05:57

Does Stata also 'store' the data without "xi"? Because after that I want to use "esttab" to create a table.

Regarding the characters, I have found this: "# characters in a command (MP) 1,081,527" so that should be good.
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#4

10 Aug 2016, 06:38

You mean the regression results? If so, yes. Have you tried running it without the vce(cluster) option?
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#5

10 Aug 2016, 08:52

I think your initial problem is coming from using the xi: prefix. It is now obsolete, its functions having been taken over and improved upon by the use of factor variable notation (-help fvvarlist-). In most situations using -xi- does no harm, but I think this is one of the situations where it actively messes things up. The -xi- command creates a _dta- characteristic that contains the names of the indicator variables it creates. If you have 80,000 such variables, that is guaranteed to exceed the maximum length of a characteristic (67,784 bytes in the larger flavors of current Stata).

So jettison the -xi-, and use factor variable notation (which, if you don't have any interaction terms in your model will require nothing more than omitting mention of -xi:-) and that problem will go away.

That said, do you really want to get output on 80,000+ individual banks? How will you read it? How will you have time to read it? On the assumption that the bank-effects are just nuisance variables anyway, you can handle this more efficiently with either -xtreg, fe- or -areg-.
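
As a rough sketch of the two routes described above, the following uses a made-up toy panel (50 hypothetical banks; the names y, x and bank are assumptions, not the original data). It runs the model once with factor-variable notation and once with -areg-, stores both sets of results, and compares the slope on x.

```stata
* Toy panel, purely illustrative: 50 hypothetical banks, 10 observations each
clear
set seed 2016
set obs 500
generate long bank = ceil(_n/10)
generate double x = rnormal() + bank/10
generate double y = 2*x + bank/5 + rnormal()

* Factor-variable notation: no xi: prefix, no indicator variables added to the dataset
regress y x i.bank, vce(cluster bank)
scalar b_lsdv = _b[x]
estimates store m_lsdv

* areg absorbs the bank effects instead of reporting one coefficient per bank
areg y x, absorb(bank) vce(cluster bank)
scalar b_areg = _b[x]
estimates store m_areg

* Side-by-side table of the slope only; the stored names could also be
* passed to esttab (user-written estout package, if installed)
estimates table m_lsdv m_areg, keep(x) b(%9.4f) se
```

On this toy data the slope on x should be the same in both models. I would not assume the reported standard errors match, since the two commands may apply different degrees-of-freedom adjustments.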
Tom Horst

Join Date: Aug 2016

Posts: 3
#6

12 Aug 2016, 05:03

Thanks both for the answers! The command -areg- did the trick. I also removed the xi prefix, which made my command: "areg depvars(1) indepvars(15), absorb(bank) vce(cluster bank)".

To answer the question of why I want to include the dummies: I have 17,000 banks in the sample, which I want to include in my OLS panel regression as (cross-section) fixed effects. I will not interpret them. (Btw, this alone increased the adjusted R-squared from 0.5 to 0.7+.)