  • Panel data regression model; characteristic contents too long

    Hello Statalist community,

    I have a challenge with Stata for which I didn't find a solution in the forum or archives.

    I am trying to run the following panel data regression (with cross-section fixed effects): "xi: reg depvars(1) indepvars(15) i.bank, vce(cluster bank)".

    After a while it returns: "characteristic contents too long. The maximum value of the contents is 67,784."

    I use Stata/MP and have already increased maxvar and matsize to what I believe are sufficient sizes. I think the problem is the variable "i.bank", which might create a characteristic for each dummy that also includes the names of all the previous dummies. It stops after around 8,000-9,000 dummies have been created (and I need to get to 17,000). Because I use an OLS regression model (no fe or re), removing the "xi:" didn't help either.

    Does anyone have a suggestion on how to solve this?

    Any help is appreciated!


    Kind regards,
    Tom

  • #2
    I think the issue is that a Stata command cannot be longer than 67,784 characters. With 17,000 dummies, their names would have to be shorter than 3 characters each to even have a shot at this. You might manage it if you create the dummies yourself, using the alphabet for their names (aaa, aab, aac, ..., zzy, zzz). A better solution, in my opinion, is to demean your data by bank (over time) or simply use the xtreg, fe command, where the "fe" takes care of the i.bank part.

    PS: Note that normally the "xi:" prefix is obsolete unless you are running a super old Stata version.
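
    To illustrate the demeaning suggestion, here is a minimal sketch. It assumes the Grunfeld example data (fetched with webuse, which needs internet access) as a stand-in for the bank panel, so company plays the role of bank; the variable names are from that dataset, not from the original post. The slope estimates from (a) and (b) should agree, while the reported standard errors may differ because of degrees-of-freedom adjustments.

    ```stata
    * Sketch only: the Grunfeld data stand in for the bank panel
    webuse grunfeld, clear

    * (a) within estimator via xtreg, fe
    xtset company year
    xtreg invest mvalue kstock, fe vce(cluster company)

    * (b) same slopes by demeaning each variable by panel unit
    foreach v of varlist invest mvalue kstock {
        bysort company: egen double mean_`v' = mean(`v')
        generate double dm_`v' = `v' - mean_`v'
    }
    regress dm_invest dm_mvalue dm_kstock, vce(cluster company)
    ```

    Either way, no indicator variable is ever created for the panel units, so the number of banks does not run into any name or characteristic length limit.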

    • #3
      Does Stata also 'store' the data without "xi"? Because after that I want to use "esttab" to create a table.

      Regarding the characters, I have found this: "# characters in a command (MP) 1,081,527" so that should be good.

      • #4
        You mean the regression results? If so, yes. Have you tried running it without the vce(cluster) option?
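
        As a rough sketch of how the stored results can be passed to esttab without xi: this assumes the user-written estout package is installed (it provides esttab) and again uses the Grunfeld example data as a placeholder for the real panel. The name ols_fe is just an illustrative label.

        ```stata
        * Sketch only: esttab is part of the user-written estout package
        * ssc install estout
        webuse grunfeld, clear

        regress invest mvalue kstock i.company, vce(cluster company)
        estimates store ols_fe

        * report only the regressors of interest, with standard errors
        esttab ols_fe, se ar2 keep(mvalue kstock)
        ```

        The keep() option restricts the table to the listed coefficients, which helps when the model contains a large number of indicator terms.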

        • #5
          I think your initial problem is coming from using the xi: prefix. It is now obsolete, its functions having been taken over and improved upon by the use of factor variable notation (-help fvvarlist-). In most situations using -xi- does no harm, but I think this is one of the situations where it actively messes things up. The -xi- command creates a _dta characteristic that contains the names of the indicator variables it creates. If you have 17,000 such variables, that is guaranteed to exceed the maximum length of a characteristic (67,784 bytes in the larger flavors of current Stata).

          So jettison the -xi-, and use factor variable notation (which, if you don't have any interaction terms in your model, will require nothing more than omitting mention of -xi:-) and that problem will go away.

          That said, do you really want to get output on 17,000 individual banks? How will you read it? How will you have time to read it? On the assumption that the bank effects are just nuisance variables anyway, you can handle this more efficiently with either -xtreg, fe- or -areg-.
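
          A small sketch of the two routes side by side, assuming the Grunfeld example data (10 companies) in place of the bank panel. The coefficients on mvalue and kstock should be the same in both commands; the only difference in the output is that -areg- does not list the individual company effects.

          ```stata
          webuse grunfeld, clear

          * factor-variable notation: one indicator per company, no xi: needed
          regress invest mvalue kstock i.company, vce(cluster company)

          * same slope estimates, but the company effects are absorbed
          areg invest mvalue kstock, absorb(company) vce(cluster company)
          ```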

          • #6
            Thanks both for the answers! The command -areg- did the trick. I also removed the xi prefix, which made my command: "areg depvars(1) indepvars(15), absorb(bank) vce(cluster bank)".

            To answer the question of why I want to include the dummies: I have 17,000 banks in the sample, which I want to include in my OLS panel regression as (cross-section) fixed effects. I will not interpret them. (By the way, this alone increased the adjusted R-squared from 0.5 to 0.7+.)
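
            For anyone who wants to make the same comparison, a minimal sketch of reading the adjusted R-squared with and without the absorbed effects, assuming the Grunfeld example data rather than the bank sample. Both -regress- and -areg- leave the value in e(r2_a).

            ```stata
            webuse grunfeld, clear

            regress invest mvalue kstock, vce(cluster company)
            display "adj. R2, pooled OLS:       " %6.4f e(r2_a)

            areg invest mvalue kstock, absorb(company) vce(cluster company)
            display "adj. R2, absorbed effects: " %6.4f e(r2_a)
            ```

            Note that -xtreg, fe- reports within, between, and overall R-squared instead, so its headline figure is not directly comparable with the one from -areg-.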
