
  • Fixed effect model with collinearity issues

    Dear Stata users,

    I want to run a regression with industry and year fixed effects. When I asked my supervisor, she told me to use the following command:
    Code:
     xi: reg y x i.fyear i.sic2, cl(cusip8)
    After this regression, I want to run a fixed-effects regression with robust standard errors, because I want to control for heteroskedasticity.
    To do so, I found the following commands online:
    Code:
     xtset cusip8
    xtreg y x i.fyear i.sic2, fe vce(robust)
    However, these commands give me two problems.
    1. When I run the xtreg command without the robust option, Stata reports "xx.sic2 omitted because of collinearity", and the results differ from those of the xi regression.
    2. When I run it with the robust option, Stata returns the error "panels are not nested within clusters".

    I was wondering why the sic2 dummies are being dropped and how to solve this. The same goes for the second problem: I understand neither the error nor how to solve it.

    I hope someone can explain this and help me solve it, so that I can run a fixed-effects regression both with and without robust standard errors.


  • #2
    Well, the removal of the *.sic2 terms from the model for collinearity implies that, within any given cusip8, the value of sic2 is the same for all observations. I have only a passing acquaintance with terminology like sic and cusip, so I'm not positioned to say whether this makes sense or not. If it does not make sense, then there is something wrong with your data and you need to investigate why and how to fix it. If it does make sense that the value of sic2 would be an unchanging constant within a cusip, then the model you are running is simply incapable of estimating the sic2 effects: you can never, in a fixed-effects regression, estimate effects that are constant within panels. That's not some peculiarity of Stata. It's linear algebra, and there is no way around it. If estimating the sic2 effects is an important part of your research goals, then you will have to switch to a different model, perhaps between-effects or perhaps random-effects, to do that.
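To make the linear-algebra point concrete, here is a small Python sketch (not Stata) of the within transformation that a fixed-effects estimator such as -xtreg, fe- works with: every variable is demeaned within its panel. The toy data and the helper name `within_transform` are hypothetical, chosen only for illustration. A regressor that never changes within a panel demeans to a column of zeros, so there is no variation left from which to estimate its coefficient.

```python
from collections import defaultdict

def within_transform(panel_ids, values):
    """Subtract each panel's mean from its observations (the fixed-effects 'within' transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pid, v in zip(panel_ids, values):
        totals[pid] += v
        counts[pid] += 1
    return [v - totals[pid] / counts[pid] for pid, v in zip(panel_ids, values)]

# Hypothetical toy panel: three firms (cusip8), each observed in three years.
cusip8 = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
# Industry indicator: firms A and B are in the industry, firm C is not.
# It never changes within a firm.
sic2_is_1 = [1, 1, 1, 1, 1, 1, 0, 0, 0]
# A regressor that does vary within firms.
x = [1.0, 2.0, 4.0, 0.5, 1.5, 1.0, 3.0, 2.0, 7.0]

print(within_transform(cusip8, sic2_is_1))  # all zeros: nothing left to estimate
print(within_transform(cusip8, x))          # still varies within firms
```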

    Another approach would be to -xtset- the data with sic2 as the panel variable instead of cusip8 and specify -vce(cluster cusip8)- in the -xtreg- command. This would give the same results as your -regress- command because it implements the same model.

    I suspect that what you have here is actually 3-level data, with observations nested within sic2, and sic2's nested within cusip8's. Everything you've shown is consistent with that assumption. But you are trying to squash this 3-level data into a two-level model, so something is going to have to "give." Another approach, one that would be used in my discipline, is to do a 3-level analysis of the 3-level data using a mixed-effects model. I realize that mixed-effects models are viewed skeptically in finance and econometrics. The truth is that there is no ideal way to model 3-level data. The choice is between using a consistent estimator of a mis-specified model (some model estimated with -xtreg, fe-) and an inconsistent estimator of a correctly specified model (-mixed-).

    As for the results of -xtreg, fe- being different from the -regress- you ran earlier, that is no surprise. It's a different model. The -xtreg, fe- model incorporates cusip8 as a model variable, implicitly through its use as the panel variable in your -xtset- command. The -regress- model does not include cusip8, except as a clustering level for the VCE estimator. So the models are quite different and there is no reason to expect they will give the same results, nor even that the results will be at all similar. They could, in principle, differ in any imaginable way.
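A made-up two-firm example, written in Python rather than Stata, shows how far the two answers can drift apart. It deliberately leaves out the year and industry dummies of the actual models, and `ols_slope` and `demean_by` are illustrative helpers, not Stata functions. The pooled slope ignores which firm each observation belongs to (as -regress- without cusip8 does), while the within slope absorbs a separate intercept for each firm (as -xtreg, fe- does); here the sign of the coefficient on x actually flips.

```python
from collections import defaultdict

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def demean_by(groups, values):
    """Subtract the group mean from each value (within transformation)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for g, v in zip(groups, values)]

# Hypothetical data: within each firm, y rises one-for-one with x,
# but the firm with larger x has much lower y overall.
firm = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 5, 6, 7]
y = [10, 11, 12, 1, 2, 3]

pooled = ols_slope(x, y)                                    # ignores firm
within = ols_slope(demean_by(firm, x), demean_by(firm, y))  # firm fixed effects
print(round(pooled, 3), round(within, 3))                   # -1.786 1.0
```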

    As for the problem with -vce(robust)-, I have to say I cannot understand how this can be happening, and I would like you to post back showing the exact command and the exact output from Stata, copied and pasted with no editing whatsoever from Stata's Results window (or your log file) into the Forum editor (please put it between code delimiters so it aligns nicely). The reason I don't think what you are describing is possible is this:

    Assuming you are running version 13 or later, -vce(robust)-, when used with -xtreg, fe- (and with a few other, but not all, estimators), is interpreted by Stata as -vce(cluster panelvar)-, where panelvar is the panel variable declared in the -xtset- command. Therefore, this usage guarantees that the panel variable and the cluster variable are identical, so the "panels are not nested within clusters" situation cannot possibly arise. So I would really like to see the exact command(s) and output, as I suspect they are not quite what you believe them to be.





    • #3
      I'm not sure if this is true or not, but I assume that the collinearity error comes from the unique combination of cusip, sic and fyear.

      However, I think I made a mistake in using an OLS regression, since my dependent variable is binary. I think a probit regression would be better suited.



      • #4
        I'm not sure if this is true or not, but I assume that the collinearity error comes from the unique combination of cusip, sic and fyear.
        I don't know what you mean by this. But if the combination of cusip, sic and fyear uniquely identifies observations in your data, then yes, that would also be a source of collinearity among sic, fyear, and the implicit cusip indicators in your -xtreg, fe- model.
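Whether a set of variables uniquely identifies observations is easy to check; in Stata that is what the -isid- command does. The Python sketch below is only an analogue of that check on hypothetical toy data, and `uniquely_identifies` is an illustrative name. In this toy data cusip8 and fyear already identify every observation on their own, so adding sic2 contributes nothing new, while sic2 and fyear together do not identify observations.

```python
def uniquely_identifies(*columns):
    """True if the combined values of the given columns are distinct across
    observations, in the spirit of Stata's -isid- command."""
    keys = list(zip(*columns))
    return len(set(keys)) == len(keys)

# Hypothetical toy data: two firms in the same industry, two years each.
cusip8 = ["A", "A", "B", "B"]
sic2 = ["28", "28", "28", "28"]
fyear = [2001, 2002, 2001, 2002]

print(uniquely_identifies(cusip8, sic2, fyear))  # True
print(uniquely_identifies(cusip8, fyear))        # True: sic2 adds nothing
print(uniquely_identifies(sic2, fyear))          # False
```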

        Using OLS with a dichotomous outcome variable is not necessarily wrong. It provides a linear probability model, which in some ways is easier to understand than the non-linear models more commonly used (logit and probit). But it becomes fairly unintuitive if the predicted probabilities go below 0 or above 1. So it is more common to use logit or probit, both of which avoid this problem.
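Here is a minimal Python sketch of that problem, fitting a linear probability model by ordinary least squares on a hypothetical 0/1 outcome. The data and the helper `ols_fit` are made up for illustration. Because a straight line is unbounded, the fitted "probabilities" fall below 0 at the low end of x and exceed 1 at the high end.

```python
def ols_fit(x, y):
    """Intercept and slope from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Hypothetical binary outcome that switches from 0 to 1 as x grows.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 0, 1, 1, 1]

intercept, slope = ols_fit(x, y)
predicted = [intercept + slope * a for a in x]
print([round(p, 2) for p in predicted])  # about -0.14 at x = 0 and 1.14 at x = 5
```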

        If you are moving to probit, let me point out that -probit y x i.fyear i.sic2- is not equivalent to a fixed effects probit model. The equivalence of adding indicators ("dummies") and a fixed-effects model applies only to linear models, not to probit (nor logit). Now, if you look at -xtprobit-, you will note that there is no -fe- version of the model. Nobody has figured out how to estimate a conditional fixed effects probit model. So if you are going with probit, you are leaving the realm of fixed-effects models anyway. That being the case, you should consider going all the way to -meprobit- and reflecting the 3 dimensional architecture of your data (assuming that I was right that observations are nested in sic2's which are nested in cusip8's.)



        • #5
          Thanks for your advice regarding the probit/logit models; I will take it into account if I decide to switch over. For now, I will stick with what I have and discuss it with my supervisor later on.

          But if the combination of cusip, sic and fyear uniquely identifies observations in your data, then yes, that would also be a source of collinearity among sic, fyear, and the implicit cusip indicators in your -xtreg, fe- model.
          This assumption is indeed correct. Would it be correct to drop the industry fixed effects if this is the case? Or would you suggest going with the 3-level data?

          To come back to my first post: it was not the command with -vce(robust)- that caused the error, as you stated. It was the combination of -xtset sic2- and -xtreg y x, vce(cluster cusip8)- that caused the "panels are not nested within clusters" error. Would dropping sic2 solve this problem?



          • #6
            Would it be correct to drop the industry fixed effects if this is the case? Or would you suggest going with the 3-level data?
            Well, you can't have all three, cusip, fyear, and sic in the model as fixed effects. Something has to go. I would move to a 3-level mixed-effects model, but in my discipline nobody looks askance at random effects. So this may or may not be feasible for you. If you are stuck with fixed-effects modeling then you will have to eliminate something. Whether it makes more sense to eliminate sic2 or eliminate cusip8 depends on your research goals, so I can't answer that question for you.
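Why can't all three coexist? With firm indicators in the model (implicit in -xtreg, fe-), each industry indicator is exactly the sum of the indicators of the firms in that industry, so it is a perfect linear combination of columns already present and must be dropped. The Python sketch below checks this on hypothetical toy data; the codes "28" and "73" are placeholders, not real assignments.

```python
# Hypothetical toy data: firms (cusip8) and the industry (sic2) each belongs to.
cusip8 = ["A", "A", "B", "B", "C", "C"]
sic2 = ["28", "28", "28", "28", "73", "73"]

# One 0/1 indicator column per firm, plus the indicator for industry 28.
firms = sorted(set(cusip8))
firm_dummies = {f: [1 if c == f else 0 for c in cusip8] for f in firms}
sic_dummy_28 = [1 if s == "28" else 0 for s in sic2]

# Add up the indicator columns of the firms that belong to industry 28.
firms_in_28 = sorted({c for c, s in zip(cusip8, sic2) if s == "28"})
summed = [sum(firm_dummies[f][i] for f in firms_in_28) for i in range(len(cusip8))]

print(firms_in_28)             # ['A', 'B']
print(summed == sic_dummy_28)  # True: the industry indicator is redundant
```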

            To come back to my first post: it was not the command with -vce(robust)- that caused the error, as you stated. It was the combination of -xtset sic2- and -xtreg y x, vce(cluster cusip8)- that caused the "panels are not nested within clusters" error.
            You have to consider the nesting structure of your data. You have sic2 codes and cusip8 codes. I don't work in finance: I've heard of these before, and in my own work I have used sic codes to represent industries in studies of occupational health. I'm not sure I know what cusip8 codes are, but I believe they represent individual firms, or perhaps individual securities. In any case, an industry will contain many firms or securities, but any given firm or security will typically belong to only one industry. So, contrary to what I said in #2, you have cusip8 nested within sic2 (not the other way around). This explains why you are getting that error. It is a requirement of the cluster-robust variance estimator applied to panel data that panels be nested in clusters. Put another way, each panel must belong to one and only one cluster. But when you specify -xtset sic2- and -vce(cluster cusip8)-, you are getting that exactly backwards. That won't work.

            If you stick with -xtset sic2-, then you can only have clusters at the level of sic2 or higher (i.e., some other variable that takes the same value in all of any given sic2's observations). So far, at least, you haven't described any variables higher than sic2 in your data, so you are limited to -vce(cluster sic2)- there. Alternatively, maybe it makes more sense for you to -xtset cusip8-, and then you can use -vce(cluster sic2)- because cusip8's are nested within sic2's.
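The nesting rule ("each panel must belong to one and only one cluster") can be written as a short check. This Python sketch is only a restatement of that rule on hypothetical toy data, not Stata's internal code, and `panels_nested_in_clusters` is an illustrative name. It also shows why -vce(robust)- with -xtreg, fe- cannot trigger the error: clustering on the panel variable itself is always nested.

```python
from collections import defaultdict

def panels_nested_in_clusters(panel_ids, cluster_ids):
    """True if all observations of every panel fall in a single cluster."""
    clusters_seen = defaultdict(set)
    for p, c in zip(panel_ids, cluster_ids):
        clusters_seen[p].add(c)
    return all(len(cs) == 1 for cs in clusters_seen.values())

# Hypothetical toy data: firms A and B are in industry 28, firm C in industry 73.
cusip8 = ["A", "A", "B", "B", "C", "C"]
sic2 = ["28", "28", "28", "28", "73", "73"]

print(panels_nested_in_clusters(cusip8, sic2))    # True:  -xtset cusip8-, -vce(cluster sic2)-
print(panels_nested_in_clusters(sic2, cusip8))    # False: -xtset sic2-, -vce(cluster cusip8)-
print(panels_nested_in_clusters(cusip8, cusip8))  # True:  clustering on the panel variable itself
```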
