  • Fixed Effects Dummy Variables Omitted and Best Model

    Hi Stata Intellectuals,

    I have one more quick question on fixed effects. I have panel data with firm-level characteristics over 10 years. I am currently running the code below, but all the fixed-effect control dummies for year and industry are collinear (all of them are omitted). What is the reason for that? I found some previous discussions in this forum about one (which is normal) or two omitted dummy variables, but not about all of them.

    Here is the code:
    (1)
    xtset firm year
    xtreg y x1 x2 x3 i.year i.industry, fe


    In this case, all dummies for year and industry are omitted.

    When I run (as suggested in a previous discussion) :
    (2)
    xtreg y x1 x2 x3 i.year i.industry

    the variables are not omitted, but if I don't specify fixed effect or random effect what is Stata running when I use xtreg?

    An alternative suggested model is the following:
    (3)
    egen both = group(year industry)
    xtreg y x1 x2 x3 i.both, fe


    or finally
    (4)
    xtreg y x1 x2 x3 , i(both) fe


    In summary: when I use #(1), all dummies are omitted. Why? What is xtreg running when neither fe nor re is specified? What is the difference between (3) and (4)? What is the recommended approach?


    Thank you

    Marco

  • #2
    Marco:
    as the dinner chime is ringing, two replies out of your 4 questions:
    (1): as you diagnosed, the problem is the collinearity with fixed effect; I would also suspect from your question #(2) that -fe- cancelled out the time-invariant predictors (something that does not happen with the -re- specification);
    (2): by default, -xtreg- fits the random-effects (-re-) model.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      1. That the industry indicators ("dummies") get omitted is unsurprising: each firm always belongs to the same industry, so industry doesn't vary within your panels and is therefore collinear with the firm fixed effects.

      Why your year variables get dropped is unclear. Suffice it to say that when Stata drops variables for colinearity, it always has a good reason. To give you a better explanation I think you would have to show a sample of your data. Perhaps there is colinearity between year and x1, x2, or x3 (or some combination of those). Without seeing the data, it's impossible to say.

      2. When you do not specify, the default for -xtreg- is random effects.

      The model in 3 keeps firm as the panel variable and year as the time variable. It then attempts to introduce indicator variables for combinations of year and industry. You will find that a number of these variables will be dropped due to colinearity. The reason is that, again, industry does not vary within firm. So for any given firm, all of the indicators that include a different industry are zero, and of the indicators that correspond to the firm's industry, all of them are zero except for one--so they always sum to 1. Stata will have no choice but to get rid of some of them. Ordinarily, I think this would be more or less equivalent to using i.year, but since your model is dropping year as well, I think you are going to lose them all.

      The model in 4 overrides the designation of firm as the panel variable and instead uses the pairings of industry and year as the panel effects. You probably won't encounter difficulties with these variables being dropped (although, since i.year mysteriously drops in your data, all bets are off here). But this model is badly specified because it now ignores the clustering of observations within firms. It treats all observations for a given industry and year as independent, when in reality they are not. It also fails to correct for omitted-variable bias at the firm level. So even though it will probably run, the results you get will likely be useless.

      I'm not sure what you're trying to do here with industry. Given that each firm always belongs to the same industry, any observed or unobserved industry-level effects are already adjusted for in the basic model -xtreg y x1 x2 x3, fe-. If your goal is to actually estimate industry level effects, then you simply can't do that in a fixed-effects model. It is, in principle, impossible. You either have to go to random effects, with the additional assumptions that entails, or abandon the attempt. Those effects are simply not estimable in a fixed effects model.
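
      A minimal sketch of how one might confirm that industry never changes within a firm, assuming the variable names from #1 (firm, year, industry); the helper variable `varies` is hypothetical and introduced only for illustration:

      ```stata
      * Sketch only: flag firms whose industry code changes over time.
      * Sorting by industry within firm puts the smallest code first and the largest last.
      bysort firm (industry): gen byte varies = industry[1] != industry[_N]
      tabulate varies

      * Alternative check: a within-firm standard deviation of 0 means time-invariant.
      xtset firm year
      xtsum industry
      ```

      If `varies` is 0 for every observation, the industry indicators carry no within-firm information and -xtreg, fe- has to omit them.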

      Added: Crossed in cyberspace with Carlo's response, which makes some of the same points.


      • #4
        Thank you so much. Your comments are extremely helpful.

        Going back to model 1, even when I drop all the Xs, the year dummies are still omitted (xtreg y i.year, fe). I assume that the year dummies are collinear with the constant (the only variable left). Someone suggested (in a previous post) that the year dummies are dropped because I am setting year as my panel time variable (xtset firm year) and the panel is strongly balanced. What are your thoughts?

        I am not looking to estimate industry-level effects, but I would like to control for "shocks" in industry and years. My dependent variable is the earnings of firms for every year since 2000. I want to make sure that my results are not biased by industry shocks (e.g., an oil price collapse or a fire in a region; I don't really care what those are, I just want a way to take them into account). That is why I was including dummies for years and industries. What would you recommend?

        Thank you so much

        Marco

        Carlo, Buon appetito!


        • #5
          Someone suggested (in a previous post) that the year dummies are dropped because I am setting year as my panel time variable (xtset firm year) and the panel is strongly balanced. What are your thoughts?
          I doubt that. The time variable in -xtset- is ignored by -xtreg, fe- altogether. If every single year were represented by an indicator, they would, of course, sum to 1 and be colinear with the constant term. But Stata knows about that and automatically drops 1. So somehow the year variable must be colinear with the fixed effects. This would happen if, for example, there is only one year of data for each firm. That seems implausible in a typical data set, but it might arise in the estimation sample if the pattern of missing values of your outcome variable led to the elimination of all but one year in the estimation sample. Without seeing a sample of your data, I don't think I can say any more about this. (Please use -dataex- to post a small representative sample; -ssc install dataex-, -help dataex-).
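
          One way to investigate this before posting data might look like the sketch below. It assumes the variable names from #1; `n_used` is a hypothetical helper variable, not something from the original model:

          ```stata
          * Sketch only: inspect what actually entered the estimation sample.
          xtset firm year
          xtreg y x1 x2 x3 i.year, fe

          * Which variable did -xtreg- treat as the panel identifier?
          display "`e(ivar)'"

          * Number of estimation-sample observations contributed by each firm
          bysort firm: egen n_used = total(e(sample))
          tabulate n_used if e(sample)

          * Pattern of years present in the estimation sample
          xtdescribe if e(sample)
          ```

          If `n_used` is 1 for every firm, or if the reported panel variable is not the one you intended, that would be consistent with the year indicators being collinear with the fixed effects.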

          I would like to control for "shocks" in industry and years.
          Industry-level shocks are already adjusted for by the firm fixed effects (not "controlled": you can only "control" things in an experimental design), because industry is constant within firm. If we can manage to get your year variable kept in the model, then year-level shocks will be accounted for specifically. Either way, there is no need to include industry explicitly in the model, and it isn't even possible anyhow.


          • #6
            Marco:
            Thanks for wishing me good dinner (Buon Appetito); I do reciprocate the same to you.
            I do share Clyde's remarks: if you are interested in estimating time-invariant predictors, the only feasible choice is going -re-.
            I do not know whether the -hausman- verdict is in favour of -fe- or -re-; anyway, both specifications have their own pros and cons. As far as cons are concerned, -fe- can't estimate time-invariant predictors (as said above), whereas -re- assumes no correlation between the individual effects and the vector of predictors (often untenable). As always, a sort of risk/benefit analysis between -fe- and -re- is almost mandatory.
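
            For reference, a minimal sketch of the usual -hausman- comparison, assuming the variable names from #1 and default (non-robust) standard errors; the stored-estimate names `fe_fit` and `re_fit` are arbitrary labels chosen for illustration:

            ```stata
            * Sketch only: fit both specifications and compare them.
            xtset firm year

            xtreg y x1 x2 x3 i.year, fe
            estimates store fe_fit

            xtreg y x1 x2 x3 i.year, re
            estimates store re_fit

            * Consistent estimator first, efficient estimator second
            hausman fe_fit re_fit
            ```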
            Kind regards,
            Carlo
            (Stata 19.0)


            • #7
              Thank you so much for your great comments. While I was trying to generate a sample of the data using dataex, I reran the model and realized that the problem appears when I run the commands in the following order (in the do-file):

              xtset id year
              xtreg y_1 x_3, i(both) fe
              xtreg y_1 x_3 i.year, fe


              The year dummies in the second regression (xtreg y_1 x_3 i.year, fe) were omitted. However, when I "muted" (commented out) the second line and ran the following:

              xtset id year
              xtreg y_1 x_3 i.year,fe


              The year dummies were NOT omitted. This seems so strange to me. Is Stata saving the i(both) from the previous line? I don't think so, but something must be happening. Does the order of commands matter?

              Thank you again for your help and time


              (One last thing: I installed dataex and generated the sample. How do I copy and paste it here? I looked at the help, and it explains what the command does but not how to post the sample on Statalist. Thank you)


              • #8
                Is Stata saving the i(both) from the previous line?
                Yes, when you use the -i()- option in any -xt- command, it overwrites the designation of the panel variable from previous -xtset- or -i()- invocations. The variable specified becomes the new panel variable thereafter until explicitly changed with another -xtset- or -i()- invocation.
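
                A minimal sketch of a defensive pattern, assuming the variable names from #7: check the current panel settings and re-declare them right before the regression that depends on them.

                ```stata
                * Sketch only: make the panel declaration explicit before estimating.
                * With no arguments, -xtset- reports the current panel and time variables.
                xtset

                * Re-declare the intended structure in case an earlier command changed it.
                xtset id year
                xtreg y_1 x_3 i.year, fe

                * Panel variable used by the fit; it should display id
                display "`e(ivar)'"
                ```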


                • #9
                  (One last thing: I installed dataex and generated the sample. How do I copy and paste it here? I looked at the help, and it explains what the command does but not how to post the sample on Statalist. Thank you)
                  You will note that the output of -dataex- begins with the word code enclosed in brackets, and ends with the same, except that it is preceded by a slash. Just highlight all of that in the Results window (including the word code and the brackets and the slash), copy it (however that works in your operating system) and then paste it all into the Forum editor here.
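
                  As a rough sketch, assuming the variable names used earlier in the thread and an arbitrary slice of 20 observations, the call could look like this:

                  ```stata
                  * Sketch only: list the model variables for the first 20 observations.
                  * The variable list and the 1/20 range are illustrative choices.
                  dataex firm year industry y x1 x2 x3 in 1/20
                  ```

                  Restricting the variable list to what the model uses keeps the posted sample short and easy for others to load.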


                  • #10
                    That is extremely interesting. So the fact that I was running the i() option after xtset overwrote the xtset settings, and that is why my year dummies were omitted! Now it makes sense.
                    Thank you very much for your help!!!
                    Marco
