  • Omitted variable in fixed effect model

    Dear Statalist members,

    - I have panel data identified by the panel variable "id" and the time variable "year".
    - I have 5 observations per group. I refer to the first observation as "year 0", the second as "year 1", and so forth until the last observation, "year 4".
    - I have a treatment variable called "indep", which is a dummy variable that is either 1 or 0 in "year 0" (it is missing in the other years).
    - In "year 0", the variable "indic" is equal to 1; for all other observations of the same "id" it is equal to 0.
    - I have a dependent variable called "dep", which is measured in "year 2", "year 3" and "year 4".
    - I would like to measure the effect of the treatment in year 2, year 3 and year 4.
    - Therefore, I create treatment-by-year variables using the following code:


    Code:
    sort id year
    xtset id year, y
    
    * y0-y4 flag the position of each observation within its id (year 0 to year 4)
    gen y0 = 0
    replace y0 = 1 if indic==1
    gen y1 = 0
    replace y1 = 1 if L.indic==1
    gen y2 = 0
    replace y2 = 1 if L2.indic==1
    gen y3 = 0
    replace y3 = 1 if L3.indic==1
    gen y4 = 0
    replace y4 = 1 if L4.indic==1
    
    * treatment-by-year dummies: 1 in year k (k = 2, 3, 4) for ids treated in year 0
    gen indep_y2=0
    replace indep_y2=1 if L2.indep==1 & y2==1
    gen indep_y3=0
    replace indep_y3=1 if L3.indep==1 & y3==1
    gen indep_y4=0
    replace indep_y4=1 if L4.indep==1 & y4==1
    
    xtreg dep indep_y2 indep_y3 indep_y4, re
    * here the regression seems to work correctly
    
    xtreg dep indep_y2 indep_y3 indep_y4, fe
    * now Stata omits indep_y4, and I do not understand why

    A sample of my data:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double year float indic byte indep float(id dep)
    2001 1 1 1            .
    2002 0 . 1            .
    2003 0 . 1  -.027934074
    2004 0 . 1 -.0038553104
    2005 0 . 1   -.02093103
    1990 1 1 2            .
    1991 0 . 2            .
    1992 0 . 2    .00416765
    1993 0 . 2    .03713628
    1994 0 . 2    .08618537
    2001 1 1 3            .
    2002 0 . 3            .
    2003 0 . 3    .05064285
    2004 0 . 3    .06113254
    2005 0 . 3   .023163736
    1996 1 0 4            .
    1997 0 . 4            .
    1998 0 . 4  -.014856376
    1999 0 . 4   -.03399779
    2000 0 . 4    .19193664
    end
    format %ty year


    What I would like to do:
    - run the regression with fixed effects and random effects
    - conduct a Hausman test to decide which effects to use
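For reference, the two steps above are usually run in Stata along the following lines. This is a minimal sketch, not a recommendation about which model to use: it assumes the variables built by the code earlier in this post exist and the data are -xtset-, and the stored-estimate names fixed and random are arbitrary labels chosen for illustration.

```stata
* Sketch only: assumes dep and indep_y2-indep_y4 exist and the data are xtset.
* "fixed" and "random" are arbitrary names for the stored estimates.

* Fixed-effects model; store the results
xtreg dep indep_y2 indep_y3 indep_y4, fe
estimates store fixed

* Random-effects model; store the results
xtreg dep indep_y2 indep_y3 indep_y4, re
estimates store random

* Hausman test: the consistent estimator (fe) is listed first,
* the efficient estimator (re) second
hausman fixed random
```

The order of the two names matters: -hausman- expects the always-consistent estimator first and the efficient one second.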

    My problem/question:
    - If I use a fixed effects model, Stata omits indep_y4, and I do not understand why.
    - Am I missing something, or specifying the variables wrong?
    - Is it OK to use random effects then?
    - Or should I drop one of the independent variables in both models (random and fixed), run the regressions, and then run the Hausman test?


    Any help is highly appreciated!
    Thank you,

    Samuel

  • #2
    The problem actually originates with the missing values of the variable dep. It is always missing in the years where indep_y2, indep_y3, and indep_y4 are all zero, except for id = 4 (where those three variables are 0 in every observation). The way you have defined indep_y2 through indep_y4, they are like the dummy variables for a four-level category. The only possibilities are that exactly one of them is 1 and the rest 0, or all of them are zero. Now look at how that plays out within each id separately.

    Remember that when the value of any variable in a regression is missing, as dep is in many of your observations, that observation is dropped from the estimation sample. Notice that the number of observations in your regression is not 20, it is only 12: each id group loses 2 observations because dep is missing. If you then look at indep_y2 + indep_y3 + indep_y4 in the observations that are retained in the estimation sample, you will notice that for ids 1, 2, and 3 that sum is always 1, and for id 4 it is always 0. In other words, the sum of the three variables is constant within each id, so it is a linear combination of the fixed effects, and -xtreg, fe- has to drop one of the three variables. It runs fine in the -xtreg, re- model because there are no fixed effects defining "domains of collinearity."
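The check described above can be reproduced directly. A minimal sketch, assuming the -dataex- sample from #1 is in memory and indep_y2 through indep_y4 have already been created with the code in #1; insample and sumtreat are hypothetical helper names introduced only for this illustration.

```stata
* Assumes the -dataex- sample from #1 is loaded and indep_y2-indep_y4
* were created with the code in #1. insample/sumtreat are hypothetical names.

* dep is the only model variable with missing values here, so this flags
* the observations that survive into the estimation sample
gen byte insample = !missing(dep)
count if insample              // 12 of the 20 observations

* Sum of the three treatment-by-year dummies in each observation
gen byte sumtreat = indep_y2 + indep_y3 + indep_y4

* Within the estimation sample, sumtreat is 1 for every observation of
* ids 1-3 and 0 for every observation of id 4: constant within id,
* hence absorbed by the fixed effects
tabulate id sumtreat if insample
```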

    I cannot advise you on how to resolve this problem. Replacing the missing values of dep with actual values, if they can be found, would be one solution. Beyond that, I don't know how to advise you on recoding those three variables (or perhaps just accepting that you can only use two of them in a fixed effects model), because your code is convoluted (made all the more so by the use of 1/. coding for variables that would be much easier to work with if coded as 1/0), and I don't really grasp what these variables are supposed to represent. My guess, however, is that there is a considerably simpler way to do whatever it is you are trying to do here.
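One possibly simpler construction, offered only as a sketch: it assumes, as in the posted sample, that each id has exactly five consecutive yearly observations with "year 0" first, and that indep is nonmissing (1 or 0) in year 0. The names t, treated and treat_y2-treat_y4 are hypothetical, chosen so they do not clash with indep_y2-indep_y4.

```stata
* Assumes the -dataex- sample from #1 is loaded: five consecutive years
* per id, year 0 first, indep coded 1/0 in year 0. Names are hypothetical.

* Position of each observation within its id: 0 = year 0, ..., 4 = year 4
bysort id (year): gen byte t = _n - 1

* Carry the year-0 treatment status to every observation of the id,
* giving a 1/0 indicator instead of the 1/. coding
bysort id (year): gen byte treated = indep[1]

* Treatment-by-year dummies for years 2, 3 and 4
forvalues k = 2/4 {
    gen byte treat_y`k' = treated * (t == `k')
}
```

On the sample data these dummies coincide with indep_y2-indep_y4, so they are collinear with the fixed effects in exactly the same way; simplifying the coding does not change which observations have nonmissing dep.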



    • #3
      Thank you very much, Clyde, you put me on the right track!

      I do in fact have the values of the dependent variable in year 1. When I included them in the data instead of missing values, the FE model no longer omits my independent variable.

      I had not included them because they are not of interest to me: in the actual research design, there is a second (and distinct) treatment in year 2 that also influences the dependent variable, which is why I only start measuring the dependent variable in year 2.

      Thank you very much!
