

  • My fixed effects regression changes if I omit an omitted variable

    Hello all,

    I ran a fixed effects regression (a linear probability model) of self-rated health on local employment change over three waves, as follows:

    . xtreg binary_health_y psum_unemployed_total_cont_y calt3_other_children_y0 i.current_county_y1 i.year i.o
    > wn_education_y age_y if has_y0_questionnaire==1 & has_y5_questionnaire==1 | has_y0_questionnaire==1 & has
    > _y10_questionnaire==1 | has_y0_questionnaire==1 & has_y5_questionnaire==1 & has_y10_questionnaire==1 | ha
    > s_y0_questionnaire==1 & cbmi_y5 !=. & has_y5_questionnaire==0 | has_y0_questionnaire==1 & cbmi_y10 !=. & 
    > has_y10_questionnaire==0 | has_y0_questionnaire==1 & cbmi_y5 !=. & has_y5_questionnaire==0 & cbmi_y10 !=.
    >  & has_y10_questionnaire==0 | has_y0_questionnaire==1 & cbmi_y5 !=. & has_y5_questionnaire==1 | has_y0_qu
    > estionnaire==1 & cbmi_y10 !=. & has_y10_questionnaire==1 | has_y0_questionnaire==1 & cbmi_y5 !=. & has_y5
    > _questionnaire==1 & cbmi_y10 !=. & has_y10_questionnaire==1, cluster (current_county_y1) fe robust 
    note: calt3_other_children_y0 omitted because of collinearity
    note: 3.current_county_y1 omitted because of collinearity
    note: 4.current_county_y1 omitted because of collinearity
    note: 5.current_county_y1 omitted because of collinearity
    note: 6.current_county_y1 omitted because of collinearity
    note: 7.current_county_y1 omitted because of collinearity
    note: 8.current_county_y1 omitted because of collinearity
    note: 9.current_county_y1 omitted because of collinearity
    note: 10.current_county_y1 omitted because of collinearity
    note: 11.current_county_y1 omitted because of collinearity
    note: 12.current_county_y1 omitted because of collinearity
    note: 13.current_county_y1 omitted because of collinearity
    note: 14.current_county_y1 omitted because of collinearity
    note: 15.current_county_y1 omitted because of collinearity
    note: 16.current_county_y1 omitted because of collinearity
    note: 17.current_county_y1 omitted because of collinearity
    note: 18.current_county_y1 omitted because of collinearity
    note: 19.current_county_y1 omitted because of collinearity
    note: 20.current_county_y1 omitted because of collinearity
    note: 21.current_county_y1 omitted because of collinearity
    note: 22.current_county_y1 omitted because of collinearity
    note: 23.current_county_y1 omitted because of collinearity
    note: 24.current_county_y1 omitted because of collinearity
    note: 25.current_county_y1 omitted because of collinearity
    note: 26.current_county_y1 omitted because of collinearity
    note: 27.current_county_y1 omitted because of collinearity
    note: 28.current_county_y1 omitted because of collinearity
    note: 29.current_county_y1 omitted because of collinearity
    note: 30.current_county_y1 omitted because of collinearity
    note: 10.year omitted because of collinearity
    note: 3.own_education_y omitted because of collinearity
    note: 4.own_education_y omitted because of collinearity
    note: 5.own_education_y omitted because of collinearity
    note: 6.own_education_y omitted because of collinearity
    Fixed-effects (within) regression               Number of obs      =      1578
    Group variable: id                              Number of groups   =       635
    R-sq:  within  = 0.0066                         Obs per group: min =         1
           between = 0.0062                                        avg =       2.5
           overall = 0.0047                                        max =         3
                                                    F(3,28)            =      4.34
    corr(u_i, Xb)  = -0.0538                        Prob > F           =    0.0124
                                                     (Std. Err. adjusted for 29 clusters in current_county_y1)
                                             |               Robust
                             binary_health_y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                psum_unemployed_total_cont_y |  -.0108672   .0043554    -2.50   0.019    -.0197888   -.0019456
                     calt3_other_children_y0 |          0  (omitted)
                           current_county_y1 |
                                      Cavan  |          0  (omitted)
                                      Clare  |          0  (omitted)
                                       Cork  |          0  (omitted)
                                    Donegal  |          0  (omitted)
                                Dublin City  |          0  (omitted)
                     Dún Laoghaire-Rathdown  |          0  (omitted)
                                     Fingal  |          0  (omitted)
                                     Galway  |          0  (omitted)
                                Galway City  |          0  (omitted)
                                      Kerry  |          0  (omitted)
                                    Kildare  |          0  (omitted)
                                   Kilkenny  |          0  (omitted)
                                      Laois  |          0  (omitted)
                                   Limerick  |          0  (omitted)
                                   Longford  |          0  (omitted)
                                      Louth  |          0  (omitted)
                                       Mayo  |          0  (omitted)
                                      Meath  |          0  (omitted)
                                   Monaghan  |          0  (omitted)
                                     Offaly  |          0  (omitted)
                                  Roscommon  |          0  (omitted)
                                      Sligo  |          0  (omitted)
                               South Dublin  |          0  (omitted)
                            Tipperary North  |          0  (omitted)
                                  Waterford  |          0  (omitted)
                                  Westmeath  |          0  (omitted)
                                    Wexford  |          0  (omitted)
                                    Wicklow  |          0  (omitted)
                                        year |
                                          5  |  -.0879273   .0277405    -3.17   0.004    -.1447511   -.0311034
                                         10  |          0  (omitted)
                             own_education_y |
                      Some secondary school  |          0  (omitted)
               Complete secondary education  |          0  (omitted)
    Some third level education at college..  |          0  (omitted)
    Complete third level education at col..  |          0  (omitted)
                                       age_y |   .0074049   .0038814     1.91   0.067    -.0005457    .0153555
                                       _cons |   .6171357   .0901617     6.84   0.000     .4324479    .8018235
                                     sigma_u |  .35499044
                                     sigma_e |  .35438184
                                         rho |  .50085794   (fraction of variance due to u_i)
    I then remembered that one of the variables, education, was only recorded in wave 1, not in waves 2 or 3. Its value in waves 2 and 3 is a copy of its value in wave 1.

    As Stata omitted this variable automatically, I figured that no harm was done. However, when I manually remove it from the regression, my results change, as below:

    . xtreg binary_health_y psum_unemployed_total_cont_y calt3_other_children_y0 i.current_county_y1 i.year age
    > _y if has_y0_questionnaire==1 & has_y5_questionnaire==1 | has_y0_questionnaire==1 & has_y10_questionnaire
    > ==1 | has_y0_questionnaire==1 & has_y5_questionnaire==1 & has_y10_questionnaire==1 | has_y0_questionnaire
    > ==1 & cbmi_y5 !=. & has_y5_questionnaire==0 | has_y0_questionnaire==1 & cbmi_y10 !=. & has_y10_questionna
    > ire==0 | has_y0_questionnaire==1 & cbmi_y5 !=. & has_y5_questionnaire==0 & cbmi_y10 !=. & has_y10_questio
    > nnaire==0 | has_y0_questionnaire==1 & cbmi_y5 !=. & has_y5_questionnaire==1 | has_y0_questionnaire==1 & c
    > bmi_y10 !=. & has_y10_questionnaire==1 | has_y0_questionnaire==1 & cbmi_y5 !=. & has_y5_questionnaire==1 
    > & cbmi_y10 !=. & has_y10_questionnaire==1, cluster (current_county_y1) fe robust 
    note: calt3_other_children_y0 omitted because of collinearity
    note: 3.current_county_y1 omitted because of collinearity
    note: 4.current_county_y1 omitted because of collinearity
    note: 5.current_county_y1 omitted because of collinearity
    note: 6.current_county_y1 omitted because of collinearity
    note: 7.current_county_y1 omitted because of collinearity
    note: 8.current_county_y1 omitted because of collinearity
    note: 9.current_county_y1 omitted because of collinearity
    note: 10.current_county_y1 omitted because of collinearity
    note: 11.current_county_y1 omitted because of collinearity
    note: 12.current_county_y1 omitted because of collinearity
    note: 13.current_county_y1 omitted because of collinearity
    note: 14.current_county_y1 omitted because of collinearity
    note: 15.current_county_y1 omitted because of collinearity
    note: 16.current_county_y1 omitted because of collinearity
    note: 17.current_county_y1 omitted because of collinearity
    note: 18.current_county_y1 omitted because of collinearity
    note: 19.current_county_y1 omitted because of collinearity
    note: 20.current_county_y1 omitted because of collinearity
    note: 21.current_county_y1 omitted because of collinearity
    note: 22.current_county_y1 omitted because of collinearity
    note: 23.current_county_y1 omitted because of collinearity
    note: 24.current_county_y1 omitted because of collinearity
    note: 25.current_county_y1 omitted because of collinearity
    note: 26.current_county_y1 omitted because of collinearity
    note: 27.current_county_y1 omitted because of collinearity
    note: 28.current_county_y1 omitted because of collinearity
    note: 29.current_county_y1 omitted because of collinearity
    note: 30.current_county_y1 omitted because of collinearity
    note: 10.year omitted because of collinearity
    Fixed-effects (within) regression               Number of obs      =      1590
    Group variable: id                              Number of groups   =       641
    R-sq:  within  = 0.0063                         Obs per group: min =         1
           between = 0.0064                                        avg =       2.5
           overall = 0.0049                                        max =         3
                                                    F(3,28)            =      4.38
    corr(u_i, Xb)  = -0.0423                        Prob > F           =    0.0119
                                         (Std. Err. adjusted for 29 clusters in current_county_y1)
                                 |               Robust
                 binary_health_y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    psum_unemployed_total_cont_y |  -.0101514   .0042467    -2.39   0.024    -.0188504   -.0014524
         calt3_other_children_y0 |          0  (omitted)
               current_county_y1 |
                          Cavan  |          0  (omitted)
                          Clare  |          0  (omitted)
                           Cork  |          0  (omitted)
                        Donegal  |          0  (omitted)
                    Dublin City  |          0  (omitted)
         Dún Laoghaire-Rathdown  |          0  (omitted)
                         Fingal  |          0  (omitted)
                         Galway  |          0  (omitted)
                    Galway City  |          0  (omitted)
                          Kerry  |          0  (omitted)
                        Kildare  |          0  (omitted)
                       Kilkenny  |          0  (omitted)
                          Laois  |          0  (omitted)
                       Limerick  |          0  (omitted)
                       Longford  |          0  (omitted)
                          Louth  |          0  (omitted)
                           Mayo  |          0  (omitted)
                          Meath  |          0  (omitted)
                       Monaghan  |          0  (omitted)
                         Offaly  |          0  (omitted)
                      Roscommon  |          0  (omitted)
                          Sligo  |          0  (omitted)
                   South Dublin  |          0  (omitted)
                Tipperary North  |          0  (omitted)
                      Waterford  |          0  (omitted)
                      Westmeath  |          0  (omitted)
                        Wexford  |          0  (omitted)
                        Wicklow  |          0  (omitted)
                            year |
                              5  |  -.0840781   .0273625    -3.07   0.005    -.1401278   -.0280285
                             10  |          0  (omitted)
                           age_y |   .0067429   .0038121     1.77   0.088    -.0010659    .0145516
                           _cons |   .6316924    .088782     7.12   0.000     .4498308    .8135541
                         sigma_u |   .3549904
                         sigma_e |  .35478865
                             rho |  .50028423   (fraction of variance due to u_i)

    I don't know why this is the case. Based on the first output, which said the variable was omitted, I thought the fixed effects model was already dropping it automatically.

    My questions are:
    1. Why do my results change?
    2. Was this variable adding something to the analysis, even though it was recorded only in wave 1? I noticed that categories 3, 4, 5 and 6 were listed as omitted, while categories 1 and 2 were not.
    3. Should I keep this variable or manually remove it from my regression, and why?
    Thank you for any guidance!

  • #2
    Hi John,
    two things:
    1. The results change because the two regressions use different samples: 1578 vs. 1590 observations. Stata fixes the estimation sample before it drops collinear variables, so the first specification uses only the 1578 observations with non-missing values on every variable you listed, including education. Education is then omitted, but the 12 observations with missing education stay excluded. When you remove education manually, those observations are no longer excluded and the sample grows to 1590.
    2. current_county_y1 seems to be constant within each group (id), so you can also exclude it from your regression. That is why all of its coefficients are omitted from your results.
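
To make point 1 concrete, here is a minimal sketch on simulated data, not on the poster's dataset. The names id, wave, educ, x, y and in_a are made up for illustration. It assumes a time-invariant variable with some missing values and shows that listing it shrinks the estimation sample even though Stata then omits it, and that restricting the shorter model to the same sample gives the same coefficient on x.

```stata
* Toy panel: 200 people, 3 waves (all names are hypothetical)
clear
set seed 12345
set obs 200
generate id = _n
* educ is time-invariant by construction; 10 people have no record of it
generate educ = ceil(4*runiform())
replace educ = . in 1/10
expand 3
bysort id: generate wave = _n
generate x = rnormal()
generate y = 0.5*x + rnormal()
xtset id wave

* Model A: educ is listed, then omitted for collinearity with the fixed effects.
* The 30 rows with missing educ are still excluded, so e(N) is 570.
xtreg y x i.educ, fe
display e(N)
* Mark model A's estimation sample for later use
generate byte in_a = e(sample)

* Model B: educ removed by hand, so all 600 rows are used
xtreg y x, fe
display e(N)

* Model B restricted to model A's sample: same coefficient on x as model A
xtreg y x if in_a, fe
```

In the original regressions the same idea would be to save e(sample) after the first xtreg and add it to the if condition of the second one, if the goal is only to confirm that the difference comes from the sample.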


    • #3
      Hi Fernando,

      Thank you for your response,

      So, do you think I should manually remove education, or let Stata automatically remove it, or does it make any difference?

      Can you explain why the first two categories of education are not listed as omitted? My theory is that the numbers for these are so low (13 people) that the controls removed anyone who would have been in these groups before they could be omitted, am I right?

      Why is calt3_other_children_y0 removed entirely from the analysis without a number before it like education (3.own_education_y, 4.own_education_y, etc.,), is it because I don't have an i. before it, like I do with education?

      All the best,



      • #4

        1. "So, do you think I should manually remove education, or let Stata automatically remove it, or does it make any difference?"

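        I would remove it manually. It is constant within each person, so the fixed effects absorb it either way, and listing it only excludes the observations where it is missing.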

        2. "Can you explain why the first two categories of education are not listed as omitted? My theory is that the numbers for these are so low (13 people) that the controls removed anyone who would have been in these groups before they could be omitted, am I right?"

        Probably because one of those categories is not observed in the sample used for estimation, and the other is used as the base group.
        You can check this by running the following right after the regression:
        tab own_education_y if e(sample)==1

        3. "Why is calt3_other_children_y0 removed entirely from the analysis without a number before it like education (3.own_education_y, 4.own_education_y, etc.,), is it because I don't have an i. before it, like I do with education?"

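        Because, unless you specify otherwise (for example with the i. prefix), Stata treats a variable as continuous rather than categorical. It then enters the model as a single term, so it is omitted as a whole rather than level by level.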
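
Points 2 and 3 can be checked with a short sketch, again on simulated data with made-up names (id, wave, educ, educ_sd, x, y). It assumes a variable that never changes within a person and shows one way to confirm that, plus how the omission notes differ with and without the i. prefix.

```stata
* Toy panel: 100 people, 3 waves (all names are hypothetical)
clear
set seed 2024
set obs 100
generate id = _n
generate educ = ceil(3*runiform())
expand 3
bysort id: generate wave = _n
generate x = rnormal()
generate y = x + rnormal()
xtset id wave

* Within-person standard deviation of educ: zero for everyone means
* the person fixed effects absorb it completely
bysort id: egen educ_sd = sd(educ)
summarize educ_sd

* No prefix: educ is one continuous term, so it is omitted as a single variable
xtreg y x educ, fe

* i. prefix: one indicator per non-base level, so each level is omitted separately
xtreg y x i.educ, fe

* Levels that actually occur in the estimation sample of the last model
tabulate educ if e(sample)
```

A level that does not appear in the tabulate output cannot get an omission note, which is one possible reason why only some education categories were listed in the original output.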


