  • Omitted variables in fixed-effects-model with interaction-term

    Hi,

    I am using Stata 12 and have the fixed-effects regression output shown below. The model includes an interaction between two categorical variables: the first takes the values 0, 1, and 2, and the second is dichotomous (0/1). Stata automatically omits two of the interaction combinations because of collinearity, although the variables on their own are not omitted.

    Can anyone help me understand why this happens and how to deal with it?

    Code:
    . xtreg zufr ehe_kat#eltern ehe_kat eltern if sex==0, fe  // all
    note: 0b.ehe_kat#1.eltern omitted because of collinearity
    note: 2.ehe_kat#0b.eltern omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs      =     48798
    Group variable: id                              Number of groups   =      7978
    
    R-sq:  within  = 0.0035                         Obs per group: min =         2
           between = 0.0061                                        avg =       6.1
           overall = 0.0031                                        max =        19
    
                                                    F(5,40815)         =     28.73
    corr(u_i, Xb)  = -0.0407                        Prob > F           =    0.0000
    
    --------------------------------------------------------------------------------
              zufr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
    ehe_kat#eltern |
              0 1  |          0  (omitted)
              1 0  |   .1214574   .0345828     3.51   0.000     .0536744    .1892405
              1 1  |   .2465575   .0529449     4.66   0.000     .1427844    .3503306
              2 0  |          0  (omitted)
              2 1  |  -.0315965   .0683041    -0.46   0.644     -.165474     .102281
                   |
           ehe_kat |  -.1620725   .0236818    -6.84   0.000    -.2084894   -.1156557
            eltern |  -.2207379   .0346932    -6.36   0.000    -.2887373   -.1527385
             _cons |   7.336172   .0242491   302.53   0.000     7.288643    7.383701
    ---------------+----------------------------------------------------------------
           sigma_u |  1.3171201
           sigma_e |  1.1824423
               rho |  .55372455   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------
    F test that all u_i=0:     F(7977, 40815) =     6.57         Prob > F = 0.0000
    Last edited by Madeleine Bear; 02 Jul 2015, 07:53.

  • #2
    Have you looked at the summary statistics for the interaction of ehe_kat and eltern if sex==0? Maybe you can post them.

    • #3
      I'm not sure I understand your question the right way. Do you mean something like this?

      Code:
      . sum ehe_kat#eltern
      
          Variable |       Obs        Mean    Std. Dev.       Min        Max
      -------------+--------------------------------------------------------
           ehe_kat#|
            eltern |
              0 1  |     96779    .4441356    .4968719          0          1
              1 0  |     96779     .101086    .3014441          0          1
              1 1  |     96779    .0413003    .1989849          0          1
              2 0  |     96779    .1445872    .3516859          0          1
      -------------+--------------------------------------------------------
              2 1  |     96779    .0240651    .1532522          0          1
      
      .

      • #4
        Almost: sum ehe_kat#eltern if sex==0

        • #5
          Oh, sorry.
          Code:
          . sum ehe_kat#eltern if sex==0
          
              Variable |       Obs        Mean    Std. Dev.       Min        Max
          -------------+--------------------------------------------------------
               ehe_kat#|
                eltern |
                  0 1  |     48798    .4398131    .4963694          0          1
                  1 0  |     48798    .1002705    .3003634          0          1
                  1 1  |     48798    .0414976    .1994401          0          1
                  2 0  |     48798    .1313169      .33775          0          1
          -------------+--------------------------------------------------------
                  2 1  |     48798    .0440182    .2051377          0          1
          
          .

          • #6
            Does anybody have any idea?

            • #7
              Within each group you have six intercepts. Compare the coefficients/labels you get with

              xtreg zufr ehe_kat#eltern ehe_kat eltern if sex==0, fe
              versus

              xtreg zufr ehe_kat#eltern if sex==0, fe
              Doug Hemken
              SSCC, Univ. of Wisc.-Madison

              • #8
                Hi, thanks for your answer. I have some questions about it:

                1) What do you mean by six intercepts?

                2) What is the interpretation of an interaction term when there are no main effects in the regression?

                3) What conclusions can be drawn from comparing the coefficients of the regressions with and without main effects?

                4) Why are there no omitted variables in the regression without main effects?

                Code:
                . xtreg zufr ehe_kat#eltern ehe_kat eltern if sex==0, fe  // all
                note: 0b.ehe_kat#1.eltern omitted because of collinearity
                note: 2.ehe_kat#0b.eltern omitted because of collinearity
                
                Fixed-effects (within) regression               Number of obs      =     48798
                Group variable: id                              Number of groups   =      7978
                
                R-sq:  within  = 0.0035                         Obs per group: min =         2
                       between = 0.0061                                        avg =       6.1
                       overall = 0.0031                                        max =        19
                
                                                                F(5,40815)         =     28.73
                corr(u_i, Xb)  = -0.0407                        Prob > F           =    0.0000
                
                --------------------------------------------------------------------------------
                          zufr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                ---------------+----------------------------------------------------------------
                ehe_kat#eltern |
                          0 1  |          0  (omitted)
                          1 0  |   .1214574   .0345828     3.51   0.000     .0536744    .1892405
                          1 1  |   .2465575   .0529449     4.66   0.000     .1427844    .3503306
                          2 0  |          0  (omitted)
                          2 1  |  -.0315965   .0683041    -0.46   0.644     -.165474     .102281
                               |
                       ehe_kat |  -.1620725   .0236818    -6.84   0.000    -.2084894   -.1156557
                        eltern |  -.2207379   .0346932    -6.36   0.000    -.2887373   -.1527385
                         _cons |   7.336172   .0242491   302.53   0.000     7.288643    7.383701
                ---------------+----------------------------------------------------------------
                       sigma_u |  1.3171201
                       sigma_e |  1.1824423
                           rho |  .55372455   (fraction of variance due to u_i)
                --------------------------------------------------------------------------------
                F test that all u_i=0:     F(7977, 40815) =     6.57         Prob > F = 0.0000
                
                . xtreg zufr ehe_kat#eltern if sex==0, fe
                
                Fixed-effects (within) regression               Number of obs      =     48798
                Group variable: id                              Number of groups   =      7978
                
                R-sq:  within  = 0.0035                         Obs per group: min =         2
                       between = 0.0061                                        avg =       6.1
                       overall = 0.0031                                        max =        19
                
                                                                F(5,40815)         =     28.73
                corr(u_i, Xb)  = -0.0407                        Prob > F           =    0.0000
                
                --------------------------------------------------------------------------------
                          zufr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                ---------------+----------------------------------------------------------------
                ehe_kat#eltern |
                          0 1  |  -.2207379   .0346932    -6.36   0.000    -.2887373   -.1527385
                          1 0  |  -.0406151   .0381178    -1.07   0.287    -.1153269    .0340966
                          1 1  |  -.1362529    .056375    -2.42   0.016    -.2467492   -.0257566
                          2 0  |  -.3241451   .0473636    -6.84   0.000    -.4169788   -.2313113
                          2 1  |  -.5764794   .0598063    -9.64   0.000     -.693701   -.4592578
                               |
                         _cons |   7.336172   .0242491   302.53   0.000     7.288643    7.383701
                ---------------+----------------------------------------------------------------
                       sigma_u |  1.3171201
                       sigma_e |  1.1824423
                           rho |  .55372455   (fraction of variance due to u_i)
                --------------------------------------------------------------------------------
                F test that all u_i=0:     F(7977, 40815) =     6.57         Prob > F = 0.0000
                
                .

                • #9
                  Could someone help me with that? Any idea?

                  • #10
                    First, you need
                    xtreg zufr ehe_kat#eltern i.ehe_kat eltern if sex==0, fe
                    because ehe_kat has three values, not just 0/1. This might be clearer if you just specified
                    xtreg zufr ehe_kat##eltern if sex==0, fe
                    which assumes both ehe_kat and eltern are factor variables. In your formulation, ehe_kat is both a factor variable and a continuous variable.

                    If you use factor variable notation throughout, you will have cleaner, easier-to-interpret output. So either
                    xtreg zufr ehe_kat##eltern if sex==0, fe
                    or
                    xtreg zufr ehe_kat#eltern i.ehe_kat i.eltern if sex==0, fe


                    You have six intercepts:
                    _cons (where ehe_kat=0 and eltern=0)
                    _cons + 1.ehe_kat (where ehe_kat=1 and eltern = 0)
                    _cons + 2.ehe_kat (where ehe_kat=2 and eltern = 0)
                    _cons + 1.eltern (where ehe_kat=0 and eltern = 1)
                    _cons + 1.ehe_kat + 1.eltern + 1.ehe_kat#1.eltern (where ehe_kat=1 and eltern = 1)
                    _cons + 2.ehe_kat + 1.eltern + 2.ehe_kat#1.eltern (where ehe_kat=2 and eltern = 1)

                    Mixing notation (continuous and factor specification of the same variable) or dropping terms gives you "alternative parameterizations" - the same model expressed in other terms. You can only have six intercepts here, so if your parameterization appears to ask for more than that, something will be dropped.
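
                    As a rough check of the "same model expressed in other terms" idea, the cell effects can be rebuilt by hand from the coefficients posted in #8. This is only a sketch: the numbers are copied from the first output in #8, and the scalar names are made up for illustration. Because ehe_kat was entered as a continuous variable in that regression, its coefficient is multiplied by the value of ehe_kat.

```stata
* Coefficients copied from the first regression in #8
* (ehe_kat entered as continuous); scalar names are hypothetical
scalar b_ehe    = -.1620725
scalar b_eltern = -.2207379
scalar b_10     =  .1214574
scalar b_11     =  .2465575
scalar b_21     = -.0315965

* Cell effects relative to the base cell (ehe_kat=0, eltern=0)
display "0 1: " b_eltern
display "1 0: " 1*b_ehe + b_10
display "1 1: " 1*b_ehe + b_eltern + b_11
display "2 0: " 2*b_ehe
display "2 1: " 2*b_ehe + b_eltern + b_21
```

                    Up to rounding in the last digit, these five values should match the ehe_kat#eltern coefficients of the interaction-only regression in #8 (-.2207379, -.0406151, -.1362529, -.3241451, -.5764794).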
                    Last edited by Doug Hemken; 06 Jul 2015, 11:32. Reason: Edit to clarify: specifying a variable as both continuous and as a factor can sometimes result in an alternative parameterization (where the variable is coded 0/1), or it can just be wrong, as in thi
                    Doug Hemken
                    SSCC, Univ. of Wisc.-Madison

                    • #11
                      Thank you very much! That helped me a lot!

                      • #12
                        Hi Statalist.

                        Stata omitted the [1 4] term in an interaction between a four-category variable (rel) and a 0/1 dummy (at1). The output states that this is due to collinearity. (Note: I am using Stata 15.1.)

                        Code:
                        . stcox i.at1##i.rel, noshow nolog allbaselevels
                        note: 1.at1#1.rel identifies no observations in the sample
                        note: 1.at1#4.rel omitted because of collinearity
                        
                        Cox regression -- Breslow method for ties
                        
                        No. of subjects =        4,931                  Number of obs    =      10,272
                        No. of failures =           80
                        Time at risk    =        10447
                                                                        LR chi2(6)       =       39.19
                        Log likelihood  =   -584.61365                  Prob > chi2      =      0.0000
                        
                        ------------------------------------------------------------------------------
                                  _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                 at1 |
                                  0  |          1  (base)
                                  1  |   .2434924    .118355    -2.91   0.004      .093916    .6312934
                                     |
                                 rel |
                                  1  |          1  (base)
                                  2  |   .6208963    .179913    -1.64   0.100     .3518617    1.095636
                                  3  |   .2335054   .1252064    -2.71   0.007     .0816356    .6679043
                                  4  |   .5487337    .221485    -1.49   0.137     .2487664    1.210407
                                     |
                             at1#rel |
                                0 1  |          1  (base)
                                0 2  |          1  (base)
                                0 3  |          1  (base)
                                0 4  |          1  (base)
                                1 1  |          1  (empty)
                                1 2  |   1.511419   1.712837     0.36   0.716     .1639639    13.93225
                                1 3  |   6.942113   5.365279     2.51   0.012     1.526274    31.57555
                                1 4  |          1  (omitted)
                        ------------------------------------------------------------------------------
                        but this does not occur if I run
                        Code:
                        . stcox i.at1#i.rel, noshow nolog allbaselevels
                        note: 1.at1#1.rel identifies no observations in the sample
                        
                        Cox regression -- Breslow method for ties
                        
                        No. of subjects =        4,931                  Number of obs    =      10,272
                        No. of failures =           80
                        Time at risk    =        10447
                                                                        LR chi2(6)       =       39.19
                        Log likelihood  =   -584.61365                  Prob > chi2      =      0.0000
                        
                        ------------------------------------------------------------------------------
                                  _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                             at1#rel |
                                0 1  |          1  (base)
                                0 2  |   .6208963    .179913    -1.64   0.100     .3518617    1.095636
                                0 3  |   .2335054   .1252064    -2.71   0.007     .0816356    .6679043
                                0 4  |   .5487337    .221485    -1.49   0.137     .2487664    1.210407
                                1 1  |          1  (empty)
                                1 2  |   .2285016   .2326992    -1.45   0.147     .0310492    1.681619
                                1 3  |   .3947063   .1519148    -2.42   0.016     .1856364    .8392378
                                1 4  |   .1336125   .0514121    -5.23   0.000     .0628517    .2840383
                        ------------------------------------------------------------------------------
                        I have included the results for correlation and tabulation for information.
                        Code:
                        . corr at1 rel
                        (obs=10,955)
                                     |      at1      rel
                        -------------+------------------
                                 at1 |   1.0000
                                 rel |   0.6890   1.0000
                        
                        
                        . tab at1 rel
                                   |                     rel
                               at1 |         1          2          3          4 |     Total
                        -----------+--------------------------------------------+----------
                                 0 |     2,120      2,055        928        765 |     5,868
                                 1 |         0        259      1,209      3,619 |     5,087
                        -----------+--------------------------------------------+----------
                             Total |     2,120      2,314      2,137      4,384 |    10,955
                        Thoughts / suggestions appreciated.
                        Last edited by Chris Boulis; 28 Oct 2020, 21:04.

                        • #13
                          This is the same phenomenon as was discussed at https://www.statalist.org/forums/for...-two-variables. Notice that both ways of running this regression give six non-omitted coefficients: 1 + 3 + 2 = 6 for at1, rel, and at1#rel in the first regression, and 3 + 3 = 6 for at1#rel in the second. It is, once again, just a different way of parameterizing the same model. Again, if you use -predict-, you will see that the two models produce exactly the same predicted values. The only new wrinkle here is that 1.at1#1.rel is additionally omitted because that combination never actually appears in the data. But otherwise this is the same story.
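
                          A minimal sketch of the -predict- comparison, assuming the data have already been -stset- as for the regressions in #12 and that the code is run from a do-file. The variable names hr_full, hr_cells, and hr_diff are hypothetical. The three -display- lines at the end use only hazard ratios copied from the first table in #12.

```stata
* Assumes the data are already -stset- as in #12.
* hr_full, hr_cells and hr_diff are hypothetical variable names.
quietly stcox i.at1##i.rel
predict double hr_full, hr      // relative hazard from the ## model

quietly stcox i.at1#i.rel
predict double hr_cells, hr     // relative hazard from the # model

* Differences should be zero up to numerical precision
generate double hr_diff = hr_full - hr_cells
summarize hr_diff

* Same check by hand, using the hazard ratios posted in #12
display "1 2: " .2434924 * .6208963 * 1.511419
display "1 3: " .2434924 * .2335054 * 6.942113
display "1 4: " .2434924 * .5487337
```

                          The hand-computed products should agree, up to rounding, with the 1 2, 1 3, and 1 4 rows of the at1#rel-only table in #12 (.2285016, .3947063, .1336125). For the 1 4 cell, the product uses only the at1 and rel terms, which is consistent with the omitted 1.at1#4.rel term being fixed at a hazard ratio of 1.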

                          • #14
                            Thank you Clyde Schechter. Sorry, I haven't used -predict-, but I will look into it. I have used -margins- and know that running it after using ## will provide the same results as when using #. I don't understand what you mean by:
                            The only new wrinkle here is that 1.at1#1.rel is additional omitted because that combination never actually appears in the data
                            I understand that [0 1 ... 4] are the base values for the corresponding [1 1 ... 4]. We see that [1 1] is empty because it has no observations, but [1 4] has 3,619 observations. (Sorry, I posted here because the omitted result seemed to fit this thread.)

                            • #15
                              On review of #13, I understand your point that [1 1] is empty as this combination does not appear in the data. My key question in #12 is that even though [1 4] has the largest number of observations of all combinations in the data, it has been omitted (using ## in the interaction, not #). What am I missing?

                              Sample data:
                              Code:
                              * Example generated by -dataex-. To install: ssc install dataex
                              clear
                              input long(id p_id) byte(wave at1 rel)
                              104  115  4 0 4
                              104  115  7 0 4
                              104  115 10 0 4
                              108  119  4 0 2
                              108  119  7 0 2
                              108  119 10 0 2
                              108  119 14 0 2
                              103  124  4 1 4
                              103  124  7 1 4
                              103  124 10 1 4
                              103  124 14 1 4
                              103  124 18 1 4
                              106 135 10 1 4
                              106 135 14 1 4
                              106 135 18 1 4
                              102  143  4 0 3
                              102  143  7 0 2
                              102  143 10 0 3
                              102  143 18 0 3
                              178  179  4 0 1
                              178  179  7 0 1
                              178  179 10 0 1
                              182  183  7 0 2
                              182  183 10 0 2
                              182  183 18 0 2
                              end
