
  • Interaction with time-invariant variable in xtreg,fe - problem with collinearity

    Dear Statalist,

    I am estimating a fixed effects regression with xtreg,fe that includes an interaction between a time-invariant variable (female) and a time-varying variable (divorced).
    The dependent variable is self-esteem.
    I have omitted the main effect of female as described in the thread below, but Stata is still throwing out a variable due to collinearity.

    https://www.statalist.org/forums/for...fe-and-margins

    Does anyone know what is wrong? I have run out of things to check . . .

    The crosstabs below the regression results seem to indicate that female and divorced are not completely collinear.
    They also confirm that female is time-invariant and that divorce varies over time.
    The DV takes only a limited number of distinct values, but adding noise to it (gen esteem_w2 = esteem_w + rnormal()) did not fix the problem.

    Note: To focus on the effect of the first divorce, I have followed recommendations by Bruederl.
    I dropped people who were divorced in wave 1 as well as person-years after the end of the first divorce spell.

    Thanks,

    Jeremy



    Code:
    . xtreg esteem_w i.divorced divorced#i.female, fe 
    note: 1.divorced_w#1.female omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs     =      6,145
    Group variable: id                              Number of groups  =      1,761
    
    R-sq:                                           Obs per group:
         within  = 0.0017                                         min =          2
         between = 0.0085                                         avg =        3.5
         overall = 0.0068                                         max =          5
    
                                                    F(2,4382)         =       3.77
    corr(u_i, Xb)  = -0.0776                        Prob > F          =     0.0230
    
    -----------------------------------------------------------------------------------
             esteem_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------+----------------------------------------------------------------
         1.divorced_w |      0.038      0.056    0.679   0.497       -0.072       0.149
                      |
    divorced_w#female |
                 0 1  |      0.170      0.075    2.266   0.023        0.023       0.316
                 1 1  |      0.000  (omitted)
                      |
                _cons |      1.419      0.041   34.782   0.000        1.339       1.499
    ------------------+----------------------------------------------------------------
              sigma_u |  .46732402
              sigma_e |   .4427322
                  rho |   .5270026   (fraction of variance due to u_i)
    -----------------------------------------------------------------------------------
    F test that all u_i=0: F(1760, 4382) = 3.57                  Prob > F = 0.0000
    
    . 
    end of do-file
    
    . do "C:\Users\reyno113\AppData\Local\Temp\STD3030_000000.tmp"
    
    . tab divorced female if e(sample)
    
               |        female
    divorced_w |         0          1 |     Total
    -----------+----------------------+----------
             0 |     2,705      3,152 |     5,857 
             1 |       119        169 |       288 
    -----------+----------------------+----------
         Total |     2,824      3,321 |     6,145 
    
    
    
    . xttrans female if e(sample)
    
               |        female
        female |         0          1 |     Total
    -----------+----------------------+----------
             0 |    100.00       0.00 |    100.00 
             1 |      0.00     100.00 |    100.00 
    -----------+----------------------+----------
         Total |     46.28      53.72 |    100.00 
    
    . 
    end of do-file
    
    . do "C:\Users\reyno113\AppData\Local\Temp\STD3030_000000.tmp"
    
    . xttrans divorced if e(sample)
    
               |      divorced_w
    divorced_w |         0          1 |     Total
    -----------+----------------------+----------
             0 |     95.50       4.50 |    100.00 
             1 |      0.00     100.00 |    100.00 
    -----------+----------------------+----------
         Total |     93.43       6.57 |    100.00 


  • #2
    Thanks to some off-line help from Shawn Bauldry, I have learned the following:

    The examples in the thread below involve an interaction where one variable is a time-varying ratio variable (c.tvvar) and the other is a time-invariant categorical variable (i.tivar).

    https://www.statalist.org/forums/for...fe-and-margins

    In that case it works to write:

    xtreg DV c.tvvar c.tvvar#i.tivar, fe

    It does not work to write:

    xtreg DV c.tvvar##i.tivar, fe

    However, when both variables are categorical: one time-varying (i.tvvar) and one time-invariant (i.tivar), you have to ask Stata to try including the time-invariant variable so that it throws out the proper variable due to collinearity. In other words, either of the approaches below works.

    xtreg DV i.tvvar i.tivar i.tvvar#i.tivar, fe
    xtreg DV i.tvvar##i.tivar, fe

    It does not work to write:

    xtreg DV i.tvvar i.tvvar#i.tivar, fe

    The code that works for the two categorical variables, however, is only a partial solution because margins does not work properly after the model is estimated.

    Jeremy



    • #3
      After a closer look at the help file for factor variables (see 11.4.3.4 Selecting levels), I discovered that the problem was neither the data nor Stata's factor-variable system getting confused when asked to build an interaction without the base terms. Rather, the problem was my misunderstanding of how the # operator works.

      The simple answer is this: If you want to interact a time-varying dichotomous variable with a time-invariant dichotomous variable in a fixed effects model, you need to tell Stata that you want to add a single indicator variable that is coded 1 when both variables are equal to one and is zero otherwise. This cannot be done by adding an expression like i.var#i.var. Rather, you need to add an expression like 1.var#1.var. For those who are interested, I have written a more detailed explanation with examples below.

      Jeremy

      It turns out that an expression like i.employed#i.woman can function in two different ways.

      If the expression is included in a regression along with the main effects (i.employed and i.woman), it tells Stata to make one extra variable that is equal to 1 if the person is both employed and a woman and zero otherwise. This is what we want when trying to create an interaction.

      If the main effects are not included, the expression i.employed#i.woman functions differently. As explained in the help file, it tells Stata to make dummy variables to represent each possible combination of the two variables. In this case there would be four indicator variables to represent: employed women, employed men, non-employed women, and non-employed men.
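
      One way to see why the combination form goes wrong in a fixed-effects model is the sketch below. It assumes a 0/1 time-varying variable s_it (employed here, or south in the examples further down) and a 0/1 time-invariant variable w_i (woman, or white). The symbols D^{01} and D^{11} for the two w = 1 cells, and the double-dot notation for deviations from person means, are introduced here for illustration only.

      ```latex
      \begin{align*}
      D^{11}_{it} &= s_{it}\,w_i, \qquad D^{01}_{it} = (1 - s_{it})\,w_i \\
      D^{01}_{it} + D^{11}_{it} &= w_i \\
      \ddot{D}^{01}_{it} + \ddot{D}^{11}_{it} &= w_i - \bar{w}_i = 0
      \quad\Longrightarrow\quad
      \ddot{D}^{01}_{it} = -\,\ddot{D}^{11}_{it},
      \qquad \text{where } \ddot{x}_{it} = x_{it} - \bar{x}_i .
      \end{align*}
      ```

      After the within transformation the two indicators are exact mirror images, so only one of them can be estimated. If the one that survives is D^{01}, its coefficient is the negative of the coefficient D^{11} would have received, which is consistent with the "right magnitude but wrong sign" result described under Attempt 1 below.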

      So what should you do if you want the first behavior but the main effects cannot be included, as in a fixed-effects model? Use a number rather than i. to specify the level each variable must take for the new indicator to be coded 1. For example, 1.employed#1.woman will make a single indicator that is 1 when employed = 1 and woman = 1, and 0 otherwise. This also works if a variable has more than two categories. For instance, 2.race#1.woman would make a single indicator that is 1 when race = 2 and woman = 1, and 0 otherwise.
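
      A quick way to see which indicators each expression requests is to expand it without fitting a model. The sketch below assumes the nlswork data used in the examples further down and builds white the same way; it relies on fvexpand, which returns the expanded term list in r(varlist).

      ```stata
      use http://www.stata-press.com/data/r13/nlswork, clear
      recode race (1=1 "white") (2 3=0 "not white"), gen(white)

      * i. prefix on both variables: every south-by-white combination
      fvexpand i.south#i.white
      display "`r(varlist)'"

      * numeric levels: only the south = 1, white = 1 cell
      fvexpand 1.south#1.white
      display "`r(varlist)'"
      ```

      Comparing the two displayed lists shows the difference between requesting all combinations and requesting a single cell.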

      In the examples below, I illustrate this by estimating fixed-effects regressions of log wages on age, age-squared, and a time-varying dummy variable (south = whether R currently lives in the south). Each example also includes a different attempt to add an interaction between south and a time-invariant dummy variable (white = whether R is white).

      Code:
      use http://www.stata-press.com/data/r13/nlswork, clear
      
      *make a time-invariant dummy variable 
      recode race (1=1 "white")(2 3=0 "not white"), gen(white)
      
      *make an interaction term by hand
      gen southwhite = south*white
      
      *The within output verifies that ln_wage, age, and south vary over time but white does not.
      xtsum ln_wage age south white southwhite
      
      *A reference model.
      /*
      Here, the interaction is made by hand.
      This is fine, but calculating predicted values afterwards can be challenging due to the
      inclusion of age and age squared and the interaction between south and white
      (See Williams Stata Journal 12(2), 2012)
      */
      xtreg ln_wage c.age c.age#c.age i.south southwhite, fe
      
      
      *Attempt 1. 
      /* 
      This does not work because without the main effect of white, the expression i.south#i.white
      creates two dummy variables: one to represent whites who are not in the south and one to
      represent whites who are in the south.  The second variable, which is the one we want in
      the regression, gets thrown out due to collinearity.  Comparing the results to the reference
      model, we see that the included variable has the right magnitude but the wrong sign. 
      The intercept is thus also different. 
      */
      xtreg ln_wage c.age c.age#c.age i.south i.south#i.white, fe
      
      
      *Attempt 2 and 3. 
      /* 
      These work but only partially and only due to Stata's efforts to correct mistakes.  
      They explicitly ask Stata to include the main effect of white.  Stata throws out that
      variable because fixed-effects models cannot include the main effects of time-invariant
      variables.  The variables that remain happen to be the ones we want, but margins
      cannot help us understand the interaction between south and white because the
      model did not include a coefficient for white.
      */
      xtreg ln_wage c.age c.age#c.age i.south i.white i.south#i.white , fe
      xtreg ln_wage c.age c.age#c.age i.south##i.white , fe
      margins south#white  // Margins does not produce any estimates.
      
      *Attempt 4.  (This is the proper solution.)
      /* 
      Note that the i prefix is replaced with a number.  The code tells Stata to make a
      single indicator variable that is equal to 1 when south and white are both 1 and is
      zero otherwise.  This matches the results from the model where the interaction
      was made by hand.  It also has the added advantage of allowing us to use the
      margins command to calculate predictions.
      */
      xtreg ln_wage c.age c.age#c.age i.south 1.south#1.white , fe
      margins south#white
      marginsplot, recast(dot)
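
      As a sanity check on Attempt 4, the interaction coefficient can be compared directly with the one from the reference model. This is a minimal sketch that assumes the data and the white and southwhite variables created above are still in memory; the scalar names b_hand and b_fv and the tolerance are illustrative choices.

      ```stata
      * reference model: interaction built by hand
      quietly xtreg ln_wage c.age c.age#c.age i.south southwhite, fe
      scalar b_hand = _b[southwhite]

      * Attempt 4: single-cell factor-variable interaction
      quietly xtreg ln_wage c.age c.age#c.age i.south 1.south#1.white, fe
      scalar b_fv = _b[1.south#1.white]

      display "by hand: " b_hand "   factor-variable: " b_fv
      * stops with an error if the two coefficients differ
      assert reldif(b_hand, b_fv) < 1e-6
      ```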
