  • Base Level of Factor Variable in Probit with Interaction

    Statalist:

    I am running the following Probit model on pooled panel data:
    \[ P(y_{it} = 1 \vert x_{it}) = \Phi(b_0 + \delta_t + b_1 \varepsilon_t \times x_{it} + b_2 x_{it}) \]
    Besides the level of x, the RHS includes two terms: (i) an interaction between the individual-specific binary variable x and a variable epsilon that is common to all observations in a given time period, i.e., a macro variable; and (ii) time fixed effects, denoted delta. Suppose I want to estimate exactly this specification, with x = 0 as the base level. Stata correctly returns the coefficient b2, which indicates a level shift if x = 1. However, the interaction coefficient b1 is reported for x = 0, i.e., it captures the additional elasticity of y with respect to epsilon at the wrong base level. Here is a cooked-up example:

    Code:
    webuse union, clear
    
    * generate variable common across all observations in a given year
    gen vareps = rnormal()
    by year, sort: replace vareps = vareps[1]
    
    drop if grade<4
    
    * model with both level terms and interaction
    probit union c.vareps##ib0.black, coeflegend
    
    --------------------------------------------------------------------------------
             union |      Coef.  Legend
    ---------------+----------------------------------------------------------------
            vareps |   .0253084  _b[vareps]
           1.black |   .2894273  _b[1.black]
                   |
    black#c.vareps |
                1  |  -.0096174  _b[1.black#c.vareps]
                   |
             _cons |  -.8508753  _b[_cons]
    --------------------------------------------------------------------------------
    
    * model with level of black and interaction with vareps plus year fixed-effects
    probit union ib0.black c.vareps#ib0.black i.year, coeflegend
    
    --------------------------------------------------------------------------------
             union |      Coef.  Legend
    ---------------+----------------------------------------------------------------
           1.black |   .2884075  _b[1.black]
                   |
    black#c.vareps |
                0  |   .0112105  _b[0b.black#c.vareps]
                1  |          0  _b[1o.black#co.vareps]
                   |
             _cons |  -.8902326  _b[_cons]
    --------------------------------------------------------------------------------
    Note that as long as I allow the level of vareps to enter the model, the base categories are correct (obviously, I cannot also include time fixed effects in that case).

    Any hints would be greatly appreciated!

    Thanks,
    Peter
    Last edited by Peter Zorn; 25 Jan 2019, 03:43.

  • #2
    So there seems to be a problem with your output, in that it doesn't match the commands you've shown for the second model. Here I recreate the two models with the appropriate output, without fixed-effects of year.

    Code:
    set seed 42
    webuse union, clear
    gen vareps = rnormal()
    by year, sort: replace vareps = vareps[1]
    drop if grade<4
    
    * Model 1
    . probit union c.vareps##ib0.black  , coefl allbase nolog nohead
    --------------------------------------------------------------------------------
             union |      Coef.  Legend
    ---------------+----------------------------------------------------------------
            vareps |   -.017862  _b[vareps]
                   |
             black |
                0  |          0  _b[0b.black]
                1  |   .2953557  _b[1.black]
                   |
    black#c.vareps |
                0  |          0  _b[0b.black#co.vareps]
                1  |  -.0480283  _b[1.black#c.vareps]
                   |
             _cons |  -.8487032  _b[_cons]
    --------------------------------------------------------------------------------
    
    * model 2
    . probit union c.vareps#ib0.black ib0.black , coefl allbase nolog nohead
    --------------------------------------------------------------------------------
             union |      Coef.  Legend
    ---------------+----------------------------------------------------------------
             black |
                0  |          0  _b[0b.black]
                1  |   .2953557  _b[1.black]
                   |
    black#c.vareps |
                0  |   -.017862  _b[0b.black#c.vareps]
                1  |  -.0658904  _b[1.black#c.vareps]
                   |
             _cons |  -.8487032  _b[_cons]
    --------------------------------------------------------------------------------
    In model 1, the factorial interaction operator (##) is used, meaning Stata inserts the lower-level terms for you: c.vareps##ib0.black expands to c.vareps, ib0.black, and c.vareps#ib0.black (note the simple interaction operator, #). In model 2, you have used only the simple interaction (#) together with the factor variable black. In both models, the constant and the coefficient on black (b2 in your notation) keep the same meaning: b2 is the level shift from x=0 to x=1.
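
    The expansion of ## described above can be checked directly. The following is a minimal sketch, assuming the same data set-up as in the code above; the stored-estimate names full_factorial and spelled_out are made up for illustration. The two columns of the resulting table should agree.

    ```stata
    * Same set-up as above
    set seed 42
    webuse union, clear
    gen vareps = rnormal()
    by year, sort: replace vareps = vareps[1]
    drop if grade<4

    * Model 1 written with the factorial operator
    probit union c.vareps##ib0.black, nolog
    estimates store full_factorial

    * The same model with the lower-level terms spelled out by hand
    probit union c.vareps ib0.black c.vareps#ib0.black, nolog
    estimates store spelled_out

    * Compare the coefficients side by side
    estimates table full_factorial spelled_out, b(%10.7f)
    ```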

    With respect to the interaction terms: in model 1, the interaction coefficient is exactly 0 when black==0 (_b[0b.black#co.vareps]). The co. prefix on vareps marks that cell as omitted, because a main effect of vareps is already included. In other words, Stata must drop one of the terms because they are collinear.

    However, in model 2, the main effect of c.vareps is no longer included, but since the interaction term includes c.vareps, its effect gets included. You can see this by noticing that the effect of c.vareps from model 1 is added to the interaction of model 2 (across both levels of black, in this case). In essence, these fit the same model with different parameterizations.
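
    To make the correspondence between models 1 and 2 explicit, the two index functions can be written side by side. The symbols \alpha, \beta, \gamma, \theta, \lambda_0 and \lambda_1 are notation introduced here for illustration and do not appear in the original question:
    \[
    \begin{aligned}
    \text{Model 1:}\quad & \alpha + \gamma\,\varepsilon_t + \beta\,x_{it} + \theta\,\varepsilon_t x_{it} \\
    \text{Model 2:}\quad & \alpha + \beta\,x_{it} + \lambda_0\,\varepsilon_t (1 - x_{it}) + \lambda_1\,\varepsilon_t x_{it} \\
    \text{so that}\quad & \lambda_0 = \gamma, \qquad \lambda_1 = \gamma + \theta
    \end{aligned}
    \]
    With the estimates shown above, \lambda_0 = -.017862 matches the main effect of vareps in model 1, and \gamma + \theta = -.017862 + (-.0480283) = -.0658903, which agrees with the reported \lambda_1 = -.0658904 up to rounding of the displayed coefficients.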

    In your model, assuming that black is coded as 0 or 1, the interaction term reduces to a simple effect of epsilon when x==1, \[ b_1 \varepsilon_t \], and vanishes when x==0. Model 2 is therefore not quite the model you wish to fit. To get that model, you can start with model 2 and add an instruction to omit the main effect of vareps, co.vareps.

    Code:
    * model 3 (the one you requested)
    . probit union ib0.black co.vareps c.vareps#ib0.black , coefl allbase nolog nohead
    --------------------------------------------------------------------------------
             union |      Coef.  Legend
    ---------------+----------------------------------------------------------------
             black |
                0  |          0  _b[0b.black]
                1  |   .2976062  _b[1.black]
                   |
            vareps |          0  _b[o.vareps]
                   |
    black#c.vareps |
                0  |          0  _b[0b.black#co.vareps]
                1  |  -.0658904  _b[1.black#c.vareps]
                   |
             _cons |  -.8509537  _b[_cons]
    --------------------------------------------------------------------------------
    As to the interpretation of your interaction term, that depends on which model you ultimately fit and what your actual question is. See here for more information.
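
    If year fixed effects are needed, as in the original question, one possible extension of model 3 is to add i.year to the same command. This is only a sketch under the data set-up used above; it has not been run here, so no output is shown.

    ```stata
    * Sketch only: model 3 plus year fixed effects (not run here)
    * Assumes vareps has been generated as above and is constant within year
    probit union ib0.black co.vareps c.vareps#ib0.black i.year, coeflegend allbaselevels nolog noheader
    ```

    Because vareps does not vary within a year, its main effect would be collinear with the year indicators anyway; omitting it with co.vareps leaves only the interaction for black==1 to be estimated.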
    Last edited by Leonardo Guizzetti; 25 Jan 2019, 21:04.


    • #3
      Leonardo,

      all of what you write is of course true! Thank you very much for your careful answer.

      It is a bit odd, because my understanding of factor variables was that Stata omits the base levels by default. Here, this is not the case: the specification drops the main effect of vareps, but Stata automatically includes it, broken down by each category of x (i.e., black in my example). Given that I have time fixed effects, this leads to a multicollinearity problem.

      That being said, using the "co." operator on vareps works like a charm and instructs Stata to return only the coefficient on the interaction term.

      Thank you very much for your help!

      All the best,
      Peter
