
  • Factor variable notation in -xthtaylor-

    Hi everyone

    The Hausman-Taylor estimator -xthtaylor- has been discussed in a number of posts over the years. In post#2 in https://www.statalist.org/forums/for...lain-estimator from 2015, Daniel mentioned that -xthtaylor- does not allow for factor notation.

    It would appear that the use of factor notation is still not allowed in -xthtaylor- in Stata 16.

    Code:
    . use https://www.stata-press.com/data/r16/psidextract
    
    . xthtaylor lwage occ south smsa ind exp exp2 wks ms union fem blk ed, endog(exp exp2 wks ms union ed)
    ** works fine
    
    ** but if we were to use factor notations for the variables which are, in fact, categorical
    . xthtaylor lwage i.occ i.south smsa i.ind exp exp2 wks ms union fem blk ed, endog(exp exp2 wks ms i.union ed)
    
    factor-variable operators not allowed
    r(101);

    My question is: is the inability of -xthtaylor- to handle factor notation inherent in the theoretical foundation of the estimator (the original paper is rather beyond my skill level at the moment, I'm afraid)? Or is it due to some technical/computational issue, such that Stata only accepts plain variable names in the Hausman-Taylor estimator?

    Furthermore, whilst it does not make much of a difference in the above example because those variables are binary, what if I wanted to include a variable such as 'mode of transportation' (with values bus, car or train)? Should I create individual binary variables for 'bus', 'car' and 'train', but leave one out of the regression to avoid the dummy variable trap?

    Thank you.

  • #2
    Hi, as I understand it, the restriction is specific to the implementation rather than inherent in the econometric theory.

    See the related discussion of Hausman-Taylor estimation with year fixed effects: https://www.statalist.org/forums/for...-fixed-effects
    I think you could include any number M of such fixed-effect dummies in a Hausman-Taylor model and treat them as exogenous, but I cannot confirm this.

    The workaround for the "factor-variable operators not allowed" error would be to create the dummy variables yourself before estimating the model.

    An example with time fixed effects (time dummies are already in the dataset):

    Code:
    webuse psidextract
    xthtaylor lwage occ south smsa ind exp exp2 wks ms union fem blk ed tdum1 tdum2 tdum3 tdum4 tdum5 tdum6, endog(exp exp2 wks ms union ed)
    /*
    
    Hausman–Taylor estimation                       Number of obs     =      4,165
    Group variable: id                              Number of groups  =        595
    
                                                    Obs per group:
                                                                  min =          7
                                                                  avg =          7
                                                                  max =          7
    
    Random effects u_i ~ i.i.d.                     Wald chi2(18)     =    7659.21
                                                    Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
           lwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    TVexogenous  |
             occ |  -.0187947   .0131247    -1.43   0.152    -.0445188    .0069293
           south |   .0022604    .030845     0.07   0.942    -.0581948    .0627155
            smsa |  -.0367277   .0182136    -2.02   0.044    -.0724258   -.0010297
             ind |   .0186278    .014634     1.27   0.203    -.0100542    .0473099
           tdum1 |  -.2264236   .1289683    -1.76   0.079    -.4791967    .0263496
           tdum2 |  -.1964658   .1076052    -1.83   0.068    -.4073681    .0144364
           tdum3 |  -.1253684   .0862396    -1.45   0.146    -.2943949    .0436581
           tdum4 |  -.0847657   .0648792    -1.31   0.191    -.2119266    .0423951
           tdum5 |  -.0513602   .0437553    -1.17   0.240    -.1371191    .0343987
           tdum6 |  -.0303846   .0230229    -1.32   0.187    -.0755087    .0147395
    TVendogenous |
             exp |    .073714   .0215674     3.42   0.001     .0314426    .1159854
            exp2 |  -.0003991   .0000521    -7.67   0.000    -.0005012   -.0002971
             wks |   .0006809    .000572     1.19   0.234    -.0004402     .001802
              ms |  -.0290953   .0180598    -1.61   0.107    -.0644919    .0063012
           union |   .0299634   .0141949     2.11   0.035     .0021418     .057785
    TIexogenous  |
             fem |  -.2665069    .141694    -1.88   0.060     -.544222    .0112083
             blk |  -.2090395   .1534086    -1.36   0.173    -.5097149    .0916358
    TIendogenous |
              ed |   .1191401   .0228087     5.22   0.000     .0744358    .1638444
                 |
           _cons |   4.041501   .6874552     5.88   0.000     2.694113    5.388888
    -------------+----------------------------------------------------------------
         sigma_u |  .93271009
         sigma_e |  .15111177
             rho |  .97442293   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    
    */

    Regarding your second question: yes, create one binary variable per category and leave one out to avoid perfect collinearity. If you include all of them anyway, Stata drops one automatically and issues a note.
    Example:

    Code:
    gen byte male = .
    replace male = 1 if fem == 0
    replace male = 0 if fem == 1
    xthtaylor lwage occ south smsa ind exp exp2 wks ms union fem male blk ed tdum1 tdum2 tdum3 tdum4 tdum5 tdum6, endog(exp exp2 wks ms union ed)
    /*note: fem omitted because of collinearity.
    ...
    */
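
    For a variable with more than two categories, such as the 'mode of transportation' example in the question, the dummies can be created by hand before estimation. The following is only a minimal sketch: psidextract contains no such variable, so `mode` is a hypothetical variable filled with random values purely for illustration, and the `mode_` stub is an arbitrary name. Treating the dummies as exogenous is an assumption, not a recommendation.

    ```stata
    webuse psidextract, clear

    * HYPOTHETICAL variable: 1 = bus, 2 = car, 3 = train (random, illustration only)
    set seed 12345
    generate byte mode = runiformint(1, 3)

    * Create one indicator per category: mode_1, mode_2, mode_3
    tabulate mode, generate(mode_)

    * Leave out mode_1 (bus) as the base category; dummies assumed exogenous
    xthtaylor lwage occ south smsa ind exp exp2 wks ms union fem blk ed mode_2 mode_3, endog(exp exp2 wks ms union ed)
    ```

    The coefficients on mode_2 and mode_3 are then read relative to the omitted category, just as they would be with i.mode in a command that supports factor notation.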

