  • Coefficient changes sign with xtlogit

    Hi,

    We have an unbalanced panel on which we want to run logit regressions (Stata 17). We use register data and unfortunately cannot share any of it, and we have not been able to reproduce the problem in any of the Stata example datasets, so we will try to describe it. The panel is essentially a series of cross-sections of households aged 18-35 over 2012-2020, so households enter and leave the dataset over the years as they age into or out of that range.

    We have a model (the real model contains other variables, but the problem is with this single variable) in which we regress mobility (dichotomous) on housing tenure (dichotomous; 1 for rental housing). It is well established from previous studies that our coefficient should be positive, and our results seem to confirm that when we run logit by year, or as a pooled sample, using the following commands:

    Code:
    logit FlyttLok_D rentalDummy
    Code:
    bysort year: logit FlyttLok_D rentalDummy
    With both commands the sign is as expected. We have also double-checked with
    Code:
    tabstat FlyttLok_D, statistics(mean) by(rentalDummy)
    which confirms that mobility rates are higher under rental tenure. So, the dataset quite clearly indicates a positive relationship.

    But when we move to panel estimation we run into problems. We ran Hausman tests, which indicate that we should use FE. We use:
    Code:
    xtset ID year
    xtlogit FlyttLok_D rentalDummy, fe
    and get the messages:
    note: multiple positive outcomes within groups encountered.
    note: 546,209 groups (1,291,930 obs) omitted because of all positive or
    all negative outcomes.
    This means we lose around half of the observations. Also, with FE we consistently get the 'wrong' sign on rentalDummy, across the many multiple-regression specifications we have tried. No other coefficients seem to be affected in those regressions. RE gives us the expected sign. Could someone explain why this is?

    Thanks in advance for comments!

  • #2
    Try to see whether you can fit a manual fixed effects logit, like

    Code:
     
    logit FlyttLok_D rentalDummy i.ID



    • #3
      The coefficients of rentalDummy in -logit- and -xtlogit, fe- mean different things. There is no reason to expect them to be equal, or even to have the same sign. -xtlogit, fe-, like all fixed-effects estimators, estimates the within-panel effect of rentalDummy on FlyttLok_D. By contrast, -logit- assumes that the within- and between-panel effects are the same and estimates a common value. When the within- and between-panel effects actually differ, the resulting coefficients will differ noticeably. The same is true for linear models and for other non-linear models such as Poisson, mlogit, etc.
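
      A rough way to see which households actually identify the -xtlogit, fe- coefficient: households whose outcome never changes are dropped (that is the note quoted in #1), and among the rest only those whose rentalDummy changes over time carry information about it. The sketch below is not from the original posts. It assumes there are no missing values in either variable, and the names move_min, move_max, rent_min, rent_max, y_varies, x_varies and one_per_id are made up for illustration.

      ```stata
      * Per-household minimum and maximum of the outcome and the regressor
      bysort ID: egen byte move_min = min(FlyttLok_D)
      bysort ID: egen byte move_max = max(FlyttLok_D)
      bysort ID: egen byte rent_min = min(rentalDummy)
      bysort ID: egen byte rent_max = max(rentalDummy)

      * 1 if the variable takes both values within the household, else 0
      generate byte y_varies = move_max > move_min
      generate byte x_varies = rent_max > rent_min

      * Mark one observation per household so each household is counted once
      egen byte one_per_id = tag(ID)
      tabulate y_varies x_varies if one_per_id
      ```

      Only households in the cell where both flags equal 1 inform the fixed-effects coefficient. If that is a small or atypical subset of the sample, a sign that differs from the pooled estimate is less surprising.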

      Regardless of what Hausman (or any other test) says, if you want a between-panel effect estimate you cannot get it from a fixed effects model. And if you want a within-panel effect estimate you must get it from a fixed-effects model (or an emulated fixed effects model as in #2, or certain other ways of doing it.)
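
      One of those other ways, offered here as an assumption rather than something used in this thread, is a "hybrid" (within-between) specification: split rentalDummy into its household mean and the deviation from that mean, and enter both in a random-effects logit. rental_mean and rental_dev are hypothetical names. In a logit the coefficient on the deviation only approximates the conditional fixed-effects estimate; it does not reproduce it exactly, as it would in a linear model.

      ```stata
      * Between component: household-level share of years in rental housing
      bysort ID: egen double rental_mean = mean(rentalDummy)

      * Within component: deviation of each year from the household's own mean
      generate double rental_dev = rentalDummy - rental_mean

      xtset ID year

      * rental_dev  -> within-household association
      * rental_mean -> between-household association
      xtlogit FlyttLok_D rental_dev rental_mean, re
      ```

      If the two coefficients have opposite signs, that is the same pattern the pooled and fixed-effects results in #1 are pointing to. With this many observations the random-effects fit may take a while.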

      To visualize this, run the following demonstration code:
      Code:
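      clear
      //    5 PANELS, 2 OBSERVATIONS EACH
      set obs 5
      gen panel_id = _n
      expand 2
      
      set seed 1234
      //    y RISES BY 4 FROM ONE PANEL TO THE NEXT BUT FALLS BY 1 WITHIN A PANEL
      by panel_id , sort: gen y = 4*panel_id - _n + 3 + rnormal(0, 0.5)
      //    x RISES BY 1 BOTH ACROSS PANELS AND WITHIN A PANEL
      by panel_id: gen x = panel_id + _n
      
      xtset panel_id
      
      //    WITHIN ESTIMATE: SLOPE OF ABOUT -1 BY CONSTRUCTION
      xtreg y x, fe
      //    POOLED ESTIMATE: POSITIVE SLOPE, DRIVEN BY THE BETWEEN-PANEL DIFFERENCES
      regress y x
      
      //    GRAPH THE DATA TO SHOW WHAT'S HAPPENING
      separate y, by(panel_id)
      
      graph twoway connect y? x || lfit y x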
