
  • -margins- after -xtlogit,fe-

    Dear Statalist

    This is a question about interpreting the results from a panel-data fixed-effects logistic regression. The outcome variable is binary and the main regressor is categorical with 4 levels.

    As the estimated odds ratios change depending on which base level is selected, in a cross-sectional setting I prefer to use -margins- and interpret the results in terms of average adjusted predictions (which are unaffected by the base level). However, when using -xtlogit-, the average adjusted predictions appear to change depending on the base level.

    Question: is this the expected behaviour for -margins- after -xtlogit-? If so, would it be preferable to interpret the results in terms of odds ratios rather than probabilities in a panel-data setting?

    Code:
    use http://www.stata-press.com/data/r16/union.dta, clear
    
    xtset idcode year, yearly
    
    * Discretize the -grade- variable into 4 levels for illustration purposes
    egen grade_category = cut(grade), at(0,7,13,16,19) icodes
    label define grade_category 0 "primary" 1 "secondary" 2 "undergraduate" 3 "postgraduate"
    label values grade_category grade_category
    If we treat the data as cross-sectional, the results from -margins- are unchanged by the base level of the regressor.

    Code:
    quietly logit union i.year ib(0).grade_category
    
    margins grade_category
    
    Predictive margins                              Number of obs     =     26,200
    Model VCE    : OIM
    
    Expression   : Pr(union), predict()
    
    --------------------------------------------------------------------------------
                   |            Delta-method
                   |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
    grade_category |
          primary  |   .2349991   .0276247     8.51   0.000     .1808556    .2891425
        secondary  |   .2073589   .0031732    65.35   0.000     .2011395    .2135782
    undergraduate  |   .1943004   .0058311    33.32   0.000     .1828718    .2057291
     postgraduate  |   .2937748   .0064781    45.35   0.000      .281078    .3064717
    --------------------------------------------------------------------------------
    
    quietly logit union i.year ib(1).grade_category
    margins grade_category
    *(output omitted)
    
    quietly logit union i.year ib(2).grade_category
    margins grade_category
    *(output omitted)
    
    quietly logit union i.year ib(3).grade_category
    margins grade_category
    *(output omitted)
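The cross-sectional invariance can also be checked outside Stata. Below is a minimal sketch in Python (numpy only, synthetic data): a logit model with a 3-level categorical regressor is fit by Newton-Raphson once per choice of base level, and the -margins--style average adjusted predictions are compared. All names and the data-generating coefficients here are illustrative, not taken from the union data.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Maximum-likelihood logit via Newton-Raphson."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ b))
        W = p * (1 - p)
        b += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))
    return b

def design(cat, base, levels=3):
    """Constant plus indicators for every level except `base`."""
    cols = [np.ones(len(cat))]
    cols += [(cat == l).astype(float) for l in range(levels) if l != base]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
cat = rng.integers(0, 3, 500)
p_true = 1 / (1 + np.exp(-(0.2 + 0.5 * (cat == 1) - 0.3 * (cat == 2))))
y = rng.binomial(1, p_true)

# Mimic -margins grade_category-: set everyone's category to level l,
# then average the predicted probabilities.
aap = []
for base in range(3):
    b = fit_logit(design(cat, base), y)
    aap.append([
        (1 / (1 + np.exp(-design(np.full(len(cat), l), base) @ b))).mean()
        for l in range(3)
    ])

# The average adjusted predictions agree across all three base levels.
print(np.allclose(aap[0], aap[1]), np.allclose(aap[0], aap[2]))
```

The agreement is expected because the three codings are reparameterizations of the same model: the likelihood depends only on xb, so the fitted predictions coincide up to numerical tolerance.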
    This is not the case, however, with the panel-data estimator -xtlogit, fe-:
    Code:
    . quietly xtlogit union i.year ib(0).grade_category, fe
    
    . margins grade_category
    
    Predictive margins                              Number of obs     =     12,035
    Model VCE    : OIM
    
    Expression   : Pr(union|fixed effect is 0), predict(pu0)
    
    --------------------------------------------------------------------------------
                   |            Delta-method
                   |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
    grade_category |
          primary  |   .5184114   .0215869    24.02   0.000     .4761018    .5607209
        secondary  |   .5703154   .2774507     2.06   0.040      .026522    1.114109
    undergraduate  |   .5507514   .2823345     1.95   0.051    -.0026142    1.104117
     postgraduate  |   .6687735   .2569906     2.60   0.009     .1650813    1.172466
    --------------------------------------------------------------------------------
    
    . quietly xtlogit union i.year ib(1).grade_category, fe
    
    . margins grade_category
    
    Predictive margins                              Number of obs     =     12,035
    Model VCE    : OIM
    
    Expression   : Pr(union|fixed effect is 0), predict(pu0)
    
    --------------------------------------------------------------------------------
                   |            Delta-method
                   |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
    grade_category |
          primary  |   .4661257   .2823028     1.65   0.099    -.0871777    1.019429
        secondary  |   .5184114   .0215869    24.02   0.000     .4761018    .5607209
    undergraduate  |   .4985708   .0396701    12.57   0.000     .4208188    .5763228
     postgraduate  |   .6207837   .0584854    10.61   0.000     .5061544     .735413
    --------------------------------------------------------------------------------
    
    *and so on
    Thanks,
    Junran

  • #2
    Fixed-effects logit has some special issues with prediction (see the -clogit- documentation). I don't know whether this is what is generating your odd results.

    • #3
      Thanks Phil. Hazarding a guess, "pu0 - probability of a positive outcome, assuming fixed effect is zero; the default" (p. 270 of -clogit postestimation-) might be the issue here. Assuming the fixed effect is zero would appear to defeat the purpose of using FE in the first place.

      The documentation doesn't elaborate on this point, however. So if anyone knows a good resource on this topic, that would be much appreciated.

      • #4
        With conditional logit estimation, as in -clogit-, only the slope coefficients are identified; the fixed effects are not. So one cannot derive predicted probabilities in general.

        • #5
          Thank you Stephen. In this case, I will interpret my results in terms of odds ratios.

          • #6
            I would like to re-open this thread. I think there is something wrong.

            While it is true that you cannot get predicted probabilities from a conditional logistic regression, you can get probabilities conditional on u = 0. And I don't see why those should differ depending on the choice of reference category for one of the predictor variables. We can debate whether pu0's are of any real use--I suppose that depends on the context. But I believe they are identifiable in the model and -margins- should be providing the same results (at least up to trivial numerical errors) for what are, when all is said and done, just different parameterizations of the same model.

            Note that with -xtlogit, re-, -margins, predict(pu0)- gives the same results regardless of the reference category.

            I think the problem here is that although no constant term is reported by -xtlogit, fe-, there is an "implicit" constant term that is missing from the calculations of -xb- and -pu0-: just as -margins- gives results that vary with the reference category for -pu0-, so do -predict, pu0- and -predict, xb-. In other types of regression, when you change the reference category, the coefficients and the constant term all change, but they do so in such a way that the values of -predict, xb- are the same regardless. That is not happening with -xtlogit, fe-.
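The role of the constant in absorbing a change of base level can be illustrated with a few lines of arithmetic. This is a sketch in Python with made-up coefficients, not -xtlogit- output:

```python
import numpy as np

# Made-up coefficients for a 3-level categorical, base level 0:
# constant a, effects b1 (level 1) and b2 (level 2).
a, b1, b2 = 0.2, 0.5, -0.3
cat = np.array([0, 1, 2, 1, 0, 2])

# The same model reparameterized with base level 1: the constant
# shifts to a + b1 and the indicator coefficients shift by -b1.
xb_base0 = a + b1 * (cat == 1) + b2 * (cat == 2)
xb_base1 = (a + b1) - b1 * (cat == 0) + (b2 - b1) * (cat == 2)
print(np.allclose(xb_base0, xb_base1))  # True: the constant absorbs the change

# Now drop the constants, as happens if an implicit constant is
# omitted from the -xb- calculation: the two parameterizations no
# longer produce the same linear predictions.
xb0_nc = b1 * (cat == 1) + b2 * (cat == 2)
xb1_nc = -b1 * (cat == 0) + (b2 - b1) * (cat == 2)
print(np.allclose(xb0_nc, xb1_nc))  # False
```

Without the constant, the two xb vectors differ by exactly b1 everywhere, which is why pu0 = invlogit(xb) shifts with the choice of reference category.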

            The very need to have a reference category is, itself, indicative of a covert constant term that is not being reported. When we represent a categorical variable by a series of level-indicator ("dummy") variables, those indicators are not collinear by themselves: they are collinear only in conjunction with a constant term. Yes, the sum of the indicator variables is always 1, but that does not make for a collinearity: a collinearity is a linear combination with non-zero coefficients that sums to zero, and without the constant term thrown into the mix, you do not get that from the level indicator variables alone.

            Indeed, in other forms of regression you can specify that you want no omitted category using the ibn. prefix if you also specify the -noconstant- option, and when you do that, you get the same results from -predict, xb- and from -margins-. But with -xtlogit, fe- this approach fails because the model never converges: the likelihood is not concave. So even when you specify ibn. and -noconstant- (not documented, but legal), or -collinear-, with -xtlogit, fe- the model fails to converge, again suggesting that there is an implicit constant term in the model leading to a collinearity.
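The rank argument is easy to verify numerically. A Python sketch with a hypothetical indicator matrix:

```python
import numpy as np

cat = np.array([0, 1, 2, 0, 1, 2])
# One indicator column per level, with no level omitted.
D = np.column_stack([(cat == l).astype(float) for l in range(3)])

print(np.linalg.matrix_rank(D))  # 3: the indicators alone are full rank
# Adding a constant column gives 4 columns of rank 3, i.e. a collinearity:
print(np.linalg.matrix_rank(np.hstack([np.ones((len(cat), 1)), D])))  # 3
```

So the full set of level indicators is a perfectly usable design on its own; it becomes deficient only once a constant term (explicit or implicit) enters the model.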

            Without that constant term, of course, the predicted values for xb (and hence also for pu0, which is just invlogit(xb)) will depend on the choice of the reference category. The question is: what is the status of this implicit constant term? It has been decades since I originally studied the maximum likelihood estimation of the conditional logistic regression model in detail, and I don't remember it well enough to comment further. But either there isn't supposed to be an implicit constant term, or, if there is, it needs to be included in the calculations of xb and pu0. And if there really isn't supposed to be one, then it seems to me that pu0 is inherently unidentifiable and -predict- and -margins- should not compute it at all.

            If somebody from StataCorp is following this thread, I hope they will come to the rescue. I'm eager to know what's going on here.


            • #7
              Many thanks Clyde. Your post is very illuminating, and I understand your conjecture that, possibly, there is an implicit constant term that has not been incorporated into the -margins- calculations. Yes, hopefully someone from StataCorp will comment on this.

              • #8
                Dear All,

                This may help:

                Gordon Kemp & João Santos Silva, 2016. "Partial effects in fixed-effects models," United Kingdom Stata Users' Group Meetings 2016 06, Stata Users Group.

                In short, do not use -margins- after -xtlogit- with the -fe- option.

                Best wishes,

                Joao

                • #9
                  Very enlightening, thanks Joao Santos Silva!

                  • #10
                    Thank you very much Joao. The equations on slides 3 and 5, in particular, help to answer this question. I will be sure to cite your presentation in my paper.
