In a fixed effects logit regression does Stata only drop observations that have no variation in the outcome variable? Or also the controls?

John Adler

Join Date: Apr 2017
Posts: 173

28 Sep 2019, 12:55

I have a 3 wave panel of children's height and weight and parents employment or unemployment (binary) during a recession.

As you will see below, I use a fixed effects logit model to test whether a change from employed to unemployed resulted in weight gain in children (a binary obesity outcome). In the model I control for maternal age, education, marital status, and urban vs. rural location (all except location are categorical variables). My understanding was that a fixed effects logit regression drops any variables that do not change across waves, so with controls that typically don't change, such as maternal education, I was sure that my initial sample of 11,000 children would plummet when I ran the analysis.

As you can see from the output below, this did not happen; instead, I have 1,945 groups in my analysis of obesity across the three waves of the study. So my question is: did I misunderstand how xtlogit, fe works? Does it only drop individuals without variation in the outcome? Or do individuals need variation in the outcome variable (obesity here), the primary independent variable (parental unemployment here), and the control variables (region, year, mother's age, mother's education, and mother's marital status here) to be included in the regression?

Also, does this mean I end up with a really weird sample? That is, am I limited to saying, "for children whose mothers moved, changed their education, and changed their marital status, I find that parental unemployment increases weight"? That seems to be what an analysis that only includes changers in each variable would amount to, but maybe I'm confused.

Thank you for any advice,

John

Code:


. xtlogit O_obese_y X_eitherparentunemployed_y i.C_region_y i.year i.C_Simplemotherage_y i.C_Simplemothereduca_y S_age_months_y i.C_mothermar_y, fe nolog
note: S_age_months_y omitted because of collinearity
note: multiple positive outcomes within groups encountered.
note: 9,053 groups (23,188 obs) dropped because of all positive or
      all negative outcomes.

Conditional fixed-effects logistic regression   Number of obs     =      5,535
Group variable: id                              Number of groups  =      1,945

                                                Obs per group:
                                                              min =          2
                                                              avg =        2.8
                                                              max =          3

                                                LR chi2(12)       =     239.12
Log likelihood  = -1895.5991                    Prob > chi2       =     0.0000

----------------------------------------------------------------------------------------------------
                         O_obese_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------------------+----------------------------------------------------------------
        X_eitherparentunemployed_y |   .2705773   .1005702     2.69   0.007     .0734634    .4676913
                      1.C_region_y |  -.0209971   .1507919    -0.14   0.889    -.3165439    .2745497
                                   |
                              year |
                                1  |   .3195079   .0596055     5.36   0.000     .2026833    .4363326
                                2  |  -.5625906   .0751008    -7.49   0.000    -.7097854   -.4153958
                                   |
               C_Simplemotherage_y |
                            19-29  |   .1622443   .1849878     0.88   0.380    -.2003252    .5248138
                            30-39  |   .1249393   .1217915     1.03   0.305    -.1137676    .3636462
                                   |
             C_Simplemothereduca_y |
Leaving Certificate to Non Degree  |   .4089109   .2192694     1.86   0.062    -.0208492    .8386711
        Primary Degree or greater  |   .4875182   .2805553     1.74   0.082    -.0623601    1.037397
                                   |
                    S_age_months_y |          0  (omitted)
                                   |
                     C_mothermar_y |
                                2  |  -.0491349   .2948536    -0.17   0.868    -.6270372    .5287675
                                3  |  -.3627668   .4261465    -0.85   0.395    -1.197999    .4724649
                                4  |  -.1200269   .1744402    -0.69   0.491    -.4619234    .2218696
                                5  |    .762811   1.064853     0.72   0.474    -1.324263    2.849884
----------------------------------------------------------------------------------------------------

. margins, dydx(X_eitherparentunemployed_y) post

Average marginal effects                        Number of obs     =      5,535
Model VCE    : OIM

Expression   : Pr(O_obese_y|fixed effect is 0), predict(pu0)
dy/dx w.r.t. : X_eitherparentunemployed_y

--------------------------------------------------------------------------------------------
                           |            Delta-method
                           |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------------+----------------------------------------------------------------
X_eitherparentunemployed_y |   .0622649   .0231713     2.69   0.007       .01685    .1076798
--------------------------------------------------------------------------------------------

Tags: fixed effects, logit, panel data, regression, syntax

Clyde Schechter

Join Date: Apr 2014

Posts: 30102
#2

28 Sep 2019, 13:27

Does it only drop individuals without variation in the outcome?

Not exactly, but for your purposes, this is correct. An individual does not need variation in the predictor variables to be retained in the estimation sample, only variation in the outcome. (Individuals, or observations within individuals, can be dropped for other reasons, but those reasons have nothing to do with variation, and nothing in your output suggests that any of them occurred in your analysis.)
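This rule can be checked directly. The sketch below uses Stata's `union` example dataset rather than John's data (`union`, `age`, `grade`, `idcode`, and `year` are that dataset's variables; `touse`, `u_min`, `u_max`, `y_varies`, and `one_per_id` are helper names made up for illustration). It counts the women whose outcome varies across their complete observations and compares that count with the number of groups -xtlogit, fe- reports. If only outcome variation matters, the two numbers should match for this model.

```stata
* Sketch (assumption: Stata's -union- example data, not John's data)
webuse union, clear
xtset idcode year

* Mark observations that are complete on every model variable
generate byte touse = 1
markout touse union age grade

* Within each woman, does union membership vary across her complete observations?
bysort idcode: egen u_min = min(cond(touse, union, .))
bysort idcode: egen u_max = max(cond(touse, union, .))
generate byte y_varies = touse & (u_min < u_max)

* Number of women, and of observations, with outcome variation
egen byte one_per_id = tag(idcode) if y_varies
count if one_per_id
scalar n_vary = r(N)
count if y_varies
scalar n_obs_vary = r(N)

* Fit the conditional fixed-effects logit and compare
xtlogit union age grade, fe nolog
display "women with outcome variation: " scalar(n_vary) "   groups used: " e(N_g)
```

Note that `grade` is not required to vary for a woman to be counted; only `union` is.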
John Adler

Join Date: Apr 2017

Posts: 173
#3

29 Sep 2019, 05:16

Dear Clyde,

Thank you as always for your astute response,

So, would I be right in thinking that in a fixed effects logit analysis only individuals (or observations) without variation in the outcome will be dropped? Regardless of whether their controls vary? Say I have a marital control and someone stays married for the duration of the study, provided the outcome changes will they remain in the xtlogit fixed effects analysis?
Could you recommend a source for me to learn about this further? I would be interested in learning the other reasons, mentioned above, that individuals are dropped from the analysis.
Some colleagues have suggested that my analysis is a subset of only those who experience a change in every predictor and the outcome and I would like to be able to cite some evidence to explain why this isn't the case!

Very best,

John

Last edited by John Adler; 29 Sep 2019, 05:28.

Clyde Schechter

Join Date: Apr 2014
Posts: 30102

29 Sep 2019, 12:12

So, would I be right in thinking that in a fixed effects logit analysis only individuals (or observations) without variation in the outcome will be dropped? Regardless of whether their controls vary? Say I have a marital control and someone stays married for the duration of the study, provided the outcome changes will they remain in the xtlogit fixed effects analysis?

That's right.

Could you recommend a source for me to learn about this further? I would be interested in learning the other reasons, mentioned above, that individuals are dropped from the analysis.
Some colleagues have suggested that my analysis is a subset of only those who experience a change in every predictor and the outcome and I would like to be able to cite some evidence to explain why this isn't the case!

Your colleagues are wrong. I don't know of any references to cite about this. It's not the kind of thing one would find in a journal. But you can easily convince them with a simple demonstration. Show them this:

Code:

. webuse grunfeld, clear

.
. xtset
       panel variable:  company (strongly balanced)
        time variable:  year, 1935 to 1954
                delta:  1 year

.
. //      MAKE kstock NOT VARYING IN COMPANIES 1 THROUGH 5
. by company (year), sort: replace kstock = kstock[1] if inrange(company, 1, 5)
(95 real changes made)

.
.
. //      FIXED EFFECTS REGRESSION OF mvalue ON kstock
. xtreg mvalue kstock, fe

Fixed-effects (within) regression               Number of obs     =        200
Group variable: company                         Number of groups  =         10

R-sq:                                           Obs per group:
     within  = 0.0206                                         min =         20
     between = 0.2191                                         avg =       20.0
     overall = 0.1480                                         max =         20

                                                F(1,189)          =       3.97
corr(u_i, Xb)  = -0.4649                        Prob > F          =     0.0478

------------------------------------------------------------------------------
      mvalue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      kstock |   .8668212   .4350863     1.99   0.048     .0085721     1.72507
       _cons |   981.4062   55.95716    17.54   0.000     871.0254    1091.787
-------------+----------------------------------------------------------------
     sigma_u |  1384.2682
     sigma_e |  345.82254
         rho |  .94125468   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(9, 189) = 251.18                    Prob > F = 0.0000

.
. //      DEMONSTRATE THAT ALL COMPANIES ARE IN THE ESTIMATION SAMPLE
. tab company if e(sample)

    company |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         20       10.00       10.00
          2 |         20       10.00       20.00
          3 |         20       10.00       30.00
          4 |         20       10.00       40.00
          5 |         20       10.00       50.00
          6 |         20       10.00       60.00
          7 |         20       10.00       70.00
          8 |         20       10.00       80.00
          9 |         20       10.00       90.00
         10 |         20       10.00      100.00
------------+-----------------------------------
      Total |        200      100.00


John Adler

Join Date: Apr 2017

Posts: 173
#5

30 Sep 2019, 05:15

Dear Clyde,

Thank you for supplying this easy-to-replicate answer! My takeaway from our discussion is that in my xtlogit, fe example above, which controls for education, children whose mothers do not experience a change in education still contribute to the analysis of the effect of changing parental employment on child weight. But in an applied sense, I don't understand how they contribute to the analysis if their controls are not changing. What information are they actually adding?

Apologies for the simple questions; I have been reading Fixed Effects Regression Models by Paul D. Allison, but I'm struggling to understand this in an applied sense.
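The sketch below shows what such a child contributes. It writes out the conditional likelihood that a fixed-effects logit is based on, in notation introduced here for illustration: $y_{it}$ is the binary outcome for individual $i$ in wave $t$, $x_{it}$ the regressors, $\beta$ their coefficients, and $k_i$ the number of waves with a positive outcome.

```latex
% Individual i observed in waves t = 1..T_i, with k_i = \sum_t y_{it}
\Pr\!\Big(y_{i1},\dots,y_{iT_i} \,\Big|\, \textstyle\sum_t y_{it}=k_i\Big)
  = \frac{\exp\big(\sum_t y_{it}\,x_{it}'\beta\big)}
         {\sum_{d\in B_i}\exp\big(\sum_t d_t\,x_{it}'\beta\big)},
\qquad
B_i=\Big\{d\in\{0,1\}^{T_i} : \textstyle\sum_t d_t = k_i\Big\}

% Split x_{it}'\beta = w_{it}'\beta_w + z_i'\gamma, with z_i constant over t for this individual.
% For every d in B_i:  \sum_t d_t z_i'\gamma = k_i z_i'\gamma = \sum_t y_{it} z_i'\gamma,
% so the common factor \exp(k_i z_i'\gamma) cancels (the individual effect cancels the same way):
\Pr(\cdot) = \frac{\exp\big(\sum_t y_{it}\,w_{it}'\beta_w\big)}
                  {\sum_{d\in B_i}\exp\big(\sum_t d_t\,w_{it}'\beta_w\big)}

% If k_i = 0 or k_i = T_i, B_i has a single element and \Pr(\cdot) = 1:
% such an individual adds \log 1 = 0 to the log likelihood, which is why those groups are dropped.
```

In other words, a child whose mother's education never changes simply has the education terms drop out of her contribution. As long as her outcome varies, she still carries information about the coefficients of the regressors that do vary for her, such as parental unemployment and the year indicators.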

Best regards,

John

Last edited by John Adler; 30 Sep 2019, 05:49. Reason: clarified I was running an xtlogit, fe model
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#6

30 Sep 2019, 05:34

John:
as an aside to Clyde's enlightening example, please note that when we speak about fixed effects in panel data logistic regression, we implicitly refer to conditional fixed effects (a different beast to tame than the fixed effects we are used to with -xtreg, fe-).
See, if interested, http://methods.johndavidpoe.com/2016...rameters-bias/ and related references.
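One practical way to see the "conditional" part: -xtlogit, fe- and -clogit- with the panel identifier as the group variable maximize the same conditional likelihood, so on the same data they should return the same coefficients. The sketch below assumes Stata's `union` example dataset; `b_xt` and `b_cl` are matrix names chosen for illustration.

```stata
* Sketch (assumption: Stata's -union- example data)
webuse union, clear
xtset idcode year

* Conditional fixed-effects logit via the panel command
xtlogit union age grade, fe nolog
matrix b_xt = e(b)

* The same conditional likelihood via -clogit-, grouping on the panel identifier
clogit union age grade, group(idcode) nolog
matrix b_cl = e(b)

matrix list b_xt
matrix list b_cl
```

Neither fit reports a constant or individual intercepts, because those terms cancel out of the conditional likelihood.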

Kind regards,
Carlo
(Stata 19.0)

Clyde Schechter

Join Date: Apr 2014
Posts: 30102

30 Sep 2019, 12:50

Well, if you run the following example:

Code:

. webuse grunfeld, clear

. 
. by company (year), sort: replace kstock = kstock[1] if inrange(company, 1, 5)
(95 real changes made)

. 
. xtreg mvalue kstock, fe

Fixed-effects (within) regression               Number of obs     =        200
Group variable: company                         Number of groups  =         10

R-sq:                                           Obs per group:
     within  = 0.0206                                         min =         20
     between = 0.2191                                         avg =       20.0
     overall = 0.1480                                         max =         20

                                                F(1,189)          =       3.97
corr(u_i, Xb)  = -0.4649                        Prob > F          =     0.0478

------------------------------------------------------------------------------
      mvalue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      kstock |   .8668212   .4350863     1.99   0.048     .0085721     1.72507
       _cons |   981.4062   55.95716    17.54   0.000     871.0254    1091.787
-------------+----------------------------------------------------------------
     sigma_u |  1384.2682
     sigma_e |  345.82254
         rho |  .94125468   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(9, 189) = 251.18                    Prob > F = 0.0000

. 
. xtreg mvalue kstock if !inrange(company, 1, 5), fe

Fixed-effects (within) regression               Number of obs     =        100
Group variable: company                         Number of groups  =          5

R-sq:                                           Obs per group:
     within  = 0.2409                                         min =         20
     between = 0.0189                                         avg =       20.0
     overall = 0.0026                                         max =         20

                                                F(1,94)           =      29.84
corr(u_i, Xb)  = -0.4558                        Prob > F          =     0.0000

------------------------------------------------------------------------------
      mvalue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      kstock |   .8668212   .1586861     5.46   0.000     .5517462    1.181896
       _cons |   188.8255   28.59796     6.60   0.000     132.0436    245.6075
-------------+----------------------------------------------------------------
     sigma_u |  279.05698
     sigma_e |  126.12953
         rho |  .83036456   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4, 94) = 77.56                      Prob > F = 0.0000

you will see that those observations do not alter the estimate of the coefficient of the unchanging variable, just as your intuition suggests. But they do alter other aspects of the results: the standard errors are different, as are the constant and the estimates of sigma_u, sigma_e, and rho.
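The same comparison can be sketched for a conditional fixed-effects logit. This version assumes Stata's `union` example dataset, with `grade` artificially frozen for women with `idcode <= 2000` (an arbitrary illustrative cutoff; `grade0` and the scalar names are also made up). For those women the `grade` term cancels, so they add only a constant to the conditional log likelihood. The coefficient on `grade` is therefore expected to agree across the two fits up to convergence tolerance, and, unlike the -xtreg, fe- case, so is its standard error, while the log likelihood and the number of groups differ. The frozen women with a varying outcome still appear in the first fit's estimation sample.

```stata
* Sketch (assumptions: Stata's -union- example data; grade is artificially
* frozen for women with idcode <= 2000, an arbitrary illustrative cutoff)
webuse union, clear
xtset idcode year

* Make grade time-invariant for those women without changing which values are missing
bysort idcode: egen grade0 = min(grade)
replace grade = grade0 if idcode <= 2000 & !missing(grade)

* Fit 1: all women, including those whose grade never changes
xtlogit union grade, fe nolog
scalar b_all  = _b[grade]
scalar se_all = _se[grade]
count if e(sample) & idcode <= 2000
scalar n_frozen_kept = r(N)

* Fit 2: only women whose grade was left untouched
xtlogit union grade if idcode > 2000, fe nolog
scalar b_sub  = _b[grade]
scalar se_sub = _se[grade]

display "coef: " scalar(b_all) "  vs  " scalar(b_sub)
display "s.e.: " scalar(se_all) "  vs  " scalar(se_sub)
display "obs with frozen grade kept in fit 1: " scalar(n_frozen_kept)
```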
