  • In a fixed effects logit regression does Stata only drop observations that have no variation in the outcome variable? Or also the controls?

    I have a three-wave panel of children's height and weight and parents' employment or unemployment (binary) during a recession.

    As you will see below, I make use of a fixed effects logit model to test whether a change from employed to unemployed resulted in weight gain in children (a binary obesity outcome). In the model I control for maternal age, education, marital status and urban vs. rural location (all except location are categorical variables). My understanding was that a fixed effects logit regression drops any variables that do not change across waves, so with controls that traditionally don't change, such as adult female education, I was sure that my initial sample of 11,000 children would plummet when I ran the analysis.

    As you can see from the output below, this did not happen; instead, I have 1,945 groups in my analysis of obesity across the three waves of the study. So my question is: did I misunderstand how xtlogit, fe works? Does it only drop individuals without variation in the outcome? Or do individuals need to have variation in the outcome variable (obesity here), the primary independent variable (parental unemployment here) and the control variables (region, year, mother's age, mother's education, and mother's marital status here) to be included in the regression?

    Also, does this mean I end up with a really weird sample? That is, am I limited to saying, "for children whose mothers moved, changed their education and changed their marital status, I find that parental unemployment increases weight"? This seems like what an analysis that only includes changers in each variable would amount to, but maybe I'm confused?

    Thank you for any advice,

    John


    Code:
    
    . xtlogit O_obese_y  X_eitherparentunemployed_y i.C_region_y i.year i.C_Simplemotherage_y i.C_Simplemothere
    > duca_y S_age_months_y i.C_mothermar_y, fe nolog
    note: S_age_months_y omitted because of collinearity
    note: multiple positive outcomes within groups encountered.
    note: 9,053 groups (23,188 obs) dropped because of all positive or
          all negative outcomes.
    
    Conditional fixed-effects logistic regression   Number of obs     =      5,535
    Group variable: id                              Number of groups  =      1,945
    
                                                    Obs per group:
                                                                  min =          2
                                                                  avg =        2.8
                                                                  max =          3
    
                                                    LR chi2(12)       =     239.12
    Log likelihood  = -1895.5991                    Prob > chi2       =     0.0000
    
    ----------------------------------------------------------------------------------------------------
                             O_obese_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------------------+----------------------------------------------------------------
            X_eitherparentunemployed_y |   .2705773   .1005702     2.69   0.007     .0734634    .4676913
                          1.C_region_y |  -.0209971   .1507919    -0.14   0.889    -.3165439    .2745497
                                       |
                                  year |
                                    1  |   .3195079   .0596055     5.36   0.000     .2026833    .4363326
                                    2  |  -.5625906   .0751008    -7.49   0.000    -.7097854   -.4153958
                                       |
                   C_Simplemotherage_y |
                                19-29  |   .1622443   .1849878     0.88   0.380    -.2003252    .5248138
                                30-39  |   .1249393   .1217915     1.03   0.305    -.1137676    .3636462
                                       |
                 C_Simplemothereduca_y |
    Leaving Certificate to Non Degree  |   .4089109   .2192694     1.86   0.062    -.0208492    .8386711
            Primary Degree or greater  |   .4875182   .2805553     1.74   0.082    -.0623601    1.037397
                                       |
                        S_age_months_y |          0  (omitted)
                                       |
                         C_mothermar_y |
                                    2  |  -.0491349   .2948536    -0.17   0.868    -.6270372    .5287675
                                    3  |  -.3627668   .4261465    -0.85   0.395    -1.197999    .4724649
                                    4  |  -.1200269   .1744402    -0.69   0.491    -.4619234    .2218696
                                    5  |    .762811   1.064853     0.72   0.474    -1.324263    2.849884
    ----------------------------------------------------------------------------------------------------
    
    . margins, dydx(X_eitherparentunemployed_y) post
    
    Average marginal effects                        Number of obs     =      5,535
    Model VCE    : OIM
    
    Expression   : Pr(O_obese_y|fixed effect is 0), predict(pu0)
    dy/dx w.r.t. : X_eitherparentunemployed_y
    
    --------------------------------------------------------------------------------------------
                               |            Delta-method
                               |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------------------+----------------------------------------------------------------
    X_eitherparentunemployed_y |   .0622649   .0231713     2.69   0.007       .01685    .1076798
    --------------------------------------------------------------------------------------------

  • #2
    Does it only drop individuals without variation in the outcome?
    Not exactly, but for your purposes, this is correct. An individual does not need variation in the predictor variables to be retained in the estimation sample; only variation in the outcome is required. (There are other reasons that individuals, or observations within individuals, may be dropped, but they don't have to do with variation, and there is nothing in your output to suggest that any of these other things have happened in your analysis.)



    • #3
      Dear Clyde,

      Thank you as always for your astute response.

      So, would I be right in thinking that in a fixed effects logit analysis only individuals (or observations) without variation in the outcome will be dropped? Regardless of whether their controls vary? Say I have a marital control and someone stays married for the duration of the study, provided the outcome changes will they remain in the xtlogit fixed effects analysis?
      Could you recommend a source for me to learn about this further? I would be interested in learning the other reasons that individuals are dropped from the analysis that you mentioned above.
      Some colleagues have suggested that my analysis is a subset of only those who experience a change in every predictor and the outcome and I would like to be able to cite some evidence to explain why this isn't the case!

      Very best,

      John
      Last edited by John Adler; 29 Sep 2019, 05:28.



      • #4
        So, would I be right in thinking that in a fixed effects logit analysis only individuals (or observations) without variation in the outcome will be dropped? Regardless of whether their controls vary? Say I have a marital control and someone stays married for the duration of the study, provided the outcome changes will they remain in the xtlogit fixed effects analysis?
        That's right.

        Could you recommend a source for me to learn about this further? I would be interested in learning the other reasons that individuals are dropped from the analysis that you mentioned above.
        Some colleagues have suggested that my analysis is a subset of only those who experience a change in every predictor and the outcome and I would like to be able to cite some evidence to explain why this isn't the case!
        Your colleagues are wrong. I don't know of any references to cite about this. It's not the kind of thing one would find in a journal. But you can easily convince them with a simple demonstration. Show them this:

        Code:
        . webuse grunfeld, clear
        
        .
        . xtset
               panel variable:  company (strongly balanced)
                time variable:  year, 1935 to 1954
                        delta:  1 year
        
        .
        . //      MAKE kstock NOT VARYING IN COMPANIES 1 THROUGH 5
        . by company (year), sort: replace kstock = kstock[1] if inrange(company, 1, 5)
        (95 real changes made)
        
        .
        .
        . //      FIXED EFFECTS REGRESSION OF mvalue ON kstock
        . xtreg mvalue kstock, fe
        
        Fixed-effects (within) regression               Number of obs     =        200
        Group variable: company                         Number of groups  =         10
        
        R-sq:                                           Obs per group:
             within  = 0.0206                                         min =         20
             between = 0.2191                                         avg =       20.0
             overall = 0.1480                                         max =         20
        
                                                        F(1,189)          =       3.97
        corr(u_i, Xb)  = -0.4649                        Prob > F          =     0.0478
        
        ------------------------------------------------------------------------------
              mvalue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              kstock |   .8668212   .4350863     1.99   0.048     .0085721     1.72507
               _cons |   981.4062   55.95716    17.54   0.000     871.0254    1091.787
        -------------+----------------------------------------------------------------
             sigma_u |  1384.2682
             sigma_e |  345.82254
                 rho |  .94125468   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(9, 189) = 251.18                    Prob > F = 0.0000
        
        .
        . //      DEMONSTRATE THAT ALL COMPANIES ARE IN THE ESTIMATION SAMPLE
        . tab company if e(sample)
        
            company |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  1 |         20       10.00       10.00
                  2 |         20       10.00       20.00
                  3 |         20       10.00       30.00
                  4 |         20       10.00       40.00
                  5 |         20       10.00       50.00
                  6 |         20       10.00       60.00
                  7 |         20       10.00       70.00
                  8 |         20       10.00       80.00
                  9 |         20       10.00       90.00
                 10 |         20       10.00      100.00
        ------------+-----------------------------------
              Total |        200      100.00
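
        The demonstration above uses -xtreg, fe-. A minimal sketch of the same check for -xtlogit, fe- follows, assuming Stata's shipped union example dataset (webuse union) and treating south as the predictor of interest. The helper variables south_lo, south_hi, south_fixed and one_per_woman are hypothetical names introduced only for this illustration.

        ```stata
        webuse union, clear
        xtset idcode year

        // Conditional fixed-effects logit of union membership
        xtlogit union age grade not_smsa south, fe nolog

        // Flag women whose value of south never changes over the panel
        bysort idcode: egen byte south_lo = min(south)
        bysort idcode: egen byte south_hi = max(south)
        generate byte south_fixed = (south_lo == south_hi)

        // Tag one row per woman among those kept in the estimation sample
        egen byte one_per_woman = tag(idcode) if e(sample)

        // Any count in the south_fixed == 1 row is a woman who was retained
        // even though this predictor never varied for her
        tabulate south_fixed if one_per_woman == 1
        ```

        If the south_fixed == 1 row of the table is non-empty, those women stayed in the estimation sample without any variation in south, which is the pattern described above: only the lack of variation in the outcome removes a group.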



        • #5
          Dear Clyde,

          Thank you for supplying this easy-to-replicate answer! My takeaway from our discussion is that in my xtlogit, fe example above, which controls for education, mothers who do not experience a change in education still contribute to the analysis of the effect of changing parental employment on child weight. But in an applied sense, I guess I don't understand how they are contributing to the analysis if their controls are not changing. What information are they actually adding?

          Apologies for the simple questions, I have been reading Fixed Effects Regression Models by Paul D. Allison but I'm struggling to understand this in an applied sense.

          Best regards,

          John
          Last edited by John Adler; 30 Sep 2019, 05:49. Reason: clarified I was running an xtlogit, fe model



          • #6
            John:
            As an aside to Clyde's enlightening example, please note that when we speak about fixed effects in panel data logistic regression, we implicitly refer to conditional fixed effects (a different beast to tame than the fixed effects we are used to with -xtreg, fe-).
            See, if interested, http://methods.johndavidpoe.com/2016...rameters-bias/ and related references.
            Kind regards,
            Carlo
            (Stata 18.0 SE)
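
            One way to see the "conditional" part in practice: as far as I know, -xtlogit, fe- fits the same conditional likelihood as -clogit- with the panel identifier given in group(). The sketch below assumes the shipped union example dataset and compares the two fits side by side; the stored-estimate names xt_fe and cl are arbitrary labels chosen for this illustration.

            ```stata
            webuse union, clear
            xtset idcode year

            // Fixed-effects logit via the xt command
            xtlogit union age grade not_smsa south, fe nolog
            estimates store xt_fe

            // Conditional logit, conditioning on each woman's set of outcomes
            clogit union age grade not_smsa south, group(idcode) nolog
            estimates store cl

            // Coefficients, standard errors, sample size and log likelihood
            estimates table xt_fe cl, b(%9.6f) se(%9.6f) stats(N ll)
            ```

            Both commands also report how many groups were dropped for having all positive or all negative outcomes, since such groups contribute nothing to the conditional likelihood.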



            • #7
              Well, if you run the following example:
              Code:
              . webuse grunfeld, clear
              
              . 
              . by company (year), sort: replace kstock = kstock[1] if inrange(company, 1, 5)
              (95 real changes made)
              
              . 
              . xtreg mvalue kstock, fe
              
              Fixed-effects (within) regression               Number of obs     =        200
              Group variable: company                         Number of groups  =         10
              
              R-sq:                                           Obs per group:
                   within  = 0.0206                                         min =         20
                   between = 0.2191                                         avg =       20.0
                   overall = 0.1480                                         max =         20
              
                                                              F(1,189)          =       3.97
              corr(u_i, Xb)  = -0.4649                        Prob > F          =     0.0478
              
              ------------------------------------------------------------------------------
                    mvalue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                    kstock |   .8668212   .4350863     1.99   0.048     .0085721     1.72507
                     _cons |   981.4062   55.95716    17.54   0.000     871.0254    1091.787
              -------------+----------------------------------------------------------------
                   sigma_u |  1384.2682
                   sigma_e |  345.82254
                       rho |  .94125468   (fraction of variance due to u_i)
              ------------------------------------------------------------------------------
              F test that all u_i=0: F(9, 189) = 251.18                    Prob > F = 0.0000
              
              . 
              . xtreg mvalue kstock if !inrange(company, 1, 5), fe
              
              Fixed-effects (within) regression               Number of obs     =        100
              Group variable: company                         Number of groups  =          5
              
              R-sq:                                           Obs per group:
                   within  = 0.2409                                         min =         20
                   between = 0.0189                                         avg =       20.0
                   overall = 0.0026                                         max =         20
              
                                                              F(1,94)           =      29.84
              corr(u_i, Xb)  = -0.4558                        Prob > F          =     0.0000
              
              ------------------------------------------------------------------------------
                    mvalue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                    kstock |   .8668212   .1586861     5.46   0.000     .5517462    1.181896
                     _cons |   188.8255   28.59796     6.60   0.000     132.0436    245.6075
              -------------+----------------------------------------------------------------
                   sigma_u |  279.05698
                   sigma_e |  126.12953
                       rho |  .83036456   (fraction of variance due to u_i)
              ------------------------------------------------------------------------------
              F test that all u_i=0: F(4, 94) = 77.56                      Prob > F = 0.0000
              you will see that the observations from companies 1 through 5, where kstock never changes, do not alter the estimate of the coefficient of kstock, just as your intuition tells you it should be. But they do alter other aspects of the results. The standard errors are different, as are the estimates of sigma_u, sigma_e, and rho.
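
              The reason the coefficient is unchanged can be made visible by doing the within transformation by hand. This is only a sketch of the idea, not what -xtreg- does internally line by line: the variables mvalue_bar, kstock_bar, mvalue_w and kstock_w are hypothetical names, and the standard errors from -regress- on demeaned data are not expected to match -xtreg, fe-, because -regress- does not adjust the degrees of freedom for the company means that were removed.

              ```stata
              webuse grunfeld, clear
              by company (year), sort: replace kstock = kstock[1] if inrange(company, 1, 5)

              // Within transformation: subtract each company's own mean
              by company: egen double mvalue_bar = mean(mvalue)
              by company: egen double kstock_bar = mean(kstock)
              generate double mvalue_w = mvalue - mvalue_bar
              generate double kstock_w = kstock - kstock_bar

              // In companies 1-5 the demeaned kstock is zero in every year
              summarize kstock_w if inrange(company, 1, 5)

              // Slope on the demeaned data: all companies, then companies 6-10 only
              regress mvalue_w kstock_w, noconstant
              regress mvalue_w kstock_w if !inrange(company, 1, 5), noconstant
              ```

              Because the demeaned kstock is zero in companies 1 through 5, their observations add nothing to either the numerator or the denominator of the slope, so the point estimate should be the same in both regressions. Their demeaned mvalue still enters the residual variance, which is why the measures of uncertainty change.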

