Logit model vs linear probability model in panel data differences in significance

John Adler

Join Date: Apr 2017
Posts: 173


12 Mar 2018, 10:53

I have a panel of mothers observed at several time points. In the regression below I use year five and year ten to determine whether increases in obesity are due to increases in unemployment.

In my analysis I estimate a random-effects linear probability model (LPM) to look at the relationship between changes in local-area unemployment (psum_unemployed_total_cont_y) and obesity (binbmi_obese).

I also estimate a random-effects logit model (with margins) to confirm that the LPM is acceptable.

One thing I have noticed is that, although my coefficients are often similar in magnitude across the two models, the logit results are often less significant than the LPM results: the z-statistic is usually above 1.65 but just below 1.96.

This pattern holds across almost all of my models; I include one example below to illustrate it.

My regression models are as follows:

1. The logit model:

Code:

. * Obese:
. 
. * Logit:
. 
. xtlogit binbmi_obese_y psum_unemployed_total_cont_y i.own_education_y i.maritalstatus_y i.medical_card_y i.employment_y i.ord_age_y if gender==0, re nolog

note: 2.own_education_y != 0 predicts failure perfectly
      2.own_education_y dropped and 1 obs not used

note: 5.employment_y != 0 predicts failure perfectly
      5.employment_y dropped and 3 obs not used

note: 6.own_education_y omitted because of collinearity

Random-effects logistic regression              Number of obs     =        644
Group variable: id                              Number of groups  =        468

Random effects u_i ~ Gaussian                   Obs per group:
                                                              min =          1
                                                              avg =        1.4
                                                              max =          2

Integration method: mvaghermite                 Integration pts.  =         12

                                                Wald chi2(18)     =      38.79
Log likelihood  = -244.39807                    Prob > chi2       =     0.0030

-----------------------------------------------------------------------------------------------------------------------------
                                             binbmi_obese_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------------------------------------+----------------------------------------------------------------
                               psum_unemployed_total_cont_y |   .2192371   .0629466     3.48   0.000     .0958641    .3426101
                                                            |
                                            own_education_y |
                                  Primary school education  |          0  (empty)
                                     Some secondary school  |   2.663921   1.342674     1.98   0.047     .0323287    5.295514
                              Complete secondary education  |   1.275665   .9904268     1.29   0.198    -.6655358    3.216866
    Some third level education at college, university, RTC  |   2.306741   1.111069     2.08   0.038     .1290862    4.484396
Complete third level education at college, university, RTC  |          0  (omitted)
                                                            |
                                            maritalstatus_y |
                                                Cohabiting  |  -2.320489   1.860596    -1.25   0.212    -5.967191    1.326213
                                                 Separated  |   .6197481   3.457853     0.18   0.858    -6.157519    7.397015
                                                  Divorced  |   2.169449   4.667265     0.46   0.642    -6.978222    11.31712
                                                   Widowed  |  -.1380215   5.929293    -0.02   0.981    -11.75922    11.48318
                                      Single/Never married  |  -3.850925   2.827318    -1.36   0.173    -9.392367    1.690516
                                                            |
                                             medical_card_y |
                                                       Yes  |  -.0629122   .9992572    -0.06   0.950     -2.02142    1.895596
                                                            |
                                               employment_y |
                                                Unemployed  |   .7574163   2.422145     0.31   0.755      -3.9899    5.504732
  Unable to work owing to permanent sickness or disability  |     15.016   3.609077     4.16   0.000     7.942335    22.08966
                                         At school/student  |  -8.785242   3.742243    -2.35   0.019     -16.1199    -1.45058
                           Seeking work for the first time  |          0  (empty)
                                                  Employed  |  -.3359071   .8523104    -0.39   0.693    -2.006405    1.334591
                                             Self Employed  |   .4553743   1.412773     0.32   0.747    -2.313609    3.224358
                                                            |
                                                  ord_age_y |
                                                     24-27  |  -1.491118   3.713545    -0.40   0.688    -8.769533    5.787297
                                                     28-32  |  -4.069408   3.860242    -1.05   0.292    -11.63534    3.496527
                                                      33 +  |  -5.115289   3.929976    -1.30   0.193     -12.8179    2.587322
                                                            |
                                                      _cons |  -8.224994   3.972662    -2.07   0.038    -16.01127   -.4387195
------------------------------------------------------------+----------------------------------------------------------------
                                                   /lnsig2u |   4.624441    .152662                      4.325229    4.923653
------------------------------------------------------------+----------------------------------------------------------------
                                                    sigma_u |   10.09682   .7707005                      8.693839    11.72621
                                                        rho |   .9687381   .0046233                      .9582889    .9766335
-----------------------------------------------------------------------------------------------------------------------------
LR test of rho=0: chibar2(01) = 50.28                  Prob >= chibar2 = 0.000

. margins if gender==0, dydx(psum_unemployed_total_cont_y) post

Average marginal effects                        Number of obs     =        644
Model VCE    : OIM

Expression   : Pr(binbmi_obese_y=1), predict(pr)
dy/dx w.r.t. : psum_unemployed_total_cont_y

----------------------------------------------------------------------------------------------
                             |            Delta-method
                             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
psum_unemployed_total_cont_y |    .002648   .0013722     1.93   0.054    -.0000414    .0053374
----------------------------------------------------------------------------------------------

2. The LPM Model:

Code:

. * LPM:
. 
. xtreg binbmi_obese_y psum_unemployed_total_cont_y i.own_education_y i.maritalstatus_y i.medical_card_y i.employment_y i.ord_age_y if gender==0, cluster (current_county_y1) re robust

Random-effects GLS regression                   Number of obs     =        648
Group variable: id                              Number of groups  =        470

R-sq:                                           Obs per group:
     within  = 0.1308                                         min =          1
     between = 0.0278                                         avg =        1.4
     overall = 0.0408                                         max =          2

                                                Wald chi2(20)     =    3161.46
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                                                    (Std. Err. adjusted for 25 clusters in current_county_y1)
-----------------------------------------------------------------------------------------------------------------------------
                                                            |               Robust
                                             binbmi_obese_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------------------------------------+----------------------------------------------------------------
                               psum_unemployed_total_cont_y |   .0056285   .0016592     3.39   0.001     .0023764    .0088805
                                                            |
                                            own_education_y |
                                     Some secondary school  |   .2865854   .0458478     6.25   0.000     .1967254    .3764454
                              Complete secondary education  |   .2297725   .0361645     6.35   0.000     .1588915    .3006536
    Some third level education at college, university, RTC  |   .2824864   .0569089     4.96   0.000      .170947    .3940258
Complete third level education at college, university, RTC  |   .1690297   .0233981     7.22   0.000     .1231702    .2148892
                                                            |
                                            maritalstatus_y |
                                                Cohabiting  |  -.0798641   .0448935    -1.78   0.075    -.1678538    .0081256
                                                 Separated  |   .0263302   .1272542     0.21   0.836    -.2230834    .2757438
                                                  Divorced  |   .0641585   .1679497     0.38   0.702    -.2650169    .3933338
                                                   Widowed  |   .0472516   .1193659     0.40   0.692    -.1867013    .2812046
                                      Single/Never married  |  -.1314298   .0588051    -2.24   0.025    -.2466857    -.016174
                                                            |
                                             medical_card_y |
                                                       Yes  |   .0011589   .0358399     0.03   0.974     -.069086    .0714037
                                                            |
                                               employment_y |
                                                Unemployed  |    .044326   .0615108     0.72   0.471    -.0762329    .1648849
  Unable to work owing to permanent sickness or disability  |   .5130424   .1505187     3.41   0.001     .2180311    .8080537
                                         At school/student  |  -.1526972   .0778524    -1.96   0.050     -.305285   -.0001094
                           Seeking work for the first time  |  -.0947593   .0895379    -1.06   0.290    -.2702503    .0807317
                                                  Employed  |  -.0113248   .0164682    -0.69   0.492    -.0436018    .0209523
                                             Self Employed  |   .0148456   .0510321     0.29   0.771    -.0851755    .1148667
                                                            |
                                                  ord_age_y |
                                                     24-27  |  -.0575039   .1597636    -0.36   0.719    -.3706348    .2556271
                                                     28-32  |  -.1455253    .153839    -0.95   0.344    -.4470441    .1559935
                                                      33 +  |  -.1774199    .147281    -1.20   0.228    -.4660853    .1112455
                                                            |
                                                      _cons |   .0686216   .1396422     0.49   0.623     -.205072    .3423153
------------------------------------------------------------+----------------------------------------------------------------
                                                    sigma_u |  .30869174
                                                    sigma_e |  .21519403
                                                        rho |  .67296061   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------------------------------------------------

I assume the differences arise because the LPM standard errors are clustered at the local area in which each woman lives (the level at which her unemployment is measured) and because the LPM uses the robust option.

So, in justifying the difference in significance of the psum_unemployed coefficient between the two models, my instinct is to attribute it to the clustered standard errors applied in the LPM, which could not be applied to the RE logit estimator because that estimator is inconsistent in the presence of serial correlation (and heteroskedasticity). This follows the PhD thesis of Do Wan Kwak, quoted by @Jeff Wooldridge here: https://www.statalist.org/forums/for...tandard-errors
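One way to test that explanation is to compare the models under matching variance estimators. Note also that in the output above the xtlogit coefficient on psum_unemployed_total_cont_y has z = 3.48, while the average marginal effect from margins has z = 1.93, so the drop below 1.96 occurs at the margins step rather than in the logit coefficient itself. The sketch below is a minimal illustration on simulated data; obese, unemp and county are hypothetical stand-ins for binbmi_obese_y, psum_unemployed_total_cont_y and current_county_y1. It records z-statistics for the LPM with default and county-clustered standard errors, for the RE logit coefficient and its average marginal effect, and for a random-intercept logit fitted with melogit, used here because it accepts vce(cluster clustvar).

```stata
* Simulated stand-in data (all names and values hypothetical)
clear
set seed 2018
set obs 470                                  // 470 women
generate id     = _n
generate county = ceil(_n/19)                // 25 counties; each woman in one county
generate u      = rnormal(0, 1.5)            // woman-level random effect
expand 2                                     // two waves per woman
bysort id: generate wave = _n
drop if wave == 2 & runiform() < 0.6         // crude attrition: unbalanced panel
generate unemp  = 5 + county/5 + 2*wave + rnormal()
generate byte obese = runiform() < invlogit(-3 + 0.15*unemp + u)
xtset id wave

xtreg obese unemp, re                        // LPM, default RE standard errors
scalar z_lpm_def = _b[unemp]/_se[unemp]
xtreg obese unemp, re vce(cluster county)    // LPM, county-clustered (already robust)
scalar z_lpm_clu = _b[unemp]/_se[unemp]
scalar n_clust   = e(N_clust)
xtlogit obese unemp, re nolog                // RE logit, default (OIM) standard errors
scalar z_logit   = _b[unemp]/_se[unemp]      // z for the logit coefficient
margins, dydx(unemp) post                    // average marginal effect (delta method)
scalar z_ame     = _b[unemp]/_se[unemp]      // z for the AME
melogit obese unemp || id:, vce(cluster county)   // random-intercept logit, clustered SEs
scalar z_melogit_clu = _b[unemp]/_se[unemp]

display "LPM z:   default " %5.2f z_lpm_def "   clustered " %5.2f z_lpm_clu
display "Logit z: coef " %5.2f z_logit "   AME " %5.2f z_ame "   clustered coef " %5.2f z_melogit_clu
```

Comparing the two LPM lines isolates the effect of clustering; comparing the logit coefficient with its AME isolates the effect of the delta-method transformation.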

I also see that there is a slight problem of perfect prediction here. Could this also be contributing to the differences? As the logit model is mainly there to support my use of the LPM, do you think this is something I should worry about, and is there anything I can do to handle it?
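In the output above, the two notes account for a difference in samples: 1 + 3 = 4 observations were not used by xtlogit, which matches the gap between its 644 observations and the LPM's 648 (and 468 versus 470 groups). A minimal sketch, on simulated data with a hypothetical education variable educ containing a tiny category in which nobody is obese, shows how to find such cells with tabulate and how to re-estimate the LPM on exactly the logit's estimation sample using e(sample):

```stata
* Simulated data with a sparse category (all names hypothetical)
clear
set seed 2018
set obs 470
generate id    = _n
generate u     = rnormal()                    // woman-level random effect
expand 2
bysort id: generate wave = _n
generate educ  = 1 + mod(_n, 4)               // education categories 1-4
replace  educ  = 5 in 1/3                     // a tiny category with 3 observations
generate unemp = 8 + 2*wave + rnormal()
generate byte obese = runiform() < invlogit(-3 + 0.15*unemp + u)
replace  obese = 0 if educ == 5               // nobody in category 5 is obese
xtset id wave

tabulate educ obese                           // shows the empty cell
xtlogit obese unemp i.educ, re nolog          // 5.educ predicts failure perfectly
generate byte logit_sample = e(sample)        // rows the logit actually used
scalar N_logit = e(N)
xtreg obese unemp i.educ if logit_sample, re  // LPM on exactly the same rows
scalar N_lpm = e(N)
display "logit N = " N_logit ", LPM N = " N_lpm ", rows in data = " _N
```

If the LPM results change little on the common sample, perfect prediction is unlikely to explain the gap. Merging a sparse category into an adjacent one (for example with recode) is the other common fix, provided the categories are substantively comparable.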

Grateful for any thoughts.

Very best,

John

Tags: linear probability model, logit, panel data, significance, syntax

Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

13 Mar 2018, 10:45

You run different estimators and you get different results. This is to be expected. Some like LPM and some like logit; I tend to prefer logit. While logit may be inconsistent with serial correlation, the discussion you reference was strictly about fixed effects, whereas you use random effects. Whether the fixed-effects results are relevant to your estimation is unclear.

I'm not sure that consistency is even really defined for LPM, since the 0/1 dependent variable violates the regression assumptions so badly. LPM can also make predictions less than 0 or greater than 1. It also seems odd to me that you run a panel estimator with almost 500 groups and just over 600 observations.
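The point about out-of-range predictions is easy to check after the LPM: compute the linear prediction and count values below 0 or above 1. A minimal sketch on simulated data (variable names hypothetical):

```stata
* Simulated panel (all names and values hypothetical)
clear
set seed 2018
set obs 470
generate id    = _n
generate u     = rnormal(0, 1.5)              // woman-level random effect
expand 2
bysort id: generate wave = _n
generate unemp = 5 + 3*runiform() + 2*wave
generate byte obese = runiform() < invlogit(-4 + 0.3*unemp + u)
xtset id wave

xtreg obese unemp, re                         // random-effects LPM
predict double p_lpm, xb                      // linear prediction (fitted probability)
count if p_lpm < 0 | p_lpm > 1                // predictions outside [0, 1]
scalar n_outside = r(N)
summarize p_lpm                               // min and max of the fitted values
```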
John Adler

Join Date: Apr 2017

Posts: 173
#3

13 Mar 2018, 11:03

Dear Phil,

Thank you for your feedback, and interesting points.

I suppose it is realistic to expect different results from different regressions. Could you clarify your point about the number of groups and observations? To be honest, it is not something I had noticed or fully understand. Does it have implications for my analysis?

I have three waves of women measured five years apart. There is a lot of exogenous attrition: gatekeepers were needed to provide access to these women and did not always cooperate, so some women could no longer be contacted. The survey started with about 1,000 women, fell to 500 in the second wave and 400 in the third. I use a sample of 600 women who appeared in wave 1 and again in at least wave 2 or wave 3.

As this is panel data reshaped from wide to long, I assumed that the number of obs. referred to the number of variables in the analysis (i.e. women in waves 1, 2 and 3 x an outcome such as overweight or obesity: wave1obesity, wave2obesity, wave3obesity) and that the number of groups referred to the number of women. My understanding, though, is that women might be counted more than once, as each id now appears at 3 time points.
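In long-format panel data, Stata's Number of obs counts rows (one per woman per wave with the outcome and all covariates non-missing), and Number of groups counts distinct values of the panel variable (here, women); each woman counts once towards the groups total however many waves she contributes. In the LPM above, for example, 648 observations from 470 groups gives the reported average of 1.4 observations per woman. A minimal sketch on simulated data (variable names hypothetical):

```stata
* Simulated wide data, then reshaped to long (all names hypothetical)
clear
set seed 2018
set obs 600                                    // 600 women in wide form
generate id = _n
generate obese1 = runiform() < 0.20
generate obese2 = runiform() < 0.25 if runiform() < 0.8  // some women lost by wave 2
generate obese3 = runiform() < 0.30 if runiform() < 0.6  // more lost by wave 3
reshape long obese, i(id) j(wave)              // one row per woman-wave
count                                          // rows: 600 women x 3 waves = 1,800
scalar n_rows = r(N)
egen byte first = tag(id)                      // marks one row per woman
count if first                                 // distinct women
scalar n_women = r(N)
bysort id: egen n_waves = count(obese)         // waves with a non-missing outcome
tabulate n_waves if first                      // women by number of usable waves
```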

My understanding may be quite skewed, though. In my write-up I report the results and the number of observations from the output above, and I am concerned this may not be correct. Should I report the number of groups instead?

Very best,

John
John Adler

Join Date: Apr 2017

Posts: 173
#4

14 Mar 2018, 10:20

On further thought, the safer option might be to report only the number of unique women. I am worried that readers may not understand why 613 women provide up to 1,800 observations across three waves.
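Both numbers are stored after estimation, so they can be reported from the estimation sample itself rather than from the raw data: after xtreg (and xtlogit), e(N) is the number of woman-wave observations used and e(N_g) the number of women. A minimal sketch on simulated data (names and attrition rate hypothetical):

```stata
* Simulated three-wave panel with attrition (all names hypothetical)
clear
set seed 2018
set obs 613                                   // 613 women
generate id = _n
expand 3                                      // three waves per woman
bysort id: generate wave = _n
generate x = rnormal()                        // covariate
generate byte y = runiform() < invlogit(-1 + 0.5*x)
replace y = . if wave > 1 & runiform() < 0.5  // attrition: outcome missing in later waves
xtset id wave

xtreg y x, re
display "Observations (woman-waves): " e(N)
display "Women (groups):             " e(N_g)
display "Waves per woman: min " e(g_min) ", mean " %4.2f e(g_avg) ", max " e(g_max)
```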
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#5

15 Mar 2018, 12:25

If you have 1800 observations based on 3 observations per woman and 600 women, then 1800 is the right sample size for most purposes. You simply explain the data. Each separate observation on a given person is a separate observation.

If women do not fall out of the sample at random, there is a real sample selection problem with your analysis. Look up sample selection and heckman in the subject index of the documentation.
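One simple diagnostic for whether attrition is related to observables (a first check, not a substitute for a selection model such as heckman) is to regress an indicator for remaining in the sample on wave-1 characteristics. A minimal sketch on simulated data; the variable names and the attrition mechanism are hypothetical:

```stata
* Simulated wave-1 sample (all names and values hypothetical)
clear
set seed 2018
set obs 1000
generate id     = _n
generate obese1 = runiform() < 0.20           // wave-1 obesity
generate educ1  = 1 + floor(3*runiform())     // wave-1 education, categories 1-3
* Hypothetical attrition mechanism that depends on wave-1 education
generate byte stayed = runiform() < invlogit(-0.5 + 0.4*educ1)

probit stayed obese1 i.educ1                  // who remains in later waves?
testparm obese1 i.educ1                       // joint test of the wave-1 characteristics
scalar p_attr = r(p)
```

A small p-value suggests attrition is not random with respect to these characteristics.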

My concern was that your xtreg had only 600 observations when you say you should have 1,800. If you have only one observation for most subjects, then you have a mainly cross-sectional rather than panel dataset. The first thing I'd try to figure out is where the other 1,200 observations went. You can start by simply running su on all your variables. The misstable command might also be helpful.
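That check can be made concrete by tabulating missingness for the outcome and each covariate and then counting the rows that survive listwise deletion, which is the rule xtreg and xtlogit apply to the variables in the model. A minimal sketch with hypothetical variables and missingness patterns:

```stata
* Simulated long panel with missing values (all names hypothetical)
clear
set seed 2018
set obs 600                                    // 600 women
generate id = _n
expand 3                                       // three waves: 1,800 rows
bysort id: generate wave = _n
generate obese = runiform() < 0.25 if runiform() < 0.7   // outcome sometimes missing
generate unemp = 10 + rnormal()   if wave > 1            // not measured in wave 1
generate educ  = 1 + floor(4*runiform()) if runiform() < 0.9

summarize obese unemp educ                     // Obs column = non-missing count per variable
misstable summarize obese unemp educ           // missing counts per variable
misstable patterns obese unemp educ            // which variables are missing together
egen nmiss = rowmiss(obese unemp educ)
count if nmiss == 0                            // rows that survive listwise deletion
scalar n_usable = r(N)
```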