interaction effects LPM vs logistic

karin kristensson

Join Date: Sep 2019

Posts: 13
#1

14 Sep 2019, 07:15

Hi!

I am trying to understand why my interaction term gets different signs in a logistic model and an LPM. It is negative in the LPM, which I think is correct (judging from plots of the data, the marginal effects, and the interaction at each value). But the logistic model shows a positive, non-significant interaction.
I am interacting working-class parents with years of education.
The LPM gives a negative interaction of -.015, and the logistic shows 1.04.

When I plot the marginal effects per year of education, they show a negative influence at almost all values, especially low ones. Does it have something to do with high/low values and the interaction?

I came across this:
https://www.tau.ac.il/~yoavgn/files/...s_logistic.pdf
which says: in the probability of y is higher when x is low than when x is high. This corresponds to a negative interaction in a linear probability model. when the data are represented in terms of the log odds, Because the log of the odds is the dependent variable in the logistic model, this corresponds to a positive interaction in a logistic regression.
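The quoted claim can be written out as a short sketch. The notation is an assumption made here for illustration: W is working-class background (0/1), E is years of education, and the other covariates are left out.

```latex
% Assumed notation: W = working-class background (0/1), E = years of education
\begin{align*}
\text{LPM:}\quad
  & P(y=1) = \beta_0 + \beta_1 W + \beta_2 E + \beta_3 W E \\
  & P(y=1 \mid W=1, E) - P(y=1 \mid W=0, E) = \beta_1 + \beta_3 E \\[4pt]
\text{Logit:}\quad
  & \log\frac{p}{1-p} = \gamma_0 + \gamma_1 W + \gamma_2 E + \gamma_3 W E \\
  & p(W=1, E) - p(W=0, E)
    = \Lambda\bigl(\gamma_0 + \gamma_1 + (\gamma_2 + \gamma_3) E\bigr)
    - \Lambda\bigl(\gamma_0 + \gamma_2 E\bigr),
  \qquad \Lambda(z) = \frac{1}{1 + e^{-z}}
\end{align*}
```

In the LPM the probability gap between the two groups changes with E only through \beta_3. In the logit, \Lambda is nonlinear, so the probability gap changes with E even when \gamma_3 = 0. The sign of \beta_3 (probability scale) therefore does not have to match the sign of \gamma_3 (log-odds scale).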

After reading this I still don't fully understand what the log odds have to do with it.

Does it have to do with a weaker interaction effect at higher levels? Can someone help explain this in simpler words? (pic of interaction per year of education using logistic)
Richard Williams

Join Date: Apr 2014

Posts: 4987
#2

14 Sep 2019, 08:47

Karin, the graphic you inserted is very hard to read. It is very blurry. I can't even tell where this OR of 1.04 is coming from. I don't know what the variables and their categories are or how the graphic corresponds to your question.

Look at the Statalist FAQ, esp pt 12, on asking questions effectively. Pay particular attention to the use of code tags.

Then repost, showing both the commands and output for the lpm and logistic analyses you ran.

You say "the logistic model demonstrate a positive non-significant interaction." Since it is non-significant, negative values fall within the confidence interval. Therefore there isn't necessarily any inconsistency between the LPM and the logistic model. Also, you don't indicate whether the LPM negative interaction is statistically significant or not, which makes it even harder to tell whether there is an inconsistency at all.

One of the arguments against the LPM is that significance tests may not be right.

Personally, I agree with Paul Allison, who basically says you can get the best of both worlds by using logistic regression and the margins command.

https://statisticalhorizons.com/in-d...f-logit-part-2
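A minimal sketch of that approach (logistic regression followed by margins), run on simulated data so that it is self-contained. The variable names wc, educ and y and all coefficient values are invented for illustration, and the simulated log-odds model deliberately contains no interaction.

```stata
* Simulated data: all names and parameter values are assumptions for illustration
clear
set seed 2019
set obs 5000
generate byte wc = runiform() < 0.5            // hypothetical working-class indicator
generate educ = 8 + floor(13*runiform())       // hypothetical years of education, 8 to 20
* True model: additive in the log odds, i.e. no wc-by-educ interaction
generate byte y = runiform() < invlogit(-5 + 0.3*educ - 1*wc)

* LPM with an interaction, using factor-variable notation
regress y i.wc##c.educ, vce(robust)

* Logistic model with the same specification, shown as odds ratios
logit y i.wc##c.educ, vce(robust) or

* Effect of wc on the probability scale at selected education levels
margins, dydx(wc) at(educ=(8(4)20))
marginsplot
```

Because the simulated log-odds model has no interaction, any interaction the LPM picks up here comes from the nonlinearity of the probability scale. The margins output reports the effect of wc in probability units at each education level, which is the quantity the LPM interaction is trying to approximate.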

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
karin kristensson

Join Date: Sep 2019

Posts: 13
#3

15 Sep 2019, 01:52

Richard,

Thank you for your response. The 1.04 is from the logistic model with education entered as continuous. The graphic is just meant to show that the negative influence of working-class origin is smaller at higher levels of education, as I was thinking that this might change my results. The LPM interaction is significant and the results are robust, which is why the difference between the models struck me as strange, and I was trying to find an explanation for it.

The marginal effects of "arbetarklassbakgrund=1" look negative to me in the image, but at very high levels they are positive (see next post).

Last edited by karin kristensson; 15 Sep 2019, 02:14.
karin kristensson

Join Date: Sep 2019

Posts: 13
#4

15 Sep 2019, 02:13

when plotted including high levels of education

Attached Files

margins2.gph (10.9 KB, 1 view)
Eric de Souza

Join Date: Mar 2014

Posts: 587
#5

15 Sep 2019, 04:19

You have not done what Richard asked you to do. Without that it is difficult to help you:

"Look at the Statalist FAQ, esp pt 12, on asking questions effectively. Pay particular attention to the use of code tags.
Then repost, showing both the commands and output for the lpm and logistic analyses you ran."

karin kristensson

Join Date: Sep 2019

Posts: 13
#6

15 Sep 2019, 11:57

I am sorry, I will try to get it right:

The code I used was the same except for changing from regress to logistic.

THE LPM:
code:
gen utbarXarb=utbildningsår*arbetarklassbakgrund
regress högretjänsteman utbarXarb arbetarklassbakgrund utbildningsår ålder kvinna andragen arbetslivserfarenhet arbetslivserfarenhet2 tvatusen, robust

Output:

. regress högretjänsteman utbarXarb arbetarklassbakgrund utbildningsår ålder kvinna andragen arbetslivserfarenhet arbetslivserfarenhet2 tvatusen, robust
Linear regression Number of obs = 4,735
F(9, 4725) = 86.45
Prob > F = 0.0000
R-squared = 0.1802
Root MSE = .33236

Robust
högretjänsteman Coef. Std. Err. t P>|t| [95% Conf. Interval]
utbarXarb -.0141404 .0035221 -4.01 0.000 -.0210454 -.0072354
arbetarklassbakgrund .1221207 .0391108 3.12 0.002 .0454454 .198796
utbildningsår .0508958 .0022951 22.18 0.000 .0463963 .0553952
ålder -.0000277 .0003029 -0.09 0.927 -.0006216 .0005662
kvinna -.056733 .0097107 -5.84 0.000 -.0757706 -.0376954
andragen -.0552507 .0214457 -2.58 0.010 -.0972942 -.0132072
arbetslivserfarenhet .0051176 .0007365 6.95 0.000 .0036736 .0065616
arbetslivserfarenhet2 -.0000406 .0000101 -4.02 0.000 -.0000603 -.0000208
tvatusen .0283215 .0106192 2.67 0.008 .007503 .04914
_cons -.5377979 .0374675 -14.35 0.000 -.6112517 -.464344

.

THE LOGISTIC:

Code: logistic högretjänsteman utbarXarb arbetarklassbakgrund utbildningsår ålder kvinna andragen arbetslivserfarenhet arbetslivserfarenhet2 tvatusen, robust

Output:

. logistic högretjänsteman utbarXarb arbetarklassbakgrund utbildningsår ålder kvinna andragen arbetslivserfarenhet arbetslivserfarenhet2 tvatusen, robust
Logistic regression Number of obs = 4,735
Wald chi2(9) = 576.86
Prob > chi2 = 0.0000
Log pseudolikelihood = -1652.2071 Pseudo R2 = 0.2066

Robust
högretjänsteman Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
utbarXarb 1.045756 .0384651 1.22 0.224 .9730193 1.12393
arbetarklassbakgrund .3136205 .1643819 -2.21 0.027 .1122673 .8761037
utbildningsår 1.434982 .0289727 17.89 0.000 1.379305 1.492906
ålder .997769 .0036922 -0.60 0.546 .9905586 1.005032
kvinna .5874759 .0531224 -5.88 0.000 .4920625 .7013904
andragen .5152386 .1513172 -2.26 0.024 .2897507 .9162044
arbetslivserfarenhet 1.04599 .0097359 4.83 0.000 1.027081 1.065247
arbetslivserfarenhet2 .9995538 .0001637 -2.73 0.006 .9992331 .9998746
tvatusen 1.126824 .1261289 1.07 0.286 .9048535 1.403245
_cons .0010692 .0003931 -18.61 0.000 .0005202 .0021978
Note: _cons estimates baseline odds.

Was that easier to interpret? My question is about the interaction utbarXarb.
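For reference, the two sets of estimates above can be read side by side. The following is a rough sketch that plugs the reported coefficients into each model's implied effect of arbetarklassbakgrund at a few education levels. The levels 9, 12, 15 and 18 are arbitrary choices made here, and the other covariates cancel out of both expressions. The two quantities are on different scales (a probability difference versus an odds ratio), so only their direction is comparable.

```stata
* Coefficients copied from the regress and logistic output above
* LPM:      effect on P(y=1) at E years = b_main + b_interaction * E
* Logistic: odds ratio at E years       = OR_main * OR_interaction^E
foreach E of numlist 9 12 15 18 {
    display "E = `E'"
    display "  LPM effect on P(y=1): " %7.4f (.1221207 - .0141404*`E')
    display "  logistic odds ratio:  " %7.4f (.3136205 * 1.045756^`E')
}

* Education level at which each implied effect changes direction
display "LPM effect is zero at E = " %5.2f (.1221207/.0141404)
display "Odds ratio equals 1 at E = " %5.2f (-ln(.3136205)/ln(1.045756))
```

On these point estimates the LPM effect turns negative above about 8.6 years of education, and the logistic odds ratio stays below 1 until about 26 years. Both models therefore imply a disadvantage at, say, 9 to 18 years of education, even though the interaction terms have opposite signs. The confidence interval for the logistic interaction includes 1, so its crossing point is very imprecise.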


Richard Williams

Join Date: Apr 2014

Posts: 4987
#7

15 Sep 2019, 12:26

Hi Karin. This is a little better but still sub-optimal. The way you did it, when there are two or more consecutive spaces every space after the first gets deflected and the output doesn’t line up correctly. As described in pt 12 of the FAQ, you should use code tags. Some people will put in the effort to decipher hard to read output but I tend to not be one of them! Whatever you can do to convey your question clearly is to your benefit.

It appears that you computed the interaction yourself. This is a bad idea. You should use factor variable notation. See

help fvvarlist

More critically, I can't tell if you included the main effects along with the interaction term. If not, you should. You may have, but whatever language you are using for your variables is not one I speak. If you used factor variable notation it would be obvious how you handled the main effects and interactions.
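A sketch of what that could look like with the variable names from the posted output. It assumes the poster's dataset is in memory, that arbetarklassbakgrund is coded 0/1, and that utbildningsår is continuous. The /// line continuations require running it from a do-file.

```stata
* Assumes the poster's dataset is in memory (variable names taken from the output above)
* i. marks the categorical variable, c. the continuous one;
* ## includes both main effects and their interaction automatically

regress högretjänsteman i.arbetarklassbakgrund##c.utbildningsår ålder kvinna ///
    andragen arbetslivserfarenhet arbetslivserfarenhet2 tvatusen, vce(robust)

logistic högretjänsteman i.arbetarklassbakgrund##c.utbildningsår ålder kvinna ///
    andragen arbetslivserfarenhet arbetslivserfarenhet2 tvatusen, vce(robust)
```

With this notation the hand-made utbarXarb variable is no longer needed, and margins can work out how the two variables are related when computing effects after either model.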
