Interaction effects in logistic regression analyses

Rens de Visser

Join Date: May 2015

Posts: 53
#1


20 Jul 2015, 12:58

Hi all,

I'm struggling to compare coefficients between two logistic regression models. From what I have read, this is not easy and can even be arbitrary. Instead of comparing coefficients directly, I have heard, read and thought about using interaction effects.

A short description of my research and what I would like to do:
I have panel data from two years, 2004 and 2013. The dependent variable and the independent variables are set up identically for both years. Hence, only the sample on which the analyses are conducted differs (although about 30% of the respondents who were present in 2004 are present again in 2013). I have run two regression models, one on the 2004 data and one on the 2013 data. Some conclusions can already be drawn, but what I would like to do is compare the effect of each variable (i.e. its odds ratio) between the two years. I understand I should do this with interaction effects: each variable would be interacted with a time variable.

My question is how do I handle this?

I already have dummies for the time variables where
t2004 = 1 if year == 2004
t2013 = 1 if year == 2013.
I have also started creating some interactions (see below), but I am already stuck on education, which is coded 1 = low, 2 = medium and 3 = high. How do I make interactions with this variable?
gen age_y2004 = age_y * t2004
gen male2004 = male * t2004
gen registeredormarried2004 = registeredormarried * t2004
....

Do I also need to make these interactions with t2013?

Further, what do I do once I have all these interactions? How do I incorporate them in my logistic regression analyses?

Any help would be much appreciated!

Cheers.
Rens de Visser

Join Date: May 2015

Posts: 53
#2

20 Jul 2015, 14:38

Ok. I think I have done the right thing in making the interactions. Now I would like to add them to my logistic regression. See below:
1) contains all interactions in one model (if this one is right, is it correct to still use the "i."-term for categorical variables?)
2) contains only one interaction at a time (so for each interaction I make a new model).

Which one is right, if either?

1) logistic ltcuse age_y2004 age_y2013 age_y male2004 male2013 male registeredormarried2004 registeredormarried2013 registeredormarried hospitalizationonce2004 hospitalizationonce2013 hospitalization2 educationmedium2004 educationmedium2013 educationhigh2004 educationhigh2013 i.education jobsituaitonunemployed2004 jobsituationunemployed2013 jobsituationretired2004 jobsituationretired2013 i.jobsituation borninothercountry2004 borninothercountry2013 borninothercountry income2004 income2013 income assets2004 assets2013 assets livingincity2004 livingincity2013 livingincity havingkids2004 havingkids2013 havingkids perceivedhealthisfair2004 perceivedhealthisfair2013 perceivedhealthisgood2004 perceivedhealthisgood2013 perceivedhealthisexcellent2004 perceivedhealthisexcellent2013 i.perceivedhealth adl2004 adl2013 adl iadl2004 iadl2013 iadl mobility2004 mobility2013 mobility chronic2ormore2004 chronic2ormore2013 chronic2ormore eurodcat2004 eurodcat2013 eurodcat

2) logistic ltcuse age_y male2004 male2013 male registeredormarried i.hospitalization2 i.education i.jobsituation borninothercountry income assets livingincity havingkids i.perceivedhealth adl iadl mobility chronic2ormore eurodcat if t == 2004

Your help will be appreciated.
Maarten Buis

Join Date: Mar 2014

Posts: 3460
#3

21 Jul 2015, 03:32

You can make the interaction terms yourself, but normally you should not do so. Stata post-estimation commands need to know which variables are interactions, and if you make them yourself, that information is not available to Stata. Instead you should use the factor variable notation to make the interactions. Here is an example:

Code:

sysuse nlsw88, clear
logit union i.south##i.age

Interaction effects are hard (as you are finding out), so don't overdo it. I would not add all the interactions in your model; the resulting model will in all likelihood be too complicated to communicate to your audience. Nor would I add the interactions one at a time; that sounds too much like data dredging to me. A good model is a model that simplifies reality but still answers your question. So take a step back and look at your research question again, and if necessary start refining it. Then come up with a list of a few variables that you really care about, and add interactions with those.
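
As a rough sketch of what that could look like with the variable names used earlier in this thread (ltcuse, t, age_y, male, education), assuming purely for illustration that age, gender and education are the effects of interest and that t is the year variable:

```stata
* Illustrative only: the choice of these three interactions is an assumption
* ## adds both the main effects and the interaction; c. marks a continuous variable
logit ltcuse i.t##c.age_y i.t##i.male i.t##i.education, or

* Joint test of whether the education effects differ between years
testparm i.t#i.education

* Predicted probabilities for each combination of year and education
margins t#education
```

Because Stata knows which terms are interactions, testparm and margins work here without any manual bookkeeping, which is the practical payoff of the factor variable notation.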

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Rens de Visser

Join Date: May 2015

Posts: 53
#4

21 Jul 2015, 03:57

Following one of your articles, I created the following:

gen baseline = 1
xi: logit ltcuse i.t*age_y i.t*male i.t*registeredormarried i.t*i.hospitalization2 i.t*i.education i.t*jobsituation i.t*borninothercountry i.t*income i.t*assets i.t*livingincity i.t*havingkids i.t*i.perceivedhealth i.t*adl i.t*iadl i.t*mobility i.t*chronic2ormore i.t*eurodcat baseline, or nocons nolog

This gives me a huge table. If I understand you correctly, you suggest deleting some of the variables to make it more manageable?

Here's (part of) the output http://postimg.org/image/v5o34pu59/

Last edited by Rens de Visser; 21 Jul 2015, 04:22.
Rens de Visser

Join Date: May 2015

Posts: 53
#5

21 Jul 2015, 04:59

(the next part of the output http://postimg.org/image/uhoiotrlv/)

If I interpret, for example, the results for age, gender (1 = male) and education, is this interpretation correct?
- The effect of age for respondents in 2013 is 1.015 times that for respondents in 2004. That means that over time, comparing 2004 with 2013, age has become a stronger predictor of LTC use.
- The odds of using LTC for men in 2013 are 0.836 times those for men in 2004, meaning that men are less likely to use LTC in 2013 than they were in 2004. The effect has become weaker.
- The odds ratio for the medium educated in 2013 is 1.137, which means that the odds of using LTC are 1.137 times higher for them than for the medium educated in 2004. The odds ratio for the high educated in 2013 is 1.477, which means that the odds of using LTC are 1.477 times higher for them than for the high educated in 2004.
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#6

21 Jul 2015, 05:05

Try to use code delimiters to post output. See FAQ nr 12 for how to do so.
Also, what I believe Maarten is trying to tell you is not to delete some variables, but to include only those (interaction) effects that you are interested in. In your other post on this analysis you hinted that the years might say something about policy changes having effects. Did this policy aim to make ltcuse more available to low-income groups and immigrants? Then include those interactions, and leave out the rest.
Also, use of the 'nocons' option suggests that you have some expectation that females aged 0, with no income etc. will not use ltc. Unless you are willing to explain that choice I would not include the nocons option.
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#7

21 Jul 2015, 05:41

Also, it now appears you have 4, not 2, moments of observation. This makes it less sensible to use the 'i.' prefix (which creates binary dummies for each value); rather use the 'c.' prefix, which informs Stata that the variable is continuous, as time in most analyses should be. Compare, for example, the example Maarten posted in #3 with the results from below:

Code:

sysuse nlsw88, clear
logit union i.south##c.age

This tells you whether 'age' has a different effect in 'south', rather than whether being 35 yrs has a different effect in 'south'.
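
A minimal sketch of how one might inspect that difference on the probability scale, using the same nlsw88 example; the margins step is an addition for illustration and not part of the comparison above:

```stata
sysuse nlsw88, clear
logit union i.south##c.age

* Average marginal effect of age on the probability of union membership,
* evaluated with south set to 0 and then to 1
margins, dydx(age) at(south=(0 1))
```

With c.age there is a single interaction coefficient, so the two marginal effects summarise the whole comparison; with i.age there would be one contrast per age value.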
Rens de Visser

Join Date: May 2015

Posts: 53
#8

21 Jul 2015, 06:08

Originally posted by Jorrit Gosens:

Try to use code delimiters to post output. See FAQ nr 12 for how to do so.
Also, what I believe Maarten is trying to tell you is not to delete some variables, but to include only those (interaction) effects that you are interested in. In your other post on this analysis you hinted that the years might say something about policy changes having effects. Did this policy aim to make ltcuse more available to low-income groups and immigrants? Then include those interactions, and leave out the rest.
Also, use of the 'nocons' option suggests that you have some expectation that females aged 0, with no income etc. will not use ltc. Unless you are willing to explain that choice I would not include the nocons option.

I just pasted the 'nocons' option from somewhere else. The dataset only contains persons aged 41 and older. Is that something I should take into account?
Maarten Buis

Join Date: Mar 2014

Posts: 3460
#9

21 Jul 2015, 06:20

I see two problems with your code. First, you should not use the -xi- prefix anymore; instead you must use the factor variable notation. Second, it is no longer necessary to trick Stata into presenting the baseline odds; it now does so automatically.

What I suggested is not that you remove variables but that you think about which interactions you want to include. As you have noted, just including all interactions gets you far too many parameters to be useful. It is usually best to close Stata when you do this, and just reread the literature and reconsider your research question and your hypotheses.

Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#10

21 Jul 2015, 06:21

Actually, I meant to suggest using 'c.t' instead of 'i.t' for the interaction terms.

Code:

logit ltcuse c.t##c.age_y

This will give you developments of the effect of variables over time, rather than comparing effects in each year with those in 2004.

Last edited by Jorrit Gosens; 21 Jul 2015, 06:29.
Maarten Buis

Join Date: Mar 2014

Posts: 3460
#11

21 Jul 2015, 06:24

Both i.t and c.t will result in a comparison of effects over time. With c.t the effect of age_y is assumed to change linearly over time, which is often not reasonable. Since Rens has only 4 time points, using i.t is probably safer.
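
One way to check whether the linearity assumption is tenable, sketched with the variable names from this thread and assuming t is the numeric year variable with more than two distinct values:

```stata
* Effect of age_y allowed to differ freely in each year
logit ltcuse i.t##c.age_y
estimates store free

* Effect of age_y constrained to change linearly with t
logit ltcuse i.t c.age_y c.t#c.age_y
estimates store linear

* Likelihood-ratio test of the linear restriction (linear is nested in free)
lrtest free linear
```

A small p-value would suggest that the linear change imposed by c.t does not fit well; the stored names free and linear are arbitrary labels.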

Rens de Visser

Join Date: May 2015

Posts: 53
#12

21 Jul 2015, 06:50

Thanks for all your comments so far, guys. I appreciate them; they are really helpful.

Rens de Visser

Join Date: May 2015
Posts: 53

#13

21 Jul 2015, 09:38

Now that I have added the interaction variables, I get these results (copied from a Word file). The numbers reported are odds ratios. I would like to know whether my description of these variables below is correct.

Married or registered p.	0.336***
Married or registered p. * 2013	1.239***
Hospitalization once	2.096***
Hospitalization once * 2013	0.875*
Hospitalization > once	2.772***
Hospitalization > once * 2013	0.837 (p = 0.071)

"Those married or registered as partners are less likely to use LTC than those who are not, but the strength of this effect has become weaker in 2013 compared to 2004. The odds ratio for using LTC in 2013 is 1.239 times the odds ratio (0.336) for using LTC in 2004."

"The effect of recent hospitalization has also weakened in 2013 compared to 2004. The likelihood of LTC utilization rises for those who have been hospitalized once or more than once in the past 12 months, but this effect is stronger in 2004 than it is in 2013. Furthermore, the effect of having been hospitalized more than once has not changed significantly in 2013 compared with 2004 (p > 0.05)."

Last edited by Rens de Visser; 21 Jul 2015, 10:00.


Rens de Visser

Join Date: May 2015

Posts: 53
#14

21 Jul 2015, 12:36

Maarten? Jorrit? Anyone ?
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#15

22 Jul 2015, 00:38

Yes, that is how you should read these results.
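
To make the arithmetic behind that reading explicit: assuming 2004 is the base category, each interaction term is a ratio of odds ratios, so the implied odds ratio in 2013 is the 2004 odds ratio multiplied by the interaction term. A quick check with the numbers from #13:

```stata
* Implied 2013 odds ratio = 2004 odds ratio * interaction term (values from #13)
display "Married or registered, 2013: " %5.3f 0.336*1.239
display "Hospitalized once, 2013:     " %5.3f 2.096*0.875
display "Hospitalized > once, 2013:   " %5.3f 2.772*0.837
```

This gives roughly 0.416, 1.834 and 2.320. All three are closer to 1 than the corresponding 2004 odds ratios, which is what "the effect has become weaker" means here.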