Interaction of categorical variables in a logistic regression using national survey data

Vijay Vasudevan

Join Date: Aug 2015

Posts: 7
#1

08 Feb 2016, 08:52

Good morning, I am running a logistic regression that uses interactions between categorical variables, for example presence of chronic disease (y/n) and disability status (7 mutually exclusive disabilities). We are using MEPS data. The syntax that I have used for the interaction is Chronic_Disease##Disability_Status. My colleagues and I are hoping to report adjusted odds ratios (AORs); however, the analysis is not giving AORs. We are using Stata version 14.

Any help is greatly appreciated.
Vijay
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#2

08 Feb 2016, 09:09

More information is needed to troubleshoot this. Telling us what you aren't getting doesn't tell us what you did get. Please show us the exact commands you used and the exact response you got from Stata. Do this by copying directly from your Results window or your Stata log file and pasting into a code block (see FAQ #12 for how to set up a code block) on the Forum. Do not retype or edit anything: the devil is often in the details.

By the way, even though the column header in the output of -logistic- is labeled "Odds Ratio", the effect of an interaction term is never an odds ratio; it is a ratio of odds ratios (ROR). But you should be able to get these RORs if your data and commands are suitable.
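To make the ROR point concrete, here is a small worked sketch for the simplest case of two dichotomous predictors. The symbols C (chronic disease), D (one disability indicator) and the betas are notation assumed for illustration only; other covariates are left out, and with a 7-level disability variable there would be one such D per non-reference level.

```latex
\begin{align*}
\operatorname{logit} P(Y=1) &= \beta_0 + \beta_1 C + \beta_2 D + \beta_3\, C D \\
\mathrm{OR}_{C \mid D=0} &= e^{\beta_1}, \qquad
\mathrm{OR}_{C \mid D=1} = e^{\beta_1 + \beta_3} \\
e^{\beta_3} &= \frac{\mathrm{OR}_{C \mid D=1}}{\mathrm{OR}_{C \mid D=0}}
  \quad \text{(a ratio of odds ratios)} \\
\mathrm{OR}_{(C=1,\,D=1)\ \text{vs}\ (C=0,\,D=0)} &= e^{\beta_1 + \beta_2 + \beta_3}
\end{align*}
```

So the exponentiated interaction coefficient compares two odds ratios with each other, while an odds ratio for a combined category against the joint reference group needs the sum of both main effects and the interaction.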

Finally, though it probably doesn't directly bear on your question, MEPS data are gathered using a complex survey design, I believe. If I'm right about that, you need to appropriately -svyset- your data and use -svy:- prefixes to get correct results.
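A minimal sketch of what the survey declaration might look like. It assumes the usual MEPS design variables VARSTR (stratum) and VARPSU (PSU); the weight name PERWT13F is a placeholder for whichever person-level weight belongs to your data year, so check the file documentation before relying on any of these names.

```stata
* Sketch only: the design variable names are assumptions, not taken from this thread.
local wt PERWT13F        // hypothetical: substitute the person weight for your year
svyset VARPSU [pweight = `wt'], strata(VARSTR)

* Summarize strata and PSUs to confirm the design was declared as intended
svydescribe
```

Once the design is declared, every estimation command is run with the -svy:- prefix so that the weights and the linearized variance estimator are actually applied.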
Vijay Vasudevan

Join Date: Aug 2015

Posts: 7
#3

08 Feb 2016, 10:08

Here is my code and output (see attached pdf). You are correct. We did use survey weights and used the svy prefix. All the variables are categorical. Thank you again for your help.

Code:

svy, subpop (if AGE>17.99&BMINDX53<80) : logistic EXRCIS53 SEX i.RaceRecode EmploymentStatus HS_Grad i.Income MiddleOldAge i.OBESE i.INSCOV AnyCD HAVEUS42 PA_Recode i.DISABILITY DISABILITY##MiddleOldAge DISABILITY##OBESE DISABILITY##AnyCD DISABILITY##HAVEUS42 DISABILITY##PA_Recode

Attached Files

Stata output.pdf (44.3 KB, 1 view)
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#4

08 Feb 2016, 10:29

I don't see the problem. You have output for all 6 levels of DISABILITY#AnyCD, and the numbers look like perfectly good adjusted RORs. What else were you expecting?

By the way, you should remove the AnyCD HAVEUS42 MiddleOldAge and PA_Recode variables from your command. Because those are included in ## interaction terms, Stata will include those main effects automatically. So they are unnecessary. But they are also causing a problem because having specified them without i., Stata is entering them as continuous variables rather than as factor variables, and then dropping the 1.AnyCD (etc.) versions of them due to collinearity with the AnyCD version. This will probably lead to difficulties later if you want to use -margins- to do interesting postestimation calculations. (Also if any of these variables is not dichotomous, it is a mis-specification!)
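For illustration, the command from #3 might be rewritten along these lines. This is only a sketch under some assumptions: the data are already -svyset-, and SEX, EmploymentStatus and HS_Grad are dichotomous (they are left as typed in #3; prefix them with i. if they are not). The /// line continuations work in a do-file, not in the Command window.

```stata
* Sketch only: same model as in #3, written in pure factor-variable notation.
* Each ## term brings in its own main effects, so they are not listed separately.
svy, subpop(if AGE > 17.99 & BMINDX53 < 80): logistic EXRCIS53      ///
    SEX i.RaceRecode EmploymentStatus HS_Grad i.Income i.INSCOV     ///
    i.DISABILITY##i.MiddleOldAge                                    ///
    i.DISABILITY##i.OBESE                                           ///
    i.DISABILITY##i.AnyCD                                           ///
    i.DISABILITY##i.HAVEUS42                                        ///
    i.DISABILITY##i.PA_Recode
```

Written this way, the coefficients appear under names such as 1.AnyCD and 2.DISABILITY#1.AnyCD, which is the form that -margins- and other postestimation commands expect.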
Vijay Vasudevan

Join Date: Aug 2015

Posts: 7
#5

08 Feb 2016, 11:29

Thank you for your help. We were trying to see if there was a way to have Stata report the odds ratios for the different interactions. For example, the odds ratio for having disability 1 and having a chronic disease is XXXXX compared to the reference group (no disability, no chronic disease). If this is not feasible, then we will report the adjusted ROR.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#6

08 Feb 2016, 15:40

I'm wondering whether margins and marginsplot wouldn't fit your needs.

Best regards,

Marcos
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#7

08 Feb 2016, 17:09

Yes, Marcos is absolutely right. What you are looking for will come from -margins-. But it is a little tricky, because the default outcome for -margins- after -logistic- is not odds, or odds ratios, but predicted probability. So, for example,

Code:

margins DISABILITY, at(AnyCD =(1))

will get you the predicted probability of EXRCIS53 for each of the 7 levels of disability conditional on having a chronic disease.

You could get the corresponding odds of EXRCIS53 with

Code:

margins DISABILITY, at(AnyCD = (1)) expression(exp(predict(xb)))

But that is also not what you want.

To get odds ratios relative to the reference category you first have to get the linear predictor value for your reference category (which you describe as AnyCD = 0 and DISABILITY = 0). Then you want to get exp(xb for each disability category with chronic disease) / exp(xb for reference category). Since a ratio of exponentials is the exponential of the difference, the whole thing is this:

Code:

// CALCULATE xb FOR REFERENCE CATEGORY
margins, at(DISABILITY = (0) AnyCD = (0)) predict(xb)

// STORE THE RESULT IN A LOCAL MACRO
matrix B = r(b)
local ref_xb = B[1, 1]

// NOW GET ODDS RATIOS FOR LEVELS OF DISABILITY WHEN AnyCD = 1
margins DISABILITY, at(AnyCD = (1)) expression(exp(predict(xb) - `ref_xb'))

These will be the adjusted ORs (not RORs) you seek.

In the days before -margins-, you would calculate these using the -nlcom- command, and you still can. Given the multiple steps involved to get these particular results out of -margins-, the old-fashioned way is possibly easier:

Code:

forvalues i = 1/6 {
    nlcom exp(_b[1.AnyCD] + _b[`i'.DISABILITY] + _b[`i'.DISABILITY#1.AnyCD])
}
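A possible variant of the same loop uses -lincom- with its or option, which forms the linear combination on the log-odds scale and then exponentiates the estimate and its confidence limits, instead of applying the delta method on the odds-ratio scale as -nlcom- does. This is a sketch, not part of the original answer, and it assumes the model was refit in pure factor-variable notation (i.DISABILITY##i.AnyCD) so that the 1.AnyCD coefficient is actually estimated rather than omitted.

```stata
* Sketch only: assumes the model contains i.DISABILITY##i.AnyCD with 0 as the
* reference level of both variables, and 6 non-reference levels of DISABILITY.
forvalues i = 1/6 {
    * OR for (DISABILITY = i, AnyCD = 1) versus (DISABILITY = 0, AnyCD = 0),
    * with the other interacting covariates at their reference levels
    lincom 1.AnyCD + `i'.DISABILITY + `i'.DISABILITY#1.AnyCD, or
}
```

The point estimates should match those from -nlcom-; the confidence intervals will generally differ somewhat because they are built on different scales.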
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#8

09 Feb 2016, 13:32

Clyde gave great insights to the matter, commented on pitfalls and provided a helpful solution for the query, so as to display the ORs for the interaction terms.

Now, this is just a digression, to take advantage of this interesting subject: stoic as I may be, I usually tend to stick to the mainstream roster of approaches and commands recommended by reference books and software, respectively, for at least a few reasons:

There is always the "tricky" matter of "confounding and not easily interpreting" interaction terms under a survey design using logistic regression (cf. Heeringa, West, Berglund. Applied Survey Data Analysis, CRC Press, 2010, p. 243), plus the (not incidental, I gather) lack of examples with interaction terms in the Stata Survey Data Reference Manual (http://www.stata.com/manuals13/svy.pdf). Instead of interactions, it shows the -margins- command with the option vce(unconditional) for estimating the influence of the predictors.

There is also the issue of potential "limitations of the odds ratio" in terms of not coping with the "constant change in the probability" under models for binary outcomes, hence some preference for predicted probabilities (cf. Long and Freese. Regression Models for Categorical Dependent Variables Using Stata. Stata Press, 2014, pp. 234-235).

By the way, in #3 we see there are several interaction terms in the command line, which makes the interpretation even more challenging.

I therefore speculate (perhaps I'm wrong) that these are some of the reasons for us to prefer, in Stata, presenting the interaction effects from logistic regression as predicted probabilities under the encompassing -margins- command.
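As an illustration of that approach, predicted probabilities for each combination of disability and chronic disease could be requested roughly as follows. This is a sketch that assumes the model was fit with -svy- and contains i.DISABILITY##i.AnyCD; the subpopulation condition is copied from the command in #3.

```stata
* Sketch only: run after the -svy: logistic- model with i.DISABILITY##i.AnyCD.
* vce(unconditional) lets the standard errors account for the survey design
* and for sampling variability in the covariates.
margins DISABILITY#AnyCD, subpop(if AGE > 17.99 & BMINDX53 < 80) vce(unconditional)

* Plot the predicted probabilities with their confidence intervals
marginsplot
```

Each cell is then a predicted probability of EXRCIS53 on the 0 to 1 scale, which can be easier to communicate than a table of ratios of odds ratios.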

Kind regards,

Marcos

Best regards,

Marcos
Vijay Vasudevan

Join Date: Aug 2015

Posts: 7
#9

10 Feb 2016, 10:10

This is very helpful. I believe the margins command will be most appropriate, as it will allow us to get the AORs compared to a reference group.

Thank you both very much!
Vijay
