Interpretation of interaction between two categorical variables in conditional logistic regression

Zihan Dong

Join Date: Feb 2021

Posts: 44
#1


05 Sep 2023, 14:08

I want to study the association between type of drug and outcome across different age groups. I used conditional logistic regression to get odds ratios (ORs) in different age groups.

Variables:
outcome (yes,no)
drug_type (A,B,C,D)
age_group (1,2,3)
st (identifier for risk set)

Model I ran and corresponding output:

clogit outcome i.drug_type##i.age_group, group(st) or
--------------------------------------------------------------------------------------------
                   outcome | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
---------------------------+----------------------------------------------------------------
                 drug_type |
                        B  |   1.401484   .3490681     1.36   0.175     .8601599     2.28348
                        C  |   1.580983   .3958329     1.83   0.067     .9678556     2.58252
                        D  |   1.679786   .4356232     2.00   0.045     1.010438    2.792533
                           |
                 age_group |
                        2  |          1  (omitted)
                        3  |          1  (omitted)
                           |
       drug_type#age_group |
                      B#2  |   .5123651   .1357484    -2.52   0.012     .3048304    .8611938
                      B#3  |   .5191763    .141965    -2.40   0.017     .3037806    .8872985
                      C#2  |   .4235481   .1130875    -3.22   0.001     .2509756    .7147825
                      C#3  |   .4872329   .1349303    -2.60   0.009     .2831469    .8384196
                      D#2  |   .5134837   .1414099    -2.42   0.016     .2993034    .8809306
                      D#3  |   .5359626   .1530679    -2.18   0.029     .3062218    .9380647

lincom 2.drug_type + 1.age_group#2.drug_type, or
------------------------------------------------------------------------------
     outcome | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |   1.401484   .3490681     1.36   0.175     .8601599     2.28348
------------------------------------------------------------------------------

My question:
1) Is it correct for the lincom command that I used to generate the interpretation of "at age group 1, compared with using drug A, using drug B increased the risk of outcome by 40% (i.e., 1.40-1)"?
2) If not correct, what should be the correct command?

I want to know:
at age group 1, the odds ratios for drug B vs. drug A; drug C vs. drug A; drug D vs. drug A, regarding the outcome
at age group 2, the odds ratios for drug B vs. drug A; drug C vs. drug A; drug D vs. drug A, regarding the outcome
at age group 3, the odds ratios for drug B vs. drug A; drug C vs. drug A; drug D vs. drug A, regarding the outcome
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#2

05 Sep 2023, 15:03

-clogit- is typically used to analyze matched-tuple case-control designs, so I predicate my discussion on this assumption about your study. I also assume that in your data, the actual numeric values of the drug variable are 1, 2, 3, and 4 with value labels A, B, C, and D attached.

1) Is it correct for the lincom command that I used to generate the interpretation of "at age group 1, compared with using drug A, using drug B increased the risk of outcome by 40% (i.e., 1.40-1)"?

No, this is incorrect in three respects. First, the metric of both the -clogit- and -lincom- outputs is odds ratios, not risks, and in a case-control design there is no possibility of estimating outcome risks at all, because the "risks" one might otherwise calculate are just artifacts of the sampling and matching design. Second, you are using causal language, which is inappropriate with a case-control study. And third, your statement fails to reflect conditioning on the risk set. So a correct statement would be: "Within a given risk set, in age group 1, a positive outcome is associated with 40% higher odds of exposure to drug B than to drug A."

2) if not correct, what should be the correct command?

As stated in my response to your first question, there is no way to estimate risks or risk differences in a case-control design.

I want to know:
at age group 1, the odds ratios for drug B vs. drug A; drug C vs. drug A; drug D vs. drug A, regarding the outcome
at age group 2, the odds ratios for drug B vs. drug A; drug C vs. drug A; drug D vs. drug A, regarding the outcome
at age group 3, the odds ratios for drug B vs. drug A; drug C vs. drug A; drug D vs. drug A, regarding the outcome

Code:

forvalues a = 1/3 {
    forvalues d = 2/4 {
        lincom `d'.drug_type + `d'.drug_type#`a'.age_group, or
    }
}
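
As a rough cross-check on what the loop reports (a sketch that assumes the -clogit- model above is still the active estimation result and that drug B is coded 2), each age-group-specific odds ratio is the exponentiated sum of the drug coefficient and the matching interaction coefficient, which equals the product of the two reported odds ratios:

```stata
* Drug B vs. drug A in age group 2, rebuilt from the stored coefficients.
* Assumes the clogit model shown above was the last estimation command run.
display exp(_b[2.drug_type] + _b[2.drug_type#2.age_group])

* The same quantity by hand, from the odds ratios in the posted output
* (B = 1.401484, B#2 = .5123651).
display 1.401484 * .5123651
```

Both lines should print about 0.72, in line with what -lincom- gives for that combination. In age group 1 the interaction term is the base level, so the sum reduces to the drug coefficient alone.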
Bruce Weaver

Join Date: May 2014

Posts: 1132
#3

05 Sep 2023, 15:41

Clyde Schechter, this part of your response in #2 caught my eye:

...and in a case-control design there is no possibility of estimating outcome risks at all, because the "risks" one might otherwise calculate are just artifacts of the sampling and matching design.

Saying there is "no possibility of estimating outcome risks at all" strikes me as an overstatement, considering the opening paragraph of this CMAJ article (with emphasis added by me):
Logistic regression analysis, which estimates odds ratios, is often used to adjust for covariables in cohort studies and randomized controlled trials (RCTs) that study a dichotomous outcome. In case–control studies, the odds ratio is the appropriate effect estimate, and the odds ratio can sometimes be interpreted as a risk ratio or rate ratio depending on the sampling method [1–4]. However, in cohort studies and RCTs, odds ratios are often interpreted as risk ratios. This is problematic because an odds ratio always overestimates the risk ratio, and this overestimation becomes larger with increasing incidence of the outcome [5].
You also talked about sampling when you said:

...because the "risks" one might otherwise calculate are just artifacts of the sampling and matching design

But that does not disallow the possibility that under the right (sampling) conditions, the OR from a case-control study estimates the population risk ratio...does it?

Thanks for clarifying.

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#4

05 Sep 2023, 15:57

The impossibility of estimating risks here arises from the matched case-control design, not from the use of logistic regression. As you note, with logistic regression, the odds ratio will be a reasonable approximation to the risk ratio if the base risk is close to zero. Even when the base risk is high, in, say, a cohort design, you could separately calculate the base risk of the outcome from your data, convert that to odds, apply the odds ratio, and then convert that back to a risk among the exposed, and then, if you wished, go on to calculate risk ratios or risk differences.
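
The conversion just described might look like the following sketch for a cohort design. The base risk of 0.10 and the odds ratio of 1.40 are made-up inputs for illustration, not estimates from this thread, and the scalar names are arbitrary.

```stata
* Hypothetical inputs: NOT taken from the data discussed in this thread.
scalar base_risk = 0.10      // outcome risk among the unexposed (assumed)
scalar or_est    = 1.40      // odds ratio, exposed vs. unexposed (assumed)

* Risk -> odds, apply the odds ratio, then odds -> risk.
scalar base_odds = base_risk / (1 - base_risk)
scalar exp_odds  = base_odds * or_est
scalar exp_risk  = exp_odds / (1 + exp_odds)

display "Risk among exposed: " exp_risk
display "Risk ratio:         " exp_risk / base_risk
display "Risk difference:    " exp_risk - base_risk
```

With these inputs the risk among the exposed comes out at about 0.135, so the risk ratio is about 1.35, somewhat below the odds ratio of 1.40.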

But the matched case-control design precludes all that. The probability of having the disease outcome in this design is strictly a function of the sampling and matching scheme. If you do, say, a 1:4 match, then the probability of having the disease outcome in the data is 20%, and this fact conveys exactly zero information about the population risk of the disease outcome.
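
To make the point concrete: with 1:m matching, the proportion of cases in the data is fixed at 1/(1 + m) by design. A small sketch, with the matching ratios chosen arbitrarily as examples:

```stata
* Proportion of cases in a 1:m matched sample is 1/(1 + m), whatever the
* population risk happens to be.
foreach m of numlist 1 2 4 10 {
    display "1:`m' matching: proportion of cases in the data = " 1/(1 + `m')
}
```

For the 1:4 match this gives 0.2, the 20% mentioned above, and changing the matching ratio changes the figure without anything changing in the population.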

But that does not disallow the possibility that under the right (sampling) conditions, the OR from a case-control study estimates the population risk ratio...does it?

Well, taking matters further, one can select a sampling/matching scheme that will produce any pre-specified risk probability between 0 and 1. So, if the correct population risk were known in advance, you could, in principle, sample and match in such a way that the sample risk would match the population risk. Other than that, or sheer coincidence, there is no reason to expect the outcome risk in a matched case-control study to equal, or even approximate, the population risk.
Zihan Dong

Join Date: Feb 2021

Posts: 44
#5

05 Sep 2023, 16:32

Clyde Schechter, thanks for your solution and explanation.

If you do, say, a 1:4 match, then the probability of having the disease outcome in the data is 20%, and this fact conveys exactly zero information about the population risk of the disease outcome.

I agree this is true for the classic case-control design. However, for the "time-matched" nested case-control design, one can still use inverse probability weighting to estimate the absolute risk.

see: https://pubmed.ncbi.nlm.nih.gov/27734520/
also Chapter 6 from https://www.routledge.com/Controlled.../9780367186784
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#6

05 Sep 2023, 17:05

Yes, a nested case-control study, in which time-at-risk information is also available, is a different animal, and it is possible to estimate the baseline risk in this situation. But note, also, that that analysis is not done with conditional logistic regression. And it uses information from the cohort study or trial within which the case-control study is nested, information that is not used in conditional logistic regression.
