  • Interpretation of interaction between two categorical variables in conditional logistic regression

    I want to study the association between type of drug and outcome across different age groups. I used conditional logistic regression to get odds ratios (ORs) in different age groups.

    Variables:
    outcome (yes,no)
    drug_type (A,B,C,D)
    age_group (1,2,3)
    st (identifier for risk set)

    Model I ran and corresponding output:

    clogit outcome i.drug_type##i.age_group, group(st) or
    --------------------------------------------------------------------------------------------
                       outcome | Odds ratio  Std. err.        z   P>|z|     [95% conf. interval]
    ---------------------------+----------------------------------------------------------------
                     drug_type |
                             B |   1.401484   .3490681     1.36   0.175     .8601599     2.28348
                             C |   1.580983   .3958329     1.83   0.067     .9678556     2.58252
                             D |   1.679786   .4356232     2.00   0.045     1.010438    2.792533
                               |
                     age_group |
                             2 |          1  (omitted)
                             3 |          1  (omitted)
                               |
           drug_type#age_group |
                           B#2 |   .5123651   .1357484    -2.52   0.012     .3048304    .8611938
                           B#3 |   .5191763    .141965    -2.40   0.017     .3037806    .8872985
                           C#2 |   .4235481   .1130875    -3.22   0.001     .2509756    .7147825
                           C#3 |   .4872329   .1349303    -2.60   0.009     .2831469    .8384196
                           D#2 |   .5134837   .1414099    -2.42   0.016     .2993034    .8809306
                           D#3 |   .5359626   .1530679    -2.18   0.029     .3062218    .9380647


    lincom 2.drug_type + 1.age_group#2.drug_type, or
    ------------------------------------------------------------------------------
         outcome | Odds ratio  Std. err.        z   P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             (1) |   1.401484   .3490681     1.36   0.175     .8601599     2.28348
    ------------------------------------------------------------------------------

    My questions:
    1) Is the lincom command I used correct for supporting the interpretation "at age group1, compared with using drug A, using drug B increased the risk of outcome by 40% (i.e., 1.40-1)"?
    2) If not, what would the correct command be?

    I want to know:
    at age group1, the odds ratios for drug B vs. drug A; drug C vs. drug A; drug D vs. drug A, regarding the outcome
    at age group2, the odds ratios for drug B vs. drug A; drug C vs. drug A; drug D vs. drug A, regarding the outcome
    at age group3, the odds ratios for drug B vs. drug A; drug C vs. drug A; drug D vs. drug A, regarding the outcome

  • #2
    -clogit- is typically used to analyze matched-tuple case-control designs, so I predicate my discussion on this assumption about your study. I also assume that in your data, the actual numeric values of the drug variable are 1, 2, 3, and 4 with value labels A, B, C, and D attached.
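    As background for the "(omitted)" rows in the output above, here is a sketch of the conditional likelihood that -clogit- maximizes, assuming exactly one case per risk set st (1:m matching); the symbols are illustrative notation, not Stata's. Any term that is constant within a risk set, such as an age_group main effect if the sets were matched on age, multiplies the numerator and every denominator term by the same factor and cancels. That is consistent with the age_group main effects being omitted, while the drug_type#age_group terms remain estimable because drug type still varies within each set.

    ```latex
    % c indexes the case in risk set s; x_{s,j} are the covariates of member j
    L_s(\beta) = \frac{\exp(x_{s,c}^{\top}\beta)}{\sum_{j \in s} \exp(x_{s,j}^{\top}\beta)},
    \qquad L(\beta) = \prod_{s} L_s(\beta)

    % If x_{s,j}^{\top}\beta = \alpha_s + z_{s,j}^{\top}\theta with \alpha_s constant within s:
    \frac{e^{\alpha_s}\, e^{z_{s,c}^{\top}\theta}}{\sum_{j \in s} e^{\alpha_s}\, e^{z_{s,j}^{\top}\theta}}
      = \frac{e^{z_{s,c}^{\top}\theta}}{\sum_{j \in s} e^{z_{s,j}^{\top}\theta}}
    ```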

    1) Is the lincom command I used correct for supporting the interpretation "at age group1, compared with using drug A, using drug B increased the risk of outcome by 40% (i.e., 1.40-1)"?
    No, this is incorrect in three respects. First, the metric of both the -clogit- and -lincom- outputs is odds ratios, not risks, and in a case-control design there is no possibility of estimating outcome risks at all, because the "risks" one might otherwise calculate are just artifacts of the sampling and matching design. Second, you are using causal language, which is inappropriate with a case-control study. Third, your statement fails to reflect conditioning on the risk set. So a correct statement would be: "Within a given risk set, a positive outcome is associated with 40% higher odds of exposure to drug B than to drug A."
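    The exposure-odds wording works because the odds ratio is symmetric in outcome and exposure. For simplicity this is shown with an unmatched 2x2 table of hypothetical counts a, b, c, d, followed by the percentage reading of the OR from the output above.

    ```latex
    % a = cases exposed to B,    b = cases exposed to A
    % c = controls exposed to B, d = controls exposed to A
    \text{OR}_{\text{outcome}} = \frac{a/c}{b/d} = \frac{ad}{bc} = \frac{a/b}{c/d} = \text{OR}_{\text{exposure}}

    % An estimated OR of 1.401484 therefore reads as
    (1.401484 - 1) \times 100\% \approx 40\% \text{ higher odds}
    ```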

    2) If not, what would the correct command be?
    As stated in my response to your first question, there is no way to estimate risks or risk differences in a case-control design.

    I want to know:
    at age group1, the odds ratios for drug B vs. drug A; drug C vs. drug A; drug D vs. drug A, regarding the outcome
    at age group2, the odds ratios for drug B vs. drug A; drug C vs. drug A; drug D vs. drug A, regarding the outcome
    at age group3, the odds ratios for drug B vs. drug A; drug C vs. drug A; drug D vs. drug A, regarding the outcome
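    Code:
    * Assumes drug_type is coded 1-4 (A = 1, the base level) and age_group 1-3 (1 = base)
    forvalues a = 1/3 {
        forvalues d = 2/4 {
            display _n "Age group `a': drug level `d' vs. drug A"
            * Main effect plus interaction; the interaction is a base level (zero) when a = 1
            lincom `d'.drug_type + `d'.drug_type#`a'.age_group, or
        }
    }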
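    The quantity the loop estimates can be written out. Let beta_d be the drug_type coefficient for drug level d and gamma_{d,a} the drug_type#age_group coefficient (illustrative notation), with age group 1 as the base so that gamma_{d,1} = 0. The numeric lines use point estimates from the output in #1.

    ```latex
    \log \text{OR}(d \text{ vs. } A \mid a) = \beta_d + \gamma_{d,a}, \qquad \gamma_{d,1} = 0

    \text{OR}(d \text{ vs. } A \mid a) = e^{\beta_d} \times e^{\gamma_{d,a}}

    % Drug B vs. A in age group 2:
    \text{OR}(B \text{ vs. } A \mid 2) = 1.401484 \times 0.5123651 \approx 0.718

    % Age group 1: the interaction term is zero, so the OR is the main effect alone
    \text{OR}(B \text{ vs. } A \mid 1) = 1.401484
    ```

    The last line is why the -lincom- in #1 returned exactly the drug B main effect. Multiplying point estimates reproduces the estimate, but the standard error and confidence interval require the covariance of the two coefficients, which -lincom- takes into account.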



    • #3
      Clyde Schechter, this part of your response in #2 caught my eye:

      ...and in a case-control design there is no possibility of estimating outcome risks at all, because the "risks" one might otherwise calculate are just artifacts of the sampling and matching design.
      Saying there is "no possibility of estimating outcome risks at all" strikes me as an overstatement, considering the opening paragraph of this CMAJ article (with emphasis added by me):
      Logistic regression analysis, which estimates odds ratios, is often used to adjust for covariables in cohort studies and randomized controlled trials (RCTs) that study a dichotomous outcome. In case–control studies, the odds ratio is the appropriate effect estimate, and the odds ratio can sometimes be interpreted as a risk ratio or rate ratio depending on the sampling method.[1–4] However, in cohort studies and RCTs, odds ratios are often interpreted as risk ratios. This is problematic because an odds ratio always overestimates the risk ratio, and this overestimation becomes larger with increasing incidence of the outcome.[5]
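      The dependence on incidence in the quoted paragraph can be made explicit. For a binary exposure in a cohort, with p_0 the outcome risk among the unexposed, the risk ratio implied by an odds ratio is given below. The p_0 values are purely hypothetical, and the OR of 1.40 is borrowed from #1 only for illustration.

      ```latex
      \text{RR} = \frac{\text{OR}}{1 - p_0 + p_0 \cdot \text{OR}}

      % Rare outcome, p_0 = 0.01:
      \text{RR} = \frac{1.40}{0.99 + 0.014} \approx 1.394

      % Common outcome, p_0 = 0.30:
      \text{RR} = \frac{1.40}{0.70 + 0.42} = 1.25
      ```

      As p_0 approaches 0 the denominator approaches 1 and RR approaches OR; as p_0 grows, the gap between them widens.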
      You also talked about sampling when you said:

      ...because the "risks" one might otherwise calculate are just artifacts of the sampling and matching design
      But that does not disallow the possibility that under the right (sampling) conditions, the OR from a case-control study estimates the population risk ratio...does it?

      Thanks for clarifying.
      --
      Bruce Weaver
      Version: Stata/MP 18.5 (Windows)



      • #4
        The impossibility of estimating risks here arises from the matched case-control design, not from the use of logistic regression. As you note, with logistic regression, the odds ratio will be a reasonable approximation to the risk ratio if the base risk is close to zero. Even when the base risk is high, in, say, a cohort design, you could separately calculate the base risk of the outcome from your data, convert that to odds, apply the odds ratio, and then convert that back to a risk among the exposed, and then, if you wished, go on to calculate risk ratios or risk differences.
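        The cohort-design conversion described above can be written out step by step. The base risk p_0 = 0.50 is a hypothetical value chosen to keep the arithmetic simple, and the OR of 1.40 is borrowed from #1 for illustration; o_0 and o_1 are the odds among the unexposed and exposed.

        ```latex
        o_0 = \frac{p_0}{1 - p_0} = \frac{0.50}{0.50} = 1

        o_1 = \text{OR} \times o_0 = 1.40 \times 1 = 1.40

        p_1 = \frac{o_1}{1 + o_1} = \frac{1.40}{2.40} \approx 0.583

        \text{RR} = \frac{p_1}{p_0} \approx \frac{0.583}{0.50} \approx 1.167,
        \qquad \text{RD} = p_1 - p_0 \approx 0.083
        ```

        Note how the risk ratio of about 1.17 is well below the odds ratio of 1.40 when the base risk is this high.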

        But the matched case-control design precludes all that. The probability of having the disease outcome in this design is strictly a function of the sampling and matching scheme. If you do, say, a 1:4 match, then the probability of having the disease outcome in the data is 20%, and this fact conveys exactly zero information about the population risk of the disease outcome.
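        The 20% figure is simple design arithmetic. With k cases and m matched controls in every risk set (a 1:4 match is k = 1, m = 4), the proportion of cases in the analysis data is fixed by the design:

        ```latex
        \Pr(\text{outcome in sample}) = \frac{k}{k + m},
        \qquad \frac{1}{1 + 4} = 0.20
        ```

        The population risk does not appear anywhere in this expression, and by choosing k and m one can make the sample proportion any rational number strictly between 0 and 1.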

        But that does not disallow the possibility that under the right (sampling) conditions, the OR from a case-control study estimates the population risk ratio...does it?

        Well, taking matters further, one can select a sampling/matching scheme that will produce any pre-specified risk probability between 0 and 1. So, if the correct population risk were known in advance, you could, in principle, sample and match in such a way that the sample risk would match the population risk. Other than that, or sheer coincidence, there is no reason to expect the outcome risk in a matched case-control study to equal, or even approximate, the population risk.



        • #5
          Clyde Schechter, thanks for your solution and explanation.
          If you do, say a 1:4 match, then the probability of having the disease outcome in the data is 20%, and this fact conveys exactly zero information about the population risk of the disease outcome.
          I agree this is true for the classic case-control design. However, for the "time-matched" nested case-control design, one can still use inverse probability weighting to estimate the absolute risk.

          see: https://pubmed.ncbi.nlm.nih.gov/27734520/
          also Chapter 6 from https://www.routledge.com/Controlled.../9780367186784



          • #6
            Yes, a nested case-control study, in which time-at-risk information is also available, is a different animal, and it is possible to estimate the baseline risk in this situation. But note also that that analysis is not done with conditional logistic regression, and it uses information from the cohort study or trial within which the nested case-control study is nested, information that is not used in conditional logistic regression.
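            To illustrate where the cohort information enters, here is one commonly used choice of inverse-probability weights for a nested case-control study with m controls sampled per case, Samuelsen's (1997) inclusion probability. It is offered only as a sketch and is not necessarily the estimator used in the references cited in #5. Cases are included with probability 1; a subject j who never becomes a case is included with probability

            ```latex
            \pi_j = 1 - \prod_{i \,:\, t_i \text{ a case time},\ j \in R(t_i)}
                    \left(1 - \frac{m}{n(t_i) - 1}\right),
            \qquad w_j = \frac{1}{\pi_j}
            ```

            where R(t_i) is the full-cohort risk set at case time t_i and n(t_i) is its size. The n(t_i) are cohort-level quantities, which is exactly the kind of information that conditional logistic regression does not use.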

