Matched-odds ratio for variables with more than two categories

Llorenç Quintó

15 Nov 2016, 10:02

Dear Statalist members,

I would like some help with the interpretation and manual calculation of the matched odds ratio (mOR) for variables with more than two categories.

In matched case-control studies, only discordant pairs are used to calculate the matched odds ratio when the variable is binary. When the variable has more than two categories (1, 2, 3, ... k), we can construct 2x2 subtables by selecting the two categories to be compared and again use the number of discordant pairs. For example, the mOR for 2 vs. 1 can be calculated as the number of cases in category 2 matched to controls in category 1, divided by the number of cases in category 1 matched to controls in category 2. I have some doubts, however, about whether this mOR also depends on the number of discordant pairs between the category being compared and the other categories (2 vs. 3, ... 2 vs. k in the example).
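In symbols, writing n_{ab} for the number of matched pairs with the case in category a and the control in category b (this notation is mine, introduced only for illustration), the pairwise estimate described above would be:

```latex
\[
\widehat{\mathrm{mOR}}_{2\ \text{vs}\ 1} \;=\; \frac{n_{21}}{n_{12}}
\]
```

Only the two off-diagonal cells of the 2x2 subtable for categories 1 and 2 enter this ratio; pairs involving any other category do not appear in it.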

As an example we use the Hosmer-Lemeshow dataset with data on matched pairs of infants with low birthweight (cases) or with regular birthweight (controls). We fit the following conditional logistic model of low birthweight on mother’s race:

Model 1:

Code:

. webuse lowbirth2, clear
(Applied Logistic Regression, Hosmer & Lemeshow)
 
. clogit low i.race , group(pairid) nolog or
 
Conditional (fixed-effects) logistic regression
 
                                                Number of obs     =        112
                                                LR chi2(2)        =       0.06
                                                Prob > chi2       =     0.9714
Log likelihood = -38.787243                     Pseudo R2         =     0.0007
 
------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
      black  |   1.090951   .5709087     0.17   0.868     .3911654    3.042636
      other  |   .9714162   .3854501    -0.07   0.942     .4463293    2.114245
------------------------------------------------------------------------------

The mORs for black and other race, each compared with the omitted reference group (white), are 1.09 and 0.97, respectively. However, these values differ from those we get if we restrict each model to the two categories we want to compare:

Model 2 (black vs white):

Code:

. clogit low i.race if inlist(race,1,2), group(pairid) nolog or
note: 33 groups (33 obs) dropped because of all positive or
      all negative outcomes.
 
Conditional (fixed-effects) logistic regression
 
                                                Number of obs     =         32
                                                LR chi2(1)        =       0.14
                                                Prob > chi2       =     0.7050
Log likelihood = -11.018681                     Pseudo R2         =     0.0065
 
------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
      black  |   1.333333    1.01835     0.38   0.706     .2984165    5.957371
------------------------------------------------------------------------------

Model 3 (other vs white):

Code:

. clogit low i.race if inlist(race,1,3), group(pairid) nolog or
note: 19 groups (19 obs) dropped because of all positive or
      all negative outcomes.
 
Conditional (fixed-effects) logistic regression
 
                                                Number of obs     =         72
                                                LR chi2(1)        =       0.05
                                                Prob > chi2       =     0.8272
Log likelihood =  -24.92948                     Pseudo R2         =     0.0010
 
------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
      other  |   .9090909   .3972104    -0.22   0.827     .3860918    2.140543
------------------------------------------------------------------------------

The mORs are now 1.33 (black vs. white) and 0.91 (other vs. white).

Obviously, the samples used in these two models are smaller than the one used in the first model, because we select only the categories we want to compare (1 vs. 2, 1 vs. 3). In fact, when there is no matching, the ORs obtained with model 1 coincide with those of models 2 and 3: although they are estimated with smaller samples, those samples contain all the observations needed for the comparison with the reference group. By contrast, because of the matched design, we also exclude cases or controls that become unpaired due to the selection, even when they belong to the categories selected for the model. However, I think this should not affect the value of the mOR, since each case is compared only with its matched control, and these observations would not be used anyway because their matched partner does not belong to either of the two categories compared in the model.
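To illustrate the unmatched part of this argument, here is a minimal Python sketch (not Stata, and not part of the analysis itself). It ignores the matching and uses the row and column totals of the case-by-control cross-tabulation shown later in this post as if they were unmatched counts. It relies on my understanding that, with a single categorical predictor, the OR from an ordinary logistic model equals the cross-product ratio of the corresponding counts; the function name `crude_or` is hypothetical.

```python
# Marginal totals from the cross-tabulation of matched pairs (matching ignored):
# number of cases (low = 1) and controls (low = 0) in each race category.
cases = {"white": 22, "black": 11, "other": 23}
controls = {"white": 22, "black": 10, "other": 24}


def crude_or(cases, controls, category, reference):
    """Unmatched cross-product odds ratio of `category` vs `reference`."""
    return (cases[category] * controls[reference]) / (
        cases[reference] * controls[category]
    )


# Using all three categories ("model 1" without matching).
or_black_full = crude_or(cases, controls, "black", "white")

# Dropping the "other" category first ("model 2" without matching).
cases_bw = {k: v for k, v in cases.items() if k != "other"}
controls_bw = {k: v for k, v in controls.items() if k != "other"}
or_black_subset = crude_or(cases_bw, controls_bw, "black", "white")

print(or_black_full, or_black_subset)
```

Both calls use only the black and white counts, so in the unmatched setting dropping the third category cannot change the estimate.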

Let us now calculate the mORs manually using the number of discordant pairs. For this we need to reshape the dataset:

Code:

. keep pairid low race
 
. reshape wide race, i(pairid) j(low)
(note: j = 0 1)
 
Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                      112   ->      56
Number of variables                   3   ->       3
j variable (2 values)               low   ->   (dropped)
xij variables:
                                   race   ->   race0 race1
-----------------------------------------------------------------------------
 
. tab race1 race0
 
           |              0 race
    1 race |     white      black      other |     Total
-----------+---------------------------------+----------
     white |         8          3         11 |        22
     black |         4          1          6 |        11
     other |        10          6          7 |        23
-----------+---------------------------------+----------
     Total |        22         10         24 |        56

The mORs are 4/3 (= 1.33) for black vs. white and 10/11 (= 0.91) for other vs. white, which coincide with those estimated with models 2 and 3. In the table above there are many pairs that do not contribute to the calculation of the mORs we are interested in. As we already know, some of them are the concordant pairs (located on the diagonal), but the 12 discordant pairs between the two categories that are not compared with the reference (6 + 6, black vs. other) are also not used.
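The manual calculation can be written down as a small Python sketch (again outside Stata). The table is copied from the `tab race1 race0` output above; the helper name `pairwise_mor` is mine.

```python
# Case-by-control cross-tabulation of the 56 matched pairs:
# rows = race of the case (race1), columns = race of the control (race0).
levels = ["white", "black", "other"]
pairs = [
    [8, 3, 11],   # case white
    [4, 1, 6],    # case black
    [10, 6, 7],   # case other
]


def pairwise_mor(table, levels, category, reference):
    """Ratio of the two discordant cells for `category` vs `reference`:
    (case in category, control in reference) divided by
    (case in reference, control in category)."""
    i = levels.index(category)
    j = levels.index(reference)
    return table[i][j] / table[j][i]


mor_black = pairwise_mor(pairs, levels, "black", "white")  # 4 / 3
mor_other = pairwise_mor(pairs, levels, "other", "white")  # 10 / 11
print(f"black vs white: {mor_black:.4f}")
print(f"other vs white: {mor_other:.4f}")
```

Each ratio reads exactly two cells of the table, which is why the concordant pairs and the black-vs-other pairs play no role in it.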

What happens if we exclude the discordant pairs between the non-compared categories (black vs. other) and re-estimate model 1?

Code:

. drop if race1 == 2 & race0 == 3 | race1 == 3 & race0 == 2
(12 observations deleted)
 
. reshape long race, i(pairid) j(low)
(note: j = 0 1)
 
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       44   ->      88
Number of variables                   3   ->       3
j variable (2 values)                     ->   low
xij variables:
                            race0 race1   ->   race
-----------------------------------------------------------------------------
 
. clogit low i.race , group(pairid) nolog or
 
Conditional (fixed-effects) logistic regression
 
                                                Number of obs     =         88
                                                LR chi2(2)        =       0.19
                                                Prob > chi2       =     0.9089
Log likelihood = -30.402984                     Pseudo R2         =     0.0031
 
------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
      black  |   1.333333    1.01835     0.38   0.706     .2984165    5.957371
      other  |   .9090909   .3972104    -0.22   0.827     .3860918    2.140543
------------------------------------------------------------------------------

The mORs are the same as those obtained with models 2 and 3 and those calculated manually.

My conclusion, probably wrong, is that the mORs of model 1 should not be interpreted as a comparison of each category with the reference category alone, since there is some kind of weighting that takes into account the number of discordant pairs between that category and all the other categories, not only the reference one.
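To probe this conjecture numerically, here is a minimal Python sketch (outside Stata, so not necessarily what clogit does internally). It assumes that, for 1:1 matched pairs, a pair with the case in category i and the control in category j contributes p_i / (p_i + p_j) to the conditional likelihood, where p_i is the odds ratio of category i relative to the reference. It maximises that likelihood with a simple fixed-point (minorise-maximise) iteration of the kind used for Bradley-Terry paired-comparison models. The function name `conditional_mle` and all variable names are mine.

```python
# Case-by-control table: rows = race of case, columns = race of control.
levels = ["white", "black", "other"]
pairs = [
    [8, 3, 11],
    [4, 1, 6],
    [10, 6, 7],
]


def conditional_mle(table, ref=0, tol=1e-12, max_iter=10000):
    """Fixed-point maximisation of sum_ij n_ij * log(p_i / (p_i + p_j)).

    Concordant pairs contribute a constant and are ignored. Assumes every
    category appears in at least one discordant pair (otherwise a division
    by zero occurs). Returns the odds ratios relative to category `ref`.
    """
    k = len(table)
    # Discordant pairs in which the case is in category i.
    case_counts = [sum(table[i][j] for j in range(k) if j != i) for i in range(k)]
    p = [1.0] * k
    for _ in range(max_iter):
        new = []
        for i in range(k):
            # All discordant pairs between i and j, weighted by 1 / (p_i + p_j).
            denom = sum(
                (table[i][j] + table[j][i]) / (p[i] + p[j])
                for j in range(k)
                if j != i
            )
            new.append(case_counts[i] / denom)
        new = [v / new[ref] for v in new]  # fix the reference category at 1
        change = max(abs(a - b) for a, b in zip(new, p))
        p = new
        if change < tol:
            break
    return p


# All pairs, as in model 1.
or_full = conditional_mle(pairs)

# Same table with the 12 black-vs-other discordant pairs removed.
reduced = [row[:] for row in pairs]
reduced[1][2] = 0
reduced[2][1] = 0
or_reduced = conditional_mle(reduced)

print("full table:   ", [round(v, 4) for v in or_full])
print("reduced table:", [round(v, 4) for v in or_reduced])
```

If the assumption about the likelihood is right, the first fit should come out close to the 1.09 and 0.97 of model 1 and the second at 4/3 and 10/11. That would suggest the black-vs-other pairs enter the model 1 estimates through the shared terms of the likelihood, and that there is no simple ratio of two cells that reproduces them.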

I would appreciate it if someone could help me understand where I am going wrong in this reasoning, and decide between estimating the mORs with a single model or with separate models restricted to selected categories. I would also like to know the formula for the manual calculation of the matched odds ratio for a variable with more than two categories, so that I can reproduce the values in model 1.

Thank you very much!
Llorenç
