Dear Statalist members,
I need help with the interpretation and manual calculation of the matched odds ratio (mOR) for variables with more than two categories.
In matched case-control studies, only discordant pairs are used to calculate the matched odds ratio when the variable is binary. When the variable has more than two categories (1, 2, 3, ... k), we can construct 2x2 subtables by selecting the categories to be compared and again use the number of discordant pairs. For example, the mOR for 2 vs. 1 can be calculated as the number of cases in category 2 matched to controls in category 1, divided by the number of cases in category 1 matched to controls in category 2. I have some doubts, however, about whether this mOR also depends on the number of discordant pairs between the category being compared and the other categories (2 vs. 3, ... 2 vs. k in the example).
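To fix notation for the pairwise calculation just described (the symbols are my own, not taken from any reference), let n_{jk} be the number of pairs in which the case is in category j and the control is in category k. The estimator above can then be written as:

```latex
\[
\widehat{\mathrm{mOR}}_{2\text{ vs }1} \;=\; \frac{n_{21}}{n_{12}},
\qquad
n_{jk} = \#\{\text{pairs with case in category } j \text{ and control in category } k\}.
\]
```

For a binary variable this is the usual ratio of the two discordant cells.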
As an example we use the Hosmer-Lemeshow dataset with data on matched pairs of infants with low birthweight (cases) or with regular birthweight (controls). We fit the following conditional logistic model of low birthweight on mother’s race:
Model 1:
Code:
. webuse lowbirth2, clear
(Applied Logistic Regression, Hosmer & Lemeshow)

. clogit low i.race , group(pairid) nolog or

Conditional (fixed-effects) logistic regression

                                                Number of obs     =        112
                                                LR chi2(2)        =       0.06
                                                Prob > chi2       =     0.9714
Log likelihood = -38.787243                     Pseudo R2         =     0.0007

------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
      black  |   1.090951   .5709087     0.17   0.868     .3911654    3.042636
      other  |   .9714162   .3854501    -0.07   0.942     .4463293    2.114245
------------------------------------------------------------------------------

The mORs for black and for other, each compared with the omitted reference group (white), are 1.09 and 0.97 respectively. However, these values differ from those we get if we restrict the sample to the two categories being compared, as in the following models:
Model 2 (black vs white):
Code:
. clogit low i.race if inlist(race,1,2), group(pairid) nolog or
note: 33 groups (33 obs) dropped because of all positive or all negative outcomes.

Conditional (fixed-effects) logistic regression

                                                Number of obs     =         32
                                                LR chi2(1)        =       0.14
                                                Prob > chi2       =     0.7050
Log likelihood = -11.018681                     Pseudo R2         =     0.0065

------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
      black  |   1.333333    1.01835     0.38   0.706     .2984165    5.957371
------------------------------------------------------------------------------

Model 3 (other vs white):
Code:
. clogit low i.race if inlist(race,1,3), group(pairid) nolog or
note: 19 groups (19 obs) dropped because of all positive or all negative outcomes.

Conditional (fixed-effects) logistic regression

                                                Number of obs     =         72
                                                LR chi2(1)        =       0.05
                                                Prob > chi2       =     0.8272
Log likelihood =  -24.92948                     Pseudo R2         =     0.0010

------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
      other  |   .9090909   .3972104    -0.22   0.827     .3860918    2.140543
------------------------------------------------------------------------------

The mORs are now 1.33 and 0.91.
Obviously, the samples used in these two models are smaller than the one used in the first model because we select only the categories we want to compare (1 vs 2, 1 vs 3). When there is no matching, the ORs obtained with model 1 coincide with those of models 2 and 3: although the latter are estimated on smaller samples, they use all the observations needed for the comparisons with the reference group. By contrast, with the matched design the selection also excludes cases or controls that become unpaired, even when they themselves belong to one of the selected categories. However, I think this should not affect the value of the mOR, since each case is compared only with its matched control; these observations would not be used anyway because their matched partner does not belong to either of the two categories compared in the model.
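A quick way to check which observations lose their partner under the restriction used in Model 2 is to flag complete pairs. This is only a sketch: it assumes the long lowbirth2 data loaded above (one row per infant) are still in memory, and `complete` is a variable name I made up for illustration.

```stata
* Sketch: which observations survive the black-vs-white restriction as full pairs?
* Assumes the long lowbirth2 data (one row per infant) are in memory.
preserve
keep if inlist(race, 1, 2)
* complete = 1 when both members of the pair are still present after the restriction
bysort pairid: generate byte complete = (_N == 2)
tabulate complete
restore
```

The observations with complete == 0 should be the 33 singleton groups that clogit reports as dropped in Model 2, leaving the 32 observations it actually uses.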
Let us now calculate the mORs manually using the number of discordant pairs. For this we need to reshape the dataset:
Code:
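. keep pairid low race

. reshape wide race, i(pairid) j(low)
(note: j = 0 1)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                      112   ->      56
Number of variables                   3   ->       3
j variable (2 values)               low   ->   (dropped)
xij variables:
                                   race   ->   race0 race1
-----------------------------------------------------------------------------

. tab race1 race0

           |              0 race
    1 race |     white      black      other |     Total
-----------+---------------------------------+----------
     white |         8          3         11 |        22
     black |         4          1          6 |        11
     other |        10          6          7 |        23
-----------+---------------------------------+----------
     Total |        22         10         24 |        56

The mORs are 4/3 (= 1.33) for black vs white and 10/11 (= 0.91) for other vs white, which coincide with those estimated with models 2 and 3. Many pairs in the table above do not contribute to the mORs we are interested in. As we already know, some of them are the concordant pairs (on the diagonal), but the 12 discordant pairs between the two categories that are not being compared with the reference (6 + 6, black vs other) are not used either.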
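The manual calculation from discordant pairs can also be scripted rather than read off the table. A minimal sketch, assuming the wide data from the reshape above are in memory (race1 = case, race0 = control) and the coding 1 = white, 2 = black, 3 = other implied by the inlist() calls; the scalar names are mine.

```stata
* Sketch: pairwise mORs from discordant-pair counts in the wide data.
* Assumes race1 = case's race, race0 = control's race, 1 = white, 2 = black, 3 = other.
quietly count if race1 == 2 & race0 == 1
scalar n_bw = r(N)
quietly count if race1 == 1 & race0 == 2
scalar n_wb = r(N)
quietly count if race1 == 3 & race0 == 1
scalar n_ow = r(N)
quietly count if race1 == 1 & race0 == 3
scalar n_wo = r(N)
* Ratio of the two discordant cells for each comparison with white
display "black vs white: " n_bw "/" n_wb " = " %6.4f n_bw/n_wb
display "other vs white: " n_ow "/" n_wo " = " %6.4f n_ow/n_wo
```

Each ratio uses only the two off-diagonal cells for the pair of categories being compared, so the black vs other cells never enter.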
What happens if we exclude the discordant pairs between the non-compared categories (black vs. other) and re-estimate model 1?
Code:
. drop if race1 == 2 & race0 == 3 | race1 == 3 & race0 == 2
(12 observations deleted)

. reshape long race, i(pairid) j(low)
(note: j = 0 1)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       44   ->      88
Number of variables                   3   ->       3
j variable (2 values)                     ->   low
xij variables:
                            race0 race1   ->   race
-----------------------------------------------------------------------------

. clogit low i.race , group(pairid) nolog or

Conditional (fixed-effects) logistic regression

                                                Number of obs     =         88
                                                LR chi2(2)        =       0.19
                                                Prob > chi2       =     0.9089
Log likelihood = -30.402984                     Pseudo R2         =     0.0031

------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
      black  |   1.333333    1.01835     0.38   0.706     .2984165    5.957371
      other  |   .9090909   .3972104    -0.22   0.827     .3860918    2.140543
------------------------------------------------------------------------------

The mORs are now the same as those obtained with models 2 and 3 and those calculated manually.
My conclusion, probably wrong, is that the mORs of model 1 should not be interpreted as a comparison of each category with the reference category alone, since there seems to be some kind of weighting that takes into account the number of discordant pairs between that category and all the other categories, not only the reference one.
I would appreciate it if someone could help me understand where my reasoning goes wrong, and help me decide between estimating the mORs with a single model or with separate models on selected categories. I would also like to know the formula for the manual calculation of the matched odds ratio for a variable with more than two categories, so that I can reproduce the values in model 1.
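One way to make this suspected weighting explicit is to write down the conditional likelihood that, as far as I understand, a 1:1 conditional logistic model maximizes. This is a tentative sketch under that assumption, in my own notation and not checked against the Stata manual: n_{jk} is the number of pairs with the case in category j and the control in category k, and \psi_j is the mOR of category j vs the reference, with \psi_1 = 1.

```latex
\[
L(\psi) \;=\; \prod_{j}\prod_{k} \left( \frac{\psi_j}{\psi_j + \psi_k} \right)^{n_{jk}},
\qquad \psi_1 = 1 .
\]
% Concordant pairs (j = k) contribute the constant factor 1/2 and drop out.
% Setting the derivative of \log L with respect to \log\psi_j to zero
% gives one score equation per non-reference category j:
\[
\sum_{k \ne j} n_{jk} \;=\; \sum_{k \ne j} \left( n_{jk} + n_{kj} \right) \frac{\psi_j}{\psi_j + \psi_k},
\qquad j = 2, \dots, K .
\]
```

With the counts from the table above (n_21 = 4, n_12 = 3, n_23 = n_32 = 6), the equation for black reads 10 = 7\psi_2/(\psi_2 + 1) + 12\psi_2/(\psi_2 + \psi_3), and the model 1 estimates (1.090951 and 0.9714162) satisfy it up to rounding. The second term is where the black vs other pairs enter; when n_23 = n_32 = 0 it vanishes and the solution is \psi_2 = 4/3. If this is right, I do not see a closed-form expression when all three kinds of discordant pairs are present, so the values would have to be found iteratively.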
Thank you very much!
Llorenç