mcc, clogit

Yoshiro Nagao

Join Date: Feb 2018

Posts: 24
#1

17 Feb 2018, 21:29

Hello

We conducted a matched case control study.

For each case, 2 control subjects are matched.
These 2 control subjects are from 2 different clusters.
---------------------------------------------------
Data:

group case control_clust1 control_clust2
1 exposed non-exp non-exp
2 exposed non-exp non-exp
3 non-exp non-exp non-exp
4 non-exp non-exp non-exp
5 non-exp non-exp non-exp
6 non-exp non-exp non-exp
7 non-exp non-exp non-exp
8 non-exp non-exp non-exp
9 non-exp non-exp non-exp
10 non-exp non-exp non-exp
11 non-exp non-exp non-exp
---------------------------------------------------

First, I compared between case and control_clust1,
to see the effect of exposure, as in:

. mcci 2 9 0 11

                 | Controls               |
Cases            |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
         Exposed |         2           9  |         11
       Unexposed |         0          11  |         11
-----------------+------------------------+------------
           Total |         2          20  |         22

McNemar's chi2(1) =      9.00    Prob > chi2 = 0.0027
Exact McNemar significance probability       = 0.0039

Proportion with factor
        Cases       .5
        Controls    .0909091     [95% Conf. Interval]
                   ---------     --------------------
        difference  .4090909      .158186   .6599959
        ratio       5.5          1.570118   19.26607
        rel. diff.  .45          .2319678   .6680322

        odds ratio  .            1.973826          .   (exact)

Naturally, comparison between case and control_clust2
generates the same result.

Q1. Can I describe this result in the manuscript as
"OR = 1.97 (P<0.0027) based upon McNemar's chi square"?

This seems peculiar because there is no confidence
interval for the OR.
********************************************************

Next, I aggregated the two control clusters, and
used clogit, as in:

. clogit disease exposure,group(group)

Iteration 0:   log likelihood = -12.084735
Iteration 1:   log likelihood = -9.8875106  (not concave)
Iteration 2:   log likelihood = -9.8875106

Conditional (fixed-effects) logistic regression   Number of obs   =         33
                                                  LR chi2(0)      =       4.39
                                                  Prob > chi2     =          .
Log likelihood = -9.8875106                       Pseudo R2       =     0.1818

------------------------------------------------------------------------------
     disease |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    exposure |   2.39e+20          .        .       .            .           .
------------------------------------------------------------------------------

Q2. This result seems even more peculiar, because mcc showed
a highly significant result (P<0.0027).
What went wrong?

Your assistance would be appreciated.

Yosh
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

18 Feb 2018, 10:09

You have no exposed controls, so exposure = 1 perfectly predicts case = 1. With perfect prediction, the maximum likelihood estimate of the exposure effect is infinite. -clogit- tries to calculate it, but because it is infinite, the estimation fails. Your -clogit- results are simply invalid. Had you used -logistic- instead of -clogit- (which I am not recommending, as it is not appropriate with grouped data), Stata would have checked for this possibility before proceeding with the estimation, told you about it, and omitted the perfectly predicted (exposed) observations from the analysis. This type of pre-check for perfect prediction is not implemented in -clogit-, so you just got a bunch of confusing non-results handed to you.

If you want to pursue a logistic regression model of the data, use -exlogistic- here. It will also tell you that maximum likelihood estimates are infinite, but it will compute a different estimator, the median unbiased estimate, that is defined. Since you have grouped data, don't forget to include the -group()- option.
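
The perfect-prediction pattern described above can be checked directly. The following is a minimal sketch that rebuilds the 33 observations in long form (one case and two controls per group). The variable names disease, exposure and group follow the commands in #1, but the data-construction lines are an assumption about how the original dataset was laid out.

```stata
* Rebuild the data from #1 in long form: 11 matched sets, 1 case + 2 controls each
clear
set obs 11
generate byte group = _n
expand 3
bysort group: generate byte disease = (_n == 1)       // first record in each set is the case
generate byte exposure = (disease == 1 & group <= 2)  // only the cases in groups 1 and 2 are exposed

* The exposed-control cell is empty, so exposure == 1 implies disease == 1
tabulate exposure disease

* Run only to surface the perfect-prediction note; -logistic- ignores the matching
logistic disease exposure
```

The empty cell in the cross-tabulation is what drives the conditional maximum likelihood estimate to infinity.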
Yoshiro Nagao

Join Date: Feb 2018

Posts: 24
#3

18 Feb 2018, 12:25

Hi Clyde,
Thank you very much for your swift reply!
Yes, exlogistic did more than what clogit did.
--------------------------------------------------------------------------------------------

. exlogistic disease exposure,group(group)

Enumerating sample-space combinations:
observation 1:   enumerations =          2
observation 33:  enumerations =          3
note: CMLE estimate for exposure is +inf; computing MUE

Exact logistic regression                        Number of obs    =         33
Group variable: group                            Number of groups =         11

                                                 Obs per group: min =        3
                                                                avg =      3.0
                                                                max =        3

                                                 Model score      =          4
                                                 Pr >= score      =     0.1111
---------------------------------------------------------------------------
     disease | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
    exposure |   4.828427*          2      0.2222      .3756182        +Inf
---------------------------------------------------------------------------
(*) median unbiased estimates (MUE)
--------------------------------------------------------------------------------------------

By the way, how about my Q1 in the initial posting?
Is the result from mcci reliable?
Can I state that "OR = 1.97 (P<0.0027) based upon McNemar's chi square"?
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

18 Feb 2018, 13:46

No, it's not correct. You have no exposed controls, so that 11 in your input is a mistake. Also you have the numbers in the wrong order. The correct input would be -mcci 0 2 0 9-, and the output would show you that the odds ratio is undefined (as it should be with 0 exposed controls.) You cannot use -mcci- to get an odds ratio with this data.
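
For reference, a short sketch of how the four arguments of -mcci- map onto matched pairs, together with a hand check of the McNemar statistics from the two discordant cells. The scalar names (disc_b, disc_c, and so on) are made up for illustration, and the exact p-value line assumes the usual two-sided binomial test on the discordant pairs.

```stata
* mcci #a #b #c #d, where each count is a number of matched PAIRS:
*   a = case exposed,   control exposed
*   b = case exposed,   control unexposed
*   c = case unexposed, control exposed
*   d = case unexposed, control unexposed
mcci 0 2 0 9
return list                       // inspect what -mcci- leaves behind in r()

* Hand check: only the discordant cells b and c enter the tests
scalar disc_b   = 2
scalar disc_c   = 0
scalar mcn_chi2 = (scalar(disc_b) - scalar(disc_c))^2 / (scalar(disc_b) + scalar(disc_c))
scalar p_asym   = chi2tail(1, scalar(mcn_chi2))
scalar p_exact  = min(1, 2 * binomial(scalar(disc_b) + scalar(disc_c), ///
                                      min(scalar(disc_b), scalar(disc_c)), 0.5))
display "McNemar chi2 = " %5.2f scalar(mcn_chi2) ///
        "   asymptotic p = " %6.4f scalar(p_asym) ///
        "   exact p = " %6.4f scalar(p_exact)
```

The matched-pairs odds ratio is b/c, which is why it is undefined whenever c = 0.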
Yoshiro Nagao

Join Date: Feb 2018

Posts: 24
#5

18 Feb 2018, 19:37

Thank you very much for your swift reply, again.

Now I understand the syntax of mcci.

Granted, even if all of the cases are exposed while none of the controls are exposed (i.e. mcci 0 11 0 0),
the odds ratio cannot be obtained, because there is no exposed control.

However, in such a case, an association between the disease and the exposure is intuitively obvious, isn't it?

For instance, a virus was detected in all the dead pigs, but in none of the pigs which happily survived an epidemic.

What is the appropriate stata command (or analysis design) for our data?

Last edited by Yoshiro Nagao; 18 Feb 2018, 19:54.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#6

18 Feb 2018, 20:59

However, in such a case, an association between the disease and the exposure is intuitively obvious, isn't it?

No, not necessarily. Suppose the complete data were just two case control pairs, and in both the case was exposed and the control was not. Would it be obvious? Or could it just be luck of the draw?
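
To put numbers on the luck-of-the-draw point, here is a small sketch of the exact two-sided McNemar p-value when every discordant pair points the same way (case exposed, control not). It assumes the usual binomial test with probability 0.5 per discordant pair; the pair counts in the loop are arbitrary illustrative values.

```stata
* Exact two-sided p-value when all n discordant pairs go in the same direction
foreach n of numlist 2 5 6 11 {
    display "discordant pairs = " %2.0f `n' ///
            "   exact two-sided p = " %6.4f min(1, 2 * binomial(`n', 0, 0.5))
}
```

With only two such pairs the data are entirely compatible with chance; the pattern becomes hard to attribute to chance only as the number of one-directional discordant pairs grows.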

What is the appropriate stata command (or analysis design) for our data?

I think that the -exlogistic- analysis is appropriate, and probably the best analysis in this situation.

Also, you can use -mcc- (or -mcci- if you prefer), but it won't give you an odds ratio in this situation. It will give you a risk difference, which will be finite and is suitable for this purpose. The drawback to relying on -mcc-/-mcci- is that it is only applicable to matched pairs, so you have to sacrifice one of your control groups. Your data are so scanty to begin with that I wouldn't advise that.

From the pedantry corner: Stata, not stata.
Joseph Coveney

Join Date: Apr 2014

Posts: 4420
#7

18 Feb 2018, 23:58

If you're interested in the odds ratio and its confidence bounds, then Clyde's suggestion of -exlogistic- is about all that is currently available in Stata. It's a little conservative, though, and if you're interested in a test of association, then you might want to consider the user-written command -emh-, which is available from SSC.

Code:

version 15.1

clear *

input byte group str7 (case control_clust1 control_clust2)
1 exposed non-exp non-exp
2 exposed non-exp non-exp
3 non-exp non-exp non-exp
4 non-exp non-exp non-exp
5 non-exp non-exp non-exp
6 non-exp non-exp non-exp
7 non-exp non-exp non-exp
8 non-exp non-exp non-exp
9 non-exp non-exp non-exp
10 non-exp non-exp non-exp
11 non-exp non-exp non-exp
end

rename case inp1
rename control_clust1 inp01
rename control_clust2 inp02

quietly reshape long inp0, i(group) j(clu)
quietly replace inp1 = ".n" if clu == 2
quietly reshape long inp, i(group clu) j(cas)

label define Disease 0 Control 1 Case
label values cas Disease
label variable cas "Disease"

label define Exposures 0 "non-exp" 1 exposed .n ".n"
encode inp, generate(exr) label(Exposures) noextend
label variable exr "Exposure"

exlogistic cas exr,group(group) nolog

emh cas exr, general strata(group)

exit

Yoshiro Nagao

Join Date: Feb 2018

Posts: 24
#8

19 Feb 2018, 11:08

Hi Clyde, thank you for your useful advice again. I will switch from clogit to exlogistic.
As for mcc, I am not very clear about its syntax:
mcc Var_exposed_case Var_exposed_control
Are Var_exposed_case and Var_exposed_control binary variables that indicate whether an individual pig is an exposed case or an exposed control, respectively? Is n 33? In that case, how can I link each case to its matched controls?
Yoshiro Nagao

Join Date: Feb 2018

Posts: 24
#9

19 Feb 2018, 11:13

Hi Joseph,
Thank you very much for telling me about the emh command, and for kindly writing out all the necessary command lines.
I will obtain emh and compare it with exlogistic.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#10

19 Feb 2018, 12:02

Re #8:

First you have to encode the data as 0/1 for non-exposed and exposed, respectively. Then the variables are ready to use for -mcc-. So:

Code:

clear *
input byte group str7 (case control_clust1 control_clust2)
1 exposed non-exp non-exp
2 exposed non-exp non-exp
3 non-exp non-exp non-exp
4 non-exp non-exp non-exp
5 non-exp non-exp non-exp
6 non-exp non-exp non-exp
7 non-exp non-exp non-exp
8 non-exp non-exp non-exp
9 non-exp non-exp non-exp
10 non-exp non-exp non-exp
11 non-exp non-exp non-exp
end

label define exposure 0 "non-exp" 1 "exposed"

// CREATE NUMERIC 0/1 ENCODING OF THE DATA
foreach v of varlist case control_* {
    encode `v', gen(_`v') label(exposure)
    drop `v'
    rename _`v' `v'
}

mcc case control_clust1
mcc case control_clust2

Notes:
1. Since you have no exposed controls in either cluster, the results for cluster1 and cluster2 come out identical.

2. The variables case_exposed and control_exposed in the -mcc- syntax assume that your data are organized so that each observation is a matched pair. The variable case_exposed is coded 1 if the case was exposed and 0 if the case was not exposed. Similarly control_exposed is coded 1 if the control was exposed and 0 if the control was not exposed.

3. -mcc- is not set up to handle multiple control groups in a single analysis. So each control group must be treated in a separate command.

As noted earlier, use of -mcc-, in light of 3. above, discards a lot of your data, which you can ill afford. So I don't really recommend this approach.
Yoshiro Nagao

Join Date: Feb 2018

Posts: 24
#11

21 Feb 2018, 07:22

Hi Clyde,
Thank you very much for your instruction for mcc.

. mcc case control

                 | Controls               |
Cases            |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
         Exposed |         0           2  |          2
       Unexposed |         0           9  |          9
-----------------+------------------------+------------
           Total |         0          11  |         11

McNemar's chi2(1) =      2.00    Prob > chi2 = 0.1573
Exact McNemar significance probability       = 0.5000

Proportion with factor
        Cases       .1818182
        Controls    0            [95% Conf. Interval]
                   ---------     --------------------
        difference  .1818182     -.1370177    .500654
        ratio       .                    .          .
        rel. diff.  .1818182     -.0461086   .4097449

        odds ratio  .            .1878091           .   (exact)

Obviously, as you said, the result is not very impressive.

I will follow your advice and use exlogistic.
Yoshiro Nagao

Join Date: Feb 2018

Posts: 24
#12

14 Mar 2018, 04:20

Hello.
Exlogistic seems more attractive than clogit, especially when the sample size is small.

Is it possible to estimate the sample size that would be necessary to reach
statistical significance (i.e. alpha 0.05, power 0.8), based upon a result
from exlogistic applied to a small dataset? How?
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#13

14 Mar 2018, 12:10

I'm not aware of any routine in Stata that would do this. You might google this to see if anything turns up. I would imagine that the sample size wouldn't be much different, if at all, from what you would come up with for a sample size analysis just based on logistic regression. In the end, you might have to do this by simulation.
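
A simulation along those lines might look like the sketch below. Everything specific in it is an assumption for illustration: the program name onerep, its options sets(), pcase() and pctrl(), the exposure probabilities (0.4 among cases, 0.1 among controls), the 1:2 matching, and the number of replications. It also assumes that -exlogistic- stores the 2*Pr(Suff.) p-value in the matrix e(p_sufficient); confirm that with -ereturn list- after a single run before relying on it.

```stata
capture program drop onerep
program define onerep, rclass
    // One simulated 1:2 matched case-control study (all settings are illustrative)
    syntax [, sets(integer 30) pcase(real 0.4) pctrl(real 0.1)]
    drop _all
    set obs `sets'
    generate int group = _n
    expand 3
    bysort group: generate byte disease = (_n == 1)
    generate byte exposure = runiform() < cond(disease, `pcase', `pctrl')

    return scalar p = .                 // stays missing if estimation fails
    capture exlogistic disease exposure, group(group) nolog
    if _rc == 0 {
        tempname P
        matrix `P' = e(p_sufficient)    // assumed name of the stored p-value matrix
        return scalar p = `P'[1,1]
    }
end

set seed 20180314
simulate p = r(p), reps(200) nodots: onerep, sets(30) pcase(0.4) pctrl(0.1)

* Proportion of replications significant at alpha = 0.05 (failed runs count as misses)
count if p < 0.05
display "estimated power = " %5.3f r(N) / _N
```

Rerunning with different values of sets() until the estimated power reaches about 0.8 gives a rough sample size. Each replication runs an exact enumeration, so larger designs may be slow.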
Yoshiro Nagao

Join Date: Feb 2018

Posts: 24
#14

15 Mar 2018, 00:49

Clyde, thank you very much for your swift reply. I will check the sample size analysis for clogit and try simulation.
Joseph Coveney

Join Date: Apr 2014

Posts: 4420
#15

15 Mar 2018, 05:50

If you're interested in just generating a statistical significance, then I think that -emh- will be more powerful than -exlogistic-. I don't know how small the sample sizes can be before the test size rises substantially above nominal, but the citation given in its help file uses some pretty small illustrative datasets.