  • mcc, clogit

    Hello

    We conducted a matched case-control study.

    Each case is matched to 2 control subjects,
    one from each of 2 different clusters.
    ---------------------------------------------------
    Data:

    group case control_clust1 control_clust2
    1 exposed non-exp non-exp
    2 exposed non-exp non-exp
    3 non-exp non-exp non-exp
    4 non-exp non-exp non-exp
    5 non-exp non-exp non-exp
    6 non-exp non-exp non-exp
    7 non-exp non-exp non-exp
    8 non-exp non-exp non-exp
    9 non-exp non-exp non-exp
    10 non-exp non-exp non-exp
    11 non-exp non-exp non-exp
    ---------------------------------------------------

    First, I compared the cases with control_clust1
    to see the effect of exposure:

    . mcci 2 9 0 11

                     | Controls               |
    Cases            |   Exposed   Unexposed  |      Total
    -----------------+------------------------+------------
             Exposed |         2           9  |         11
           Unexposed |         0          11  |         11
    -----------------+------------------------+------------
               Total |         2          20  |         22

    McNemar's chi2(1) =      9.00    Prob > chi2 = 0.0027
    Exact McNemar significance probability       = 0.0039

    Proportion with factor
            Cases            .5
            Controls   .0909091     [95% Conf. Interval]
                      ---------     --------------------
            difference .4090909      .158186    .6599959
            ratio           5.5     1.570118    19.26607
            rel. diff.      .45     .2319678    .6680322

            odds ratio        .     1.973826           .   (exact)

    Naturally, comparing the cases with control_clust2
    gives the same result.

    Q1. Can I describe this result in the manuscript as
    "OR = 1.97 (P<0.0027) based upon McNemar's chi square"?

    This seems peculiar because there is no confidence
    interval for the OR.
    ********************************************************

    Next, I pooled the two control clusters and
    used clogit:

    . clogit disease exposure,group(group)

    Iteration 0:   log likelihood = -12.084735
    Iteration 1:   log likelihood = -9.8875106  (not concave)
    Iteration 2:   log likelihood = -9.8875106

    Conditional (fixed-effects) logistic regression   Number of obs =         33
                                                      LR chi2(0)    =       4.39
                                                      Prob > chi2   =          .
    Log likelihood = -9.8875106                       Pseudo R2     =     0.1818

    ------------------------------------------------------------------------------
         disease |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        exposure |   2.39e+20          .        .       .            .           .
    ------------------------------------------------------------------------------

    Q2. This result seems even more peculiar, because mcc showed
    a highly significant result (P<0.0027).
    What went wrong?

    Your assistance would be appreciated.

    Yosh


  • #2
    You have no exposed controls. So exposure = 1 perfectly predicts case = 1. With perfect prediction, the maximum likelihood estimate of the exposure effect is infinite. -clogit- is trying to calculate that, but because it is infinite, it fails. Your -clogit- results are simply invalid. Had you used -logistic- instead of -clogit- (which I am not recommending; it is not appropriate with grouped data), Stata would have checked for this possibility before proceeding with the estimation, told you about it, and dropped exposure along with the perfectly predicted (exposed) observations. This type of pre-check for perfect prediction is not implemented in -clogit-, so you just got a bunch of confusing non-results handed to you.

    If you want to pursue a logistic regression model of the data, use -exlogistic- here. It will also tell you that maximum likelihood estimates are infinite, but it will compute a different estimator, the median unbiased estimate, that is defined. Since you have grouped data, don't forget to include the -group()- option.
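A minimal plain-Python sketch (not Stata) of why the estimate runs off to infinity. It assumes the 11 matched sets from the first post, each with one case and two controls. With one case per set, a set's conditional likelihood contribution is exp(beta*x_case) divided by the sum of exp(beta*x_j) over its three members. Sets 3 to 11 contribute log(1/3) whatever beta is. Sets 1 and 2 each contribute beta - log(exp(beta) + 2), which keeps rising as beta grows. The name cond_loglik is ours, for illustration only.

```python
import math

# One entry per matched set: (case exposure, [exposures of the 2 controls]),
# coded 0 = non-exp, 1 = exposed. Sets 1-2 have an exposed case; sets 3-11 none.
SETS = [(1, [0, 0])] * 2 + [(0, [0, 0])] * 9


def cond_loglik(beta, sets=SETS):
    """Conditional logistic log-likelihood (one case per set, one binary covariate)."""
    total = 0.0
    for case_x, control_xs in sets:
        denom = sum(math.exp(beta * x) for x in [case_x] + control_xs)
        total += beta * case_x - math.log(denom)
    return total


if __name__ == "__main__":
    for beta in (0, 1, 2, 5, 10, 20):
        print(f"beta = {beta:>2}: log L = {cond_loglik(beta):.7f}")
    # Supremum 9*log(1/3), approached only as beta -> +infinity.
    print(f"limit:     log L = {9 * math.log(1 / 3):.7f}")
```

At beta = 0 this gives 11*log(1/3) = -12.084735, the "Iteration 0" value in the first post. The supremum 9*log(1/3) = -9.8875106 is where -clogit- stalled, and twice the gap between the two is the reported LR chi2 of 4.39.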



    • #3
      Hi Clyde,
      Thank you very much for your swift reply!
      Yes, exlogistic did more than clogit did.
      --------------------------------------------------------------------------------------------

      . exlogistic disease exposure,group(group)

      Enumerating sample-space combinations:
      observation 1:  enumerations = 2
      observation 33: enumerations = 3
      note: CMLE estimate for exposure is +inf; computing MUE

      Exact logistic regression                 Number of obs      =         33
      Group variable: group                     Number of groups   =         11

                                                Obs per group: min =          3
                                                               avg =        3.0
                                                               max =          3

                                                Model score        =          4
                                                Pr >= score        =     0.1111
      ---------------------------------------------------------------------------
           disease | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
      -------------+-------------------------------------------------------------
          exposure |   4.828427*          2      0.2222      .3756182        +Inf
      ---------------------------------------------------------------------------
      (*) median unbiased estimates (MUE)
      --------------------------------------------------------------------------------------------
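The numbers in this -exlogistic- table can be reproduced by hand, which shows what the command is doing. This is a sketch in plain Python (not Stata), assuming the same 11 sets. Only sets 1 and 2 are informative. Given which member of each set is exposed, the sufficient statistic T (the number of exposed cases) can be 0, 1 or 2, and P(T = t) is proportional to C(2,t) * 2^(2-t) * OR^t.

Because the observed T = 2 is the largest possible value, the CMLE is infinite. The MUE is taken here as the OR solving Pr(T >= 2; OR) = 0.5. The lower 95% limit is the OR solving Pr(T >= 2; OR) = 0.025. The log-scale bisection is our own generic choice, not necessarily what Stata does internally, and the function names are hypothetical.

```python
import math

# Informative sets: sets 1 and 2, each with 3 members of whom exactly 1 is exposed.
# T = number of sets whose case is the exposed member. The number of case
# assignments giving T = t is comb(2, t) * 2**(2 - t) (2 unexposed choices per
# remaining set), and each is weighted by OR**t.
T_OBS = 2
WEIGHTS = {t: math.comb(2, t) * 2 ** (2 - t) for t in range(3)}


def pmf(odds_ratio):
    """Conditional distribution of T for the given odds ratio."""
    terms = {t: w * odds_ratio ** t for t, w in WEIGHTS.items()}
    total = sum(terms.values())
    return {t: v / total for t, v in terms.items()}


def upper_tail(odds_ratio, t_obs=T_OBS):
    """Pr(T >= t_obs) for the given odds ratio; increases with the odds ratio."""
    return sum(p for t, p in pmf(odds_ratio).items() if t >= t_obs)


def score_stat(t, odds_ratio=1.0):
    """Conditional score statistic (t - E[T])^2 / Var(T) under the null OR."""
    probs = pmf(odds_ratio)
    mean = sum(k * p for k, p in probs.items())
    var = sum((k - mean) ** 2 * p for k, p in probs.items())
    return (t - mean) ** 2 / var


def solve_or(target, lo=1e-8, hi=1e8, iters=200):
    """Bisect on the log scale for the OR with upper_tail(OR) == target."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if upper_tail(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    p_null = upper_tail(1.0)
    print(f"Model score = {score_stat(T_OBS):.4f}")
    print(f"Pr(T >= 2) = {p_null:.4f}, 2*Pr = {2 * p_null:.4f}")
    print(f"MUE = {solve_or(0.5):.6f}")
    print(f"lower 95% limit = {solve_or(0.025):.7f}")
```

This prints a score of 4, tail probabilities 0.1111 and 0.2222, an MUE of 4.828427 and a lower limit of 0.3756182, matching the table. Pr(T >= 2) coincides with "Pr >= score" here because T = 2 also gives the largest score. The upper limit is +Inf because Pr(T <= 2) is always 1. In closed form, Pr(T >= 2) = (OR/(OR+2))^2, so the MUE is 2/(sqrt(2) - 1).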

      By the way, what about my Q1 in the initial post?
      Is the result from mcci reliable?
      Can I state that "OR = 1.97 (P<0.0027) based upon McNemar's chi square"?



      • #4
        No, it's not correct. You have no exposed controls, so that 11 in your input is a mistake. Also you have the numbers in the wrong order. The correct input would be -mcci 0 2 0 9-, and the output would show you that the odds ratio is undefined (as it should be with 0 exposed controls.) You cannot use -mcci- to get an odds ratio with this data.
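For reference, -mcci #a #b #c #d- takes the pair-table cells in this order: both exposed, case exposed only, control exposed only, neither exposed. The matched-pairs odds ratio is b/c, so it is undefined whenever c = 0. The 1.973826 printed by -mcci 2 9 0 11- in the first post is therefore not an odds ratio at all. In the row "odds ratio . 1.973826 .", the point estimate and upper limit are missing, and 1.973826 is the lower exact 95% confidence limit.

The plain-Python sketch below (the class and method names are hypothetical) reproduces that number. It assumes the exact interval is a Clopper-Pearson interval for p = b/(b+c), mapped to p/(1-p). That assumption matches both exact limits printed in this thread, but we have not checked it against Stata's source.

```python
from dataclasses import dataclass


@dataclass
class PairTable:
    """Matched-pair cells in the order -mcci- takes them: #a #b #c #d."""
    both_exposed: int   # a: case exposed, control exposed
    case_only: int      # b: case exposed, control unexposed
    control_only: int   # c: case unexposed, control exposed
    neither: int        # d: neither exposed

    def odds_ratio(self):
        """Matched-pairs odds ratio b / c, or None when c == 0 (undefined)."""
        if self.control_only == 0:
            return None
        return self.case_only / self.control_only

    def exact_lower_limit_c_zero(self, level=0.95):
        """Lower exact limit for b / c when c == 0 (assumed Clopper-Pearson on b/(b+c)).

        With all b discordant pairs favouring cases, the Clopper-Pearson lower
        limit for p = b / (b + c) is (alpha / 2) ** (1 / b); map it to p / (1 - p).
        """
        if self.control_only != 0 or self.case_only == 0:
            raise ValueError("this closed form needs c == 0 and b > 0")
        p_low = ((1 - level) / 2) ** (1 / self.case_only)
        return p_low / (1 - p_low)


if __name__ == "__main__":
    first_post = PairTable(2, 9, 0, 11)   # -mcci 2 9 0 11- (mistaken input)
    corrected = PairTable(0, 2, 0, 9)     # -mcci 0 2 0 9-
    print(first_post.odds_ratio(), corrected.odds_ratio())
    print(f"{first_post.exact_lower_limit_c_zero():.6f}")
    print(f"{corrected.exact_lower_limit_c_zero():.7f}")
```

The same assumption also reproduces the .1878091 that -mcc- prints later in the thread for the corrected table.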



        • #5
          Thank you very much for your swift reply, again.

          Now I understand the syntax of mcci.

          Granted, even if all of the cases are exposed while none of the controls are (i.e. mcci 0 11 0 0),
          the odds ratio cannot be obtained, because there are no exposed controls.

          However, in such a case, an association between the disease and the exposure is intuitively obvious, isn't it?

          For instance, a virus was detected in all the dead pigs, but in none of the pigs that survived the epidemic.

          What is the appropriate stata command (or analysis design) for our data?
          Last edited by Yoshiro Nagao; 18 Feb 2018, 19:54.



          • #6
            However, in such a case, an association between the disease and the exposure is intuitively obvious, isn't it?
            No, not necessarily. Suppose the complete data were just two case control pairs, and in both the case was exposed and the control was not. Would it be obvious? Or could it just be luck of the draw?
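The "luck of the draw" question can be made concrete with the exact McNemar test, which looks only at discordant pairs. Under no association, each discordant pair is equally likely to favour the case or the control, so the number favouring cases is Binomial(n, 0.5). Below is a plain-Python sketch of the two-sided exact p-value, doubling the smaller tail and capping at 1 (this rule reproduces the exact McNemar probabilities printed in this thread; the function name is ours).

```python
from math import comb


def exact_mcnemar_p(b, c):
    """Two-sided exact McNemar p-value for b and c discordant pairs.

    Under H0 the b + c discordant pairs split Binomial(b + c, 0.5); the
    smaller tail is doubled and capped at 1.
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    for n_pairs in (2, 5, 11):
        # every discordant pair has an exposed case and an unexposed control
        print(n_pairs, exact_mcnemar_p(n_pairs, 0))
```

Two such pairs give p = 0.5, five give 0.0625, and eleven give about 0.00098. So a perfect split is only convincing once there are enough discordant pairs.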

            What is the appropriate stata command (or analysis design) for our data?
            I think that the -exlogistic- analysis is appropriate, and probably the best analysis in this situation.

            Also, you can use -mcc- (or -mcci- if you prefer), but it won't give you an odds ratio in this situation. It will give you a risk difference, which will be finite and is suitable for this purpose. The drawback to relying on -mcc-/-mcci- is that it is only applicable to matched pairs, so you have to sacrifice one of your control groups. Your data are so scanty to begin with that I wouldn't advise that.

            From the pedantry corner: Stata, not stata.



            • #7
              If you're interested in the odds ratio and its confidence bounds, then Clyde's suggestion of -exlogistic- is about all that is currently available in Stata. It's a little conservative, though, and if you're interested in a test of association, then you might want to consider the user-written command -emh-, which is available from SSC.
              Code:
              version 15.1
              
              clear *
              
              input byte group str7 (case control_clust1 control_clust2)
              1 exposed non-exp non-exp
              2 exposed non-exp non-exp
              3 non-exp non-exp non-exp
              4 non-exp non-exp non-exp
              5 non-exp non-exp non-exp
              6 non-exp non-exp non-exp
              7 non-exp non-exp non-exp
              8 non-exp non-exp non-exp
              9 non-exp non-exp non-exp
              10 non-exp non-exp non-exp
              11 non-exp non-exp non-exp
              end
              
              rename case inp1
              rename control_clust1 inp01
              rename control_clust2 inp02
              
              quietly reshape long inp0, i(group) j(clu)
              quietly replace inp1 = ".n" if clu == 2
              quietly reshape long inp, i(group clu) j(cas)
              
              label define Disease 0 Control 1 Case
              label values cas Disease
              label variable cas "Disease"
              
              label define Exposures 0 "non-exp" 1 exposed .n ".n"
              encode inp, generate(exr) label(Exposures) noextend
              label variable exr "Exposure"
              
              exlogistic cas exr,group(group) nolog
              
              emh cas exr, general strata(group)
              
              exit
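The output of -emh- is not shown in this thread. As a hedged illustration of why a test of association can be less conservative than the exact analysis, here is a plain-Python sketch of the ordinary Cochran-Mantel-Haenszel (CMH) statistic for the 11 strata, each with one case and two controls. We assume, without having checked, that the general-association statistic of -emh- reduces to this CMH statistic for a set of 2x2 tables; the function name is ours.

```python
import math


def cmh_statistic(strata):
    """CMH chi2 (no continuity correction) for a list of 2x2 strata.

    Each stratum: (exposed_cases, n_cases, n_exposed_total, n_total).
    """
    num = 0.0
    var = 0.0
    for a, n_cases, m_exposed, n in strata:
        n_controls = n - n_cases
        m_unexposed = n - m_exposed
        num += a - n_cases * m_exposed / n          # observed minus expected
        var += n_cases * n_controls * m_exposed * m_unexposed / (n ** 2 * (n - 1))
    return num ** 2 / var


if __name__ == "__main__":
    strata = [(1, 1, 1, 3)] * 2 + [(0, 1, 0, 3)] * 9
    stat = cmh_statistic(strata)
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared(1)
    print(f"CMH = {stat:.4f}, p = {p:.4f}")
```

For these data the statistic is 4, the same value as the "Model score" in the -exlogistic- output in #3. Referred to the asymptotic chi-squared(1) distribution, it gives p of about 0.0455 rather than the exact 0.1111.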



              • #8
                Hi Clyde, thank you for your useful advice again. I will switch from clogit to exlogistic.
                As for mcc, I am not very clear about its syntax:
                mcc var_exposed_case var_exposed_control
                Are var_exposed_case and var_exposed_control binary variables indicating whether an individual pig is an exposed case or an exposed control, respectively? Is n then 33? If so, how do I link each case to its matched control?



                • #9
                  Hi Joseph,
                  Thank you very much for telling me about the emh command, and for kindly writing out all the necessary commands.
                  I will install emh and compare it with exlogistic.



                  • #10
                    Re #8:

                    First you have to encode the data as 0/1 for non-exposed and exposed, respectively. Then the variables are ready to use for -mcc-. So:

                    Code:
                    clear *
                    
                    input byte group str7 (case control_clust1 control_clust2)
                    1 exposed non-exp non-exp
                    2 exposed non-exp non-exp
                    3 non-exp non-exp non-exp
                    4 non-exp non-exp non-exp
                    5 non-exp non-exp non-exp
                    6 non-exp non-exp non-exp
                    7 non-exp non-exp non-exp
                    8 non-exp non-exp non-exp
                    9 non-exp non-exp non-exp
                    10 non-exp non-exp non-exp
                    11 non-exp non-exp non-exp
                    end
                    label define exposure    0    "non-exp"    1    "exposed"
                    
                    //    CREATE NUMERIC 0/1 ENCODING OF THE DATA
                    foreach v of varlist case control_* {
                        encode `v', gen(_`v') label(exposure)
                        drop `v'
                        rename _`v' `v'
                    }
                    
                    mcc case control_clust1
                    mcc case control_clust2
                    Notes:
                    1. Since you have no exposed controls in either cluster, the results for cluster1 and cluster2 come out identical.

                    2. The variables case_exposed and control_exposed in the -mcc- syntax assume that your data are organized so that each observation is a matched pair. The variable case_exposed is coded 1 if the case was exposed and 0 if the case was not exposed. Similarly control_exposed is coded 1 if the control was exposed and 0 if the control was not exposed.

                    3. -mcc- is not set up to handle multiple control groups in a single analysis. So each control group must be treated in a separate command.

                    As noted earlier, use of -mcc-, in light of 3. above, discards a lot of your data, which you can ill afford. So I don't really recommend this approach.
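The pair-level layout that note 2 describes (one observation per matched pair, with a 0/1 exposure variable for the case and one for the control) can be mirrored outside Stata. In this plain-Python sketch, the data rows are copied from the thread and the helper pair_cells is hypothetical. Each row's strings are coded 0/1, and the case is paired with one control cluster at a time, giving the a, b, c, d cells that -mcc- tabulates.

```python
from collections import Counter

# group, case, control_clust1, control_clust2 (as posted in the thread)
ROWS = (
    [(1, "exposed", "non-exp", "non-exp"), (2, "exposed", "non-exp", "non-exp")]
    + [(g, "non-exp", "non-exp", "non-exp") for g in range(3, 12)]
)

CODE = {"non-exp": 0, "exposed": 1}  # same 0/1 coding as the Stata value label


def pair_cells(rows, control_col):
    """Return (a, b, c, d) for the case vs control cluster 1 or 2."""
    counts = Counter((CODE[r[1]], CODE[r[1 + control_col]]) for r in rows)
    return (counts[(1, 1)], counts[(1, 0)], counts[(0, 1)], counts[(0, 0)])


if __name__ == "__main__":
    print(pair_cells(ROWS, 1))   # case vs control_clust1
    print(pair_cells(ROWS, 2))   # case vs control_clust2
```

Both calls give (0, 2, 0, 9), the cells of -mcci 0 2 0 9-. This is why the two -mcc- runs come out identical, and why each one uses only 22 of the 33 pigs.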



                    • #11
                      Hi Clyde,
                      Thank you very much for your instructions on mcc.

                      . mcc case control

                                       | Controls               |
                      Cases            |   Exposed   Unexposed  |      Total
                      -----------------+------------------------+------------
                               Exposed |         0           2  |          2
                             Unexposed |         0           9  |          9
                      -----------------+------------------------+------------
                                 Total |         0          11  |         11

                      McNemar's chi2(1) =      2.00    Prob > chi2 = 0.1573
                      Exact McNemar significance probability       = 0.5000

                      Proportion with factor
                              Cases      .1818182
                              Controls          0     [95% Conf. Interval]
                                        ---------     --------------------
                              difference .1818182    -.1370177     .500654
                              ratio             .            .           .
                              rel. diff. .1818182    -.0461086    .4097449

                              odds ratio        .     .1878091           .   (exact)

                      Obviously, as you said, the result is not very impressive.
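To check where the asymptotic numbers and the risk difference in this -mcc- output come from, here is a plain-Python sketch working from the cell counts (a, b, c, d) = (0, 2, 0, 9). It computes McNemar's chi-squared, its p-value, and the difference in proportions exposed with a 95% interval. The interval formula is our assumption: a Wald interval with variance ((b + c) - (b - c)^2/n)/n^2, widened by a continuity correction of 1/n. It reproduces the limits printed here and in the first post, but we have not confirmed it against Stata's documentation.

```python
import math
from statistics import NormalDist


def mcnemar_summary(a, b, c, d, level=0.95):
    """McNemar chi2, its p-value, and the case-minus-control difference in
    proportion exposed with a CI (assumed: Wald interval plus 1/n correction)."""
    n = a + b + c + d
    chi2 = (b - c) ** 2 / (b + c)
    p_chi2 = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-squared(1)
    diff = (b - c) / n
    se = math.sqrt((b + c) - (b - c) ** 2 / n) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * se + 1 / n
    return chi2, p_chi2, diff, (diff - half_width, diff + half_width)


if __name__ == "__main__":
    chi2, p, diff, (lo, hi) = mcnemar_summary(0, 2, 0, 9)
    print(f"chi2(1) = {chi2:.2f}, Prob > chi2 = {p:.4f}")
    print(f"difference = {diff:.7f}, 95% CI = ({lo:.7f}, {hi:.6f})")
```

This gives chi2(1) = 2.00 with Prob > chi2 = 0.1573, and a difference of 0.1818182 with interval (-0.1370177, 0.500654). The interval includes zero, consistent with the unimpressive exact p of 0.5.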

                      I will follow your advice and use exlogistic.



                      • #12
                        Hello.
                        Exlogistic seems more attractive than clogit, especially when the sample size is small.

                        Is it possible to estimate the sample size that would be needed to reach
                        statistical significance (i.e. alpha 0.05, power 0.8), based on a result
                        from exlogistic applied to a small dataset? If so, how?



                        • #13
                          I'm not aware of any routine in Stata that would do this. You might google this to see if anything turns up. I would imagine that the sample size wouldn't be much different, if at all, from what you would come up with for a sample size analysis based just on logistic regression. In the end, you might have to do this by simulation.



                          • #14
                            Clyde, thank you very much for your swift reply. I will check the sample size analysis for clogit and try simulation.
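As a rough illustration of the simulation route, here is a plain-Python sketch (not a Stata routine) that estimates power for 1:2 matched sets by repeatedly simulating studies and applying an exact conditional test. The data-generating model is entirely an assumption chosen for illustration. Controls are exposed independently with probability p0, cases with the probability implied by the target odds ratio, and there is no within-set correlation.

The test conditions on how many of the 3 members of each set are exposed (m). Under the null, the case is then exposed with probability m/3, so T (the number of exposed cases) is a sum of independent Bernoullis, and the two-sided p-value is taken as twice the smaller tail. That rule gives 0.2222 for the observed data, matching 2*Pr(Suff.) from -exlogistic-, but treat the sketch as approximate. All function names are hypothetical.

```python
import random


def case_exposure_prob(p0, odds_ratio):
    """Exposure probability for cases implied by control prevalence p0 and an OR."""
    odds = odds_ratio * p0 / (1 - p0)
    return odds / (1 + odds)


def poisson_binomial(probs):
    """P(T = t) for T = sum of independent Bernoulli(p_i), as a list indexed by t."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for t, q in enumerate(dist):
            new[t] += q * (1 - p)
            new[t + 1] += q * p
        dist = new
    return dist


def exact_p_value(sets):
    """Two-sided exact conditional test for 1:2 matched sets.

    sets: list of (case_exposed, n_exposed_controls). Under H0, given m exposed
    members in a set of 3, the case is exposed with probability m / 3.
    """
    t_obs = sum(case for case, _ in sets)
    dist = poisson_binomial([(case + k) / 3 for case, k in sets])
    upper = sum(dist[t_obs:])
    lower = sum(dist[: t_obs + 1])
    return min(1.0, 2 * min(upper, lower))


def simulated_power(n_sets, p0, odds_ratio, alpha=0.05, n_sims=1000, seed=1):
    """Share of simulated studies with exact p <= alpha (assumed generating model)."""
    rng = random.Random(seed)
    p1 = case_exposure_prob(p0, odds_ratio)
    rejections = 0
    for _ in range(n_sims):
        sets = [
            (int(rng.random() < p1), sum(int(rng.random() < p0) for _c in range(2)))
            for _s in range(n_sets)
        ]
        if exact_p_value(sets) <= alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Illustrative inputs only; p0 and the odds ratio are not estimates from this thread.
    for n in (20, 40, 60, 80):
        print(n, simulated_power(n, p0=0.1, odds_ratio=4.0, n_sims=300))
```

To find a sample size, increase n_sets until the estimated power reaches 0.8. Choosing p0 and the odds ratio is the hard part; a median unbiased estimate from 11 sets is a very uncertain basis.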



                            • #15
                              If you're interested only in obtaining statistical significance, then I think that -emh- will be more powerful than -exlogistic-. I don't know how small the sample sizes can be before the test size rises substantially above nominal, but the citation given in its help file uses some pretty small illustrative datasets.

                              Comment

                              Working...
                              X