paired 2 by 2 table using long format dataset

Zihan Dong

Join Date: Feb 2021
Posts: 44


14 Jul 2021, 09:32

Hello everyone,

The following is part of my dataset. This is a case-control dataset in which each case is matched with 2 controls (i.e., 3 people per matched set). Now I want to make it a 1:1 case-control dataset and generate a "paired 2 by 2 table".

Does anyone have any insight about this?

Thank you!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte matchset float ntitre0 byte(controlno case)
 3 11.71193 1 0
 3 11.60469 2 0
 3 9.771929 . 1
 6 10.52193 1 0
 6 10.78193 2 0
 6 9.001929 . 1
 9 12.57193 1 0
 9 11.26193 . 1
 9 11.40193 2 0
13 11.80193 1 0
13 11.42193 . 1
13 10.72193 2 0
14 10.64193 1 0
14 10.03193 . 1
14 10.93193 2 0
16 11.75193 1 0
16 11.48193 2 0
16 12.40193 . 1
17 11.14193 1 0
17 11.54193 2 0
17 11.52193 . 1
19 9.438595 . 1
19 9.471928 1 0
19 9.811929 2 0
22 10.04693 . 1
22 10.48193 1 0
22 9.461927 2 0
24 9.001929 1 0
24 10.35193 . 1
24 10.95526 2 0
25 9.221928 1 0
25 10.02193 2 0
25 10.28193 . 1
end


Ken Chui

Join Date: Aug 2014

Posts: 1058
#2

14 Jul 2021, 10:10

What do you mean by a "paired 2x2 table?" The data here only shows case vs. control, there isn't any other categorical variable to generate anything in a "2x2" manner. An example may be helpful.
Zihan Dong

Join Date: Feb 2021

Posts: 44
#3

15 Jul 2021, 02:27

Originally posted by Ken Chui

What do you mean by a "paired 2x2 table?" The data here only shows case vs. control, there isn't any other categorical variable to generate anything in a "2x2" manner. An example may be helpful.

Hi Ken!

Thanks for your reply.

Yes, you're right. I put the wrong exposure variable. The following should be the correct dataset:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(matchset controlno case) float ntitre0hi
 3 1 0 1
 3 2 0 1
 3 . 1 0
 6 1 0 0
 6 2 0 0
 6 . 1 0
 9 1 0 1
 9 . 1 1
 9 2 0 1
13 1 0 1
13 . 1 1
13 2 0 0
14 1 0 0
14 . 1 0
14 2 0 1
16 1 0 1
16 2 0 1
16 . 1 1
17 1 0 1
17 2 0 1
17 . 1 1
19 . 1 0
19 1 0 0
19 2 0 0
22 . 1 0
22 1 0 0
22 2 0 0
24 1 0 0
24 . 1 0
24 2 0 1
25 1 0 0
25 2 0 0
25 . 1 0
end
Ken Chui

Join Date: Aug 2014

Posts: 1058
#4

15 Jul 2021, 07:20

Originally posted by Zihan Dong

Yes, you're right. I put the wrong exposure variable. The following should be the correct dataset:

Thank you, the data are helpful. Given this data set, what do you mean by a "paired 2x2 table"? Using this data set as an example, could you draft the end product to illustrate what you mean?
Zihan Dong

Join Date: Feb 2021

Posts: 44
#5

15 Jul 2021, 08:07

Originally posted by Ken Chui

Thank you, the data are helpful. Then, given this data set, what do you mean by "paired 2x2 table?" If you use this data set as example, could you draft the end product to illustrate what you mean by that?

Sure. What I expect is as follows:

This was generated using the "mcc" command on another dataset, which is in wide format. But now I have a long-format dataset, so I don't know what to do.

Mike Lacy

Join Date: Apr 2014
Posts: 2416

15 Jul 2021, 08:51

I don't know what is intended by "make it a 1:1 case-control ... " One could, of course, drop one control at random within each matched set, but I'm thinking that this is not what is desired.
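If a 1:1 table is nonetheless wanted, the random drop just described could be sketched as follows. This is only an illustration, assuming the variable names from the corrected -dataex- example (matchset, controlno, case, ntitre0hi); the seed and the names u, keepme, exp_case and exp_control are arbitrary choices introduced here, not part of the original data.

```stata
* Sketch only: reduce each 1:2 matched set to 1:1, then build the paired table.
* Assumes the variables from the -dataex- example are in memory.
set seed 12345                                  // arbitrary seed, for a reproducible choice
generate double u = runiform() if case == 0     // random number for controls only

* Within each set, the control with the smallest u sorts first;
* the case has u missing and therefore sorts last.
bysort matchset (u): generate byte keepme = (case == 1) | (_n == 1)
keep if keepme

* Drop everything that varies within a set except the exposure, then go wide.
keep matchset case ntitre0hi
reshape wide ntitre0hi, i(matchset) j(case)
rename ntitre0hi1 exp_case                      // exposure of the case
rename ntitre0hi0 exp_control                   // exposure of the retained control

* Paired 2x2 table: exposed-case variable first, exposed-control variable second.
mcc exp_case exp_control
```

Because one control per set is discarded at random, the resulting table depends on the seed.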

What I am going to suggest instead is a way to do an appropriate analysis here while retaining all the data, i.e., obtain an odds ratio for being a case in relation to the ntitre0hi exposure, *while controlling for matched set.* (Note that in case-control studies, unlike cohort studies, a valid analysis *must* control for the matching variable. A table such as the one Zihan Dong shows does not take account of the matching.)

The -mhodds- command, as shown below, will give an analysis stratified by the matched set variable, while using the given data layout.
(Note that for the example data, many of the matched sets turn out to be non-informative.)

Code:

mhodds case ntitre0hi, by(matchset)
. mhodds case ntitre0hi, by(matchset)

Maximum likelihood estimate of the odds ratio
Comparing ntitre0hi==1 vs. ntitre0hi==0
by matchset

note: only 4 of the 11 strata formed in this analysis contribute
      information about the effect of the explanatory variable

-------------------------------------------------------------------------------
 matchset | Odds Ratio        chi2(1)         P>chi2       [95% Conf. Interval]
----------+--------------------------------------------------------------------
        3 |   0.000000           2.00         0.1573               .          .
        6 |          .              .             .               .          .
        9 |          .              .             .               .          .
       13 |          .           0.50         0.4795               .          .
       14 |   0.000000           0.50         0.4795               .          .
       16 |          .              .             .               .          .
       17 |          .              .             .               .          .
       19 |          .              .             .               .          .
       22 |          .              .             .               .          .
       24 |   0.000000           0.50         0.4795               .          .
       25 |          .              .             .               .          .
-------------------------------------------------------------------------------

    Mantel-Haenszel estimate controlling for matchset
    ----------------------------------------------------------------
     Odds Ratio    chi2(1)        P>chi2        [95% Conf. Interval]
    ----------------------------------------------------------------
       0.250000       1.13        0.2888         0.015637   3.996877
    ----------------------------------------------------------------

Test of homogeneity of ORs (approx): chi2(3)   =    2.75
                                     Pr>chi2   =  0.4318
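
As a check on the pooled estimate, the Mantel-Haenszel odds ratio can be reproduced by hand from the example data. The notation below is an assumption added for illustration: in matched set i, a_i and b_i are the numbers of exposed and unexposed cases, c_i and d_i the numbers of exposed and unexposed controls, and n_i = 3.

```latex
\[
\widehat{OR}_{MH} \;=\; \frac{\sum_i a_i d_i / n_i}{\sum_i b_i c_i / n_i}
\]
% Informative sets (a_i, b_i, c_i, d_i):
%   set  3: (0, 1, 2, 0)   set 13: (1, 0, 1, 1)
%   set 14: (0, 1, 1, 1)   set 24: (0, 1, 1, 1)
% In the other seven sets the case and both controls share the same exposure,
% so a_i d_i = b_i c_i = 0 and they drop out.
\[
\widehat{OR}_{MH}
 \;=\; \frac{\tfrac{1 \cdot 1}{3}}
            {\tfrac{1 \cdot 2}{3} + \tfrac{1 \cdot 1}{3} + \tfrac{1 \cdot 1}{3}}
 \;=\; \frac{1/3}{4/3}
 \;=\; 0.25
\]
```

This agrees with the 0.250000 reported above and with the note that only 4 of the 11 strata contribute information.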

Last edited by Mike Lacy; 15 Jul 2021, 09:01.


Zihan Dong

Join Date: Feb 2021
Posts: 44

19 Jul 2021, 07:46

Thank you, Mike. But I noticed that when I run the "clogit" command, the results are different, especially when the number of covariates is greater than 2 (shouldn't they be the same as those from "mhodds"?). Do you have any idea why?

Mike Lacy

Join Date: Apr 2014

Posts: 2416
#8

19 Jul 2021, 08:42

Some difference between the parameter estimates from -clogit- and -mhodds- (also -cc-) would not surprise me, as different methods of estimation are used, and my presumption would be that the ML methods in -clogit- would have higher bias than other methods. If there is a large difference in the estimates of the odds ratio (say, more than 10% or so), I'd be surprised and I'd like to hear what others say. As it happens, Bruce Weaver and I had an offline exchange about these issues and remained somewhat uncertain about which approach would be preferred, so I'd be interested in other comments regardless. Note also that the different commands (-clogit-, -mhodds-, -cc-) can, and in general do, use different methods of calculating confidence intervals.
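
For anyone who wants to compare the two estimators directly on the example data, they can be run back to back. This is a minimal sketch, assuming the corrected -dataex- example is in memory; it shows only the single-exposure case and makes no claim about which estimate is preferable.

```stata
* Conditional (fixed-effects) logistic regression, grouped on the matched set;
* the -or- option reports odds ratios instead of coefficients.
clogit case ntitre0hi, group(matchset) or

* Mantel-Haenszel odds ratio stratified on the matched set, as in the earlier post.
mhodds case ntitre0hi, by(matchset)
```

The first command gives a conditional maximum likelihood estimate and the second a Mantel-Haenszel estimate, so with 1:2 matching the two odds ratios need not coincide exactly.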