Help with Matching and Clogit

Theodore Habarth

Join Date: Feb 2021

Posts: 11
#1


05 Feb 2021, 16:05

Hello,

I am currently working on a research project in Stata 16 in which I am comparing a sample of trauma patients from an institutional database to a sample from a national database. The institution is unique in that our times to procedures of a certain type are shorter than the times to the same procedures in the national database. I would like to see whether a few outcomes differ statistically between the two samples as a result of the difference in "time to incision". Some of the outcomes are dichotomous and some are continuous [for example: mortality (yes/no), hospital length of stay (# of days)].

I am not sure exactly which technique I should be using. The issue is that the institutional sample is much smaller (n=200) than the national sample (n=9,000). My thought was to use matching to find matches in the national database and control for some covariates, specifically injury indicators, because the severity of the injury on arrival (which is highly variable from patient to patient) could theoretically have a big impact on the outcomes of interest. Essentially, I want to find a relationship between time to incision and mortality. Should I be using a matching technique, or maybe conditional logistic regression?

Apologies upfront for my ignorance!

Thanks for any input,
Theo
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

05 Feb 2021, 17:52

Well, conditional logistic regression is one way of analyzing matched data. It can only be used with dichotomous outcome variables. For your continuous outcomes you would probably use -xtreg, fe-. And if you have any count outcomes, there is also -xtpoisson, fe- or -xtnbreg, fe-. So the most commonly needed analytic models for handling matched-tuple data are all available.

Matching has its pros and cons. On the plus side, if you can do exact or nearly exact matching on some covariate(s) that are strongly associated with the outcome variable, then you can completely eliminate the confounding effect of those covariates. The downside corresponding to that is that you cannot estimate the effects of those covariates if that were of interest. Another issue with matching is that finding exact or close matches can be difficult. If the distributions of these covariates in the two data sets are very similar, then it isn't hard to find matches--but in that case, the matching doesn't really accomplish much anyway as there is little or no confounding to begin with. If the distributions are very different, you may end up with a substantial number of cases for which you cannot find any suitable match--and those cases then disappear from the analysis, which introduces a different kind of bias, and also robs you of power.

Now, in your situation, you are fortunate that the national sample is much larger than the institutional sample. So even for somewhat unusual cases in your institutional sample you might well be able to find some matches in that large national sample, even if the overall distributions of the covariates in question are rather different. It certainly sounds like matching on severity of injury would be a good idea, especially if you are able to find very close matches on that for nearly everybody. I would be flexible, and I recommend grabbing as many matches as you can for each institutional case--there is no law requiring you to have the same number of matches for each case (though from reading the literature you might think there is). And with luck you might be able to match on severity plus one or two other covariates. Again, the covariates chosen should be strongly associated with the outcome, and should have distributions that let you obtain good matches for all, or nearly all, institutional cases.
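As a rough illustration of matching with a variable number of controls per case, here is a minimal sketch on simulated data. Everything in it is an assumption made for the example: the variable names (id, institutional, severity, tuple, n_controls), the integer 1-25 severity score, and the choice of exact matching on severity with each national patient assigned at random to a single eligible institutional case. Real data would likely need a caliper on a continuous score, or additional matching variables.

```stata
* Toy data standing in for the pooled samples (all names hypothetical).
clear
set seed 2021
set obs 9200
gen long id = _n
gen byte institutional = (_n <= 200)        // 200 institutional, 9,000 national
gen int severity = ceil(25 * runiform())    // assumed integer severity score, 1-25
tempfile pooled
save `pooled'

* Institutional cases: reuse the case's own id as the tuple identifier.
keep if institutional
rename id tuple
keep tuple severity
tempfile cases
save `cases'

* Pair every national patient with every institutional case of equal severity.
* By default -joinby- drops national patients that match no case.
use `pooled', clear
keep if !institutional
joinby severity using `cases'

* Assign each national patient to exactly one eligible case, chosen at random,
* so that no control appears in more than one tuple.
gen double u = runiform()
bysort id (u): keep if _n == 1
drop u

* Stack the institutional cases back on to get one long data set.
append using `cases'
replace id = tuple if missing(id)
replace institutional = 1 if missing(institutional)

* How many controls did each case end up with?
bysort tuple: gen long n_controls = _N - 1
summarize n_controls if institutional, detail
```

The number of controls is allowed to differ from case to case; a case with n_controls equal to 0 found no match and will contribute nothing to a fixed-effects analysis.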

The key thing from a programming perspective is this: ultimately you need the data in long layout, and it must have a variable that designates which matched tuple each observation belongs to. (These can be arbitrary numbers, or it is often convenient to use the unique identifier of the institutional case.) Then you -xtset- that tuple identifier as the panel variable. Then you can run whichever -xt- regression is most suitable for your outcome.
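To make the layout concrete, here is a small sketch that simulates already-matched data in long form and then fits the fixed-effects models mentioned in this thread. The variable names (tuple, institutional, died, los, n_complications) and the simulated values are purely illustrative assumptions, and the exposure is coded as institutional versus national; a time-to-incision variable could be used as the predictor instead.

```stata
* Simulate 200 matched tuples: 1 institutional case plus 1 to 4 national controls.
clear
set seed 2021
set obs 200
gen long tuple = _n
gen byte n_in_tuple = 2 + floor(4 * runiform())
expand n_in_tuple
bysort tuple: gen byte institutional = (_n == 1)

* Hypothetical outcomes: dichotomous, continuous, and count.
gen byte died = runiform() < 0.20
gen double los = 8 - 1.5 * institutional + rnormal(0, 3)
gen int n_complications = rpoisson(1.5)

* The tuple identifier is the panel variable; no time variable is needed.
xtset tuple

* Dichotomous outcome: conditional (fixed-effects) logistic regression.
* -clogit died i.institutional, group(tuple)- fits the same model.
xtlogit died i.institutional, fe

* Continuous outcome.
xtreg los i.institutional, fe

* Count outcome.
xtpoisson n_complications i.institutional, fe
```

Tuples in which every member has the same value of died carry no information for the conditional logistic model, so Stata drops them from that estimation and reports how many were dropped.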

In truth, you can't really know ahead of time whether you will be able to do successful matching until you try it. If it doesn't work out, a more flexible way of matching observations is the propensity score method. I'm not a huge fan of propensity score matching because it relies on some assumptions that are unverifiable and sometimes not very plausible. But in situations where the assumptions are plausible, it does eliminate confounding by the variables involved, and it is often able to find matches for the hard cases that direct matching on the covariates themselves misses. Stata's -teffects- suite includes propensity-score matching (-teffects psmatch-), so you might look at that if you want to go this route.
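For orientation, a propensity-score analysis with -teffects psmatch- might look roughly like the sketch below. The data are simulated, and the variable names (institutional, severity, age, los), the covariates in the treatment model, and the choice of three nearest neighbors are all assumptions made for illustration, not recommendations for the actual study.

```stata
* Toy pooled data (all names and effect sizes hypothetical).
clear
set seed 2021
set obs 9200
gen byte institutional = (_n <= 200)
gen double severity = 1 + 24 * runiform()
gen double age = 18 + 60 * runiform()
gen double los = 3 + 0.4 * severity + 0.05 * age - 1.5 * institutional + rnormal(0, 3)

* Outcome in the first set of parentheses; treatment indicator followed by the
* propensity-score covariates in the second (a logit model by default).
* -atet- targets the effect among the institutional patients.
teffects psmatch (los) (institutional severity age), atet nneighbor(3)

* Check covariate balance in the matched sample.
tebalance summarize
```

Standardized differences near 0 and variance ratios near 1 in the matched columns of the -tebalance summarize- output would suggest that the matching balanced the covariates reasonably well.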