I am trying to understand whether an intervention ("tha") is associated with increased 12-month mortality ("rip") using Stata/SE 13.0. My first step was to match the groups across a number of variables ("age", "sex", "preopasa", "premob", and "origin") using the "cem" package for coarsened exact matching. My understanding from the CEM literature is that researchers should then continue with their analyses (e.g. multivariable regression) as normal, but using the matched groups.
When I run the matching code:
ssc install cem
cem age sex preopasa premob origin, treatment(tha) autocuts(fd)
It appears to work and almost all patients (29,181/29,267) are allocated to 56 matched strata. Stata creates a number of new variables: cem_strata, cem_matched, and cem_weights, which seems to be what is expected.
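For reference, a minimal sketch of how those counts could be confirmed from the variables cem creates. It assumes cem has already been run as above and that cem_matched is coded 0/1; nothing here changes the data.

```stata
* Sketch only: assumes cem has already been run and created
* cem_matched, cem_strata and cem_weights as described above.

* Matched vs unmatched patients, split by treatment group
tabulate cem_matched tha

* Number of distinct strata that contain matched patients
quietly tabulate cem_strata if cem_matched == 1
display "matched strata: " r(r)

* Distribution of the weights among matched patients
summarize cem_weights if cem_matched == 1, detail
```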
To my non-statistician mind, I imagined that those covariates would then become less significant in any subsequent regression models. However, this doesn’t appear to be the case.
When I run code that I believe should run a logistic regression model using the matched weights:
logistic rip age sex preopasa premob origin tha [iweight=cem_weights]

I get an output that is barely any different – and in some cases shows bigger odds ratios / wider confidence intervals – than when I run code without the CEM weights:
logistic rip age sex preopasa premob origin tha
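To make the comparison concrete, the two models could be put side by side with something like the sketch below. The stored-estimate names "unweighted" and "weighted" are just labels chosen for illustration.

```stata
* Sketch only: fit both models and compare odds ratios side by side.
* "unweighted" and "weighted" are arbitrary names for the stored results.

* Model without CEM weights
logistic rip age sex preopasa premob origin tha
estimates store unweighted

* Model with CEM weights
logistic rip age sex preopasa premob origin tha [iweight=cem_weights]
estimates store weighted

* Odds ratios (eform) with standard errors and sample sizes
estimates table unweighted weighted, eform b(%9.3f) se(%9.3f) stats(N)
```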

I realise that Statalist might be the wrong place to raise this issue, but I wanted to check that I am using cem correctly before wondering whether I have just misunderstood how CEM works as a technique. I would also be interested to hear of any Stata tricks that can be used to see whether or not the matching was successful. If anyone has insights or experience of cem then please let me know.