Multiple imputation of logistic regression

Bill Smith

Join Date: Sep 2014

Posts: 158
#1

Multiple imputation of logistic regression

17 Mar 2015, 12:09

This is more of a statistics question for an analysis I plan to carry out in Stata. I recently read a couple of articles that seemingly contradict each other on the analysis of multiply imputed data. Table VIII in this article: http://bacbuc.hd.free.fr/WebDAV/data...ite-SM2010.pdf suggests that a transformation is necessary to combine SDs (I assume this also applies to SEs), although I don't see any specific suggestion. Their contention is that anything dependent on the sample size cannot be combined using Rubin's rules. However, in another paper: http://www.biomedcentral.com/content...-2288-9-57.pdf at least one of the same authors makes the claim that SDs can be combined using Rubin's rules.

For the moment ignore the problem of model selection. If I perform logistic regression and have coefficients and SEs, can I use Stata to obtain proper overall estimates of the coefficients and SEs? Would I need to use some transformation of the SEs first? If so, what transformation?

What if, instead of slope coefficients, I have probabilities? For survival probabilities a complementary log-log transformation is suggested (http://www.biomedcentral.com/content...-2288-9-57.pdf). Is a similar transformation needed for combining logistic probabilities?
Maarten Buis

Join Date: Mar 2014

Posts: 3459
#2

18 Mar 2015, 05:21

Short answer: Stata does this correctly and you don't need to do anything special.

Slightly longer answer: Rubin's rules require that the sampling distribution of the parameter be (approximately) normal. An odds ratio cannot be negative, so the sampling distribution of that parameter cannot be normal. This is what Table VIII meant when it said that odds ratios may need a transformation before using Rubin's rules. A reasonable transformation would be to look at the log(odds ratio), which can take any value. In fact, when Stata estimates a logistic regression, it estimates those log(odds ratios), and only when it comes to displaying the results does it transform them back to odds ratios for your convenience. What is saved by logit, and used by mi, are still the log(odds ratios).

The same is true for the hazard ratios in a discrete-time survival model using cloglog (in the model you proposed the outcome is not a probability but a hazard rate): the coefficients left behind by that model, and used by mi, are log(hazard ratios), even though you can ask Stata to display the hazard ratios. So, with those models you also don't have to do anything special, as Stata already applies a reasonable transformation for you.
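
To make the pooling step concrete, here is a minimal sketch in Python of Rubin's rules applied on the log(odds ratio) scale, with the back-transformation to an odds ratio done only at the end. This is not what Stata runs internally; the function name pool_rubin and all the numbers are made up for illustration, and the interval uses a normal approximation (1.96) rather than the t-based degrees of freedom that mi estimate reports.

```python
import math

def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    # Pooled point estimate: mean of the per-imputation estimates
    q_bar = sum(estimates) / m
    # Within-imputation variance: mean of the squared standard errors
    within = sum(se ** 2 for se in std_errors) / m
    # Between-imputation variance: sample variance of the estimates
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    # Total variance combines both sources
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)

# Hypothetical log(odds ratio) estimates and SEs from m = 5 imputed datasets
log_or = [0.40, 0.50, 0.45, 0.55, 0.35]
se = [0.20, 0.21, 0.19, 0.20, 0.22]

b, b_se = pool_rubin(log_or, se)

# Back-transform only after pooling (normal-approximation interval)
odds_ratio = math.exp(b)
ci = (math.exp(b - 1.96 * b_se), math.exp(b + 1.96 * b_se))

print(f"pooled log(OR) = {b:.3f}, SE = {b_se:.3f}")
print(f"OR = {odds_ratio:.3f}, approx. 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Note that the SEs themselves are never averaged: they enter through their squares (the variances), and the resulting interval for the odds ratio is asymmetric and stays above zero because it is built on the log scale.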

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Bill Smith

Join Date: Sep 2014

Posts: 158
#3

18 Mar 2015, 08:55

Thanks for your input. I understand the normality assumption, and I know how to transform ORs for proper pooling, although this is automatic in Stata. I just think it's odd that the two articles I cited seemingly give contradictory advice. One says SDs don't have to be transformed prior to pooling and the other says they must, but does not state how. I'm still wondering whether transformation of logistic probabilities is necessary. I would think so, but the article deals only with survival probabilities. In any event, I believe one could use mi predict or mi predictnl after logistic regression to obtain the desired quantities. Please correct me if I'm wrong. And if anyone can shed any light on the contradiction in the articles, I'd be interested in hearing it.
Maarten Buis

Join Date: Mar 2014

Posts: 3459
#4

19 Mar 2015, 03:38

To quote your first paper:

"In logistic regression, we may want to estimate the probabilities \(\pi_i=\operatorname{expit}(\eta_i)\), where expit(.) denotes the inverse of the logit function. We could apply Rubin’s rules on the probability scale, giving \(\hat{\pi}_i = \frac{1}{m} \sum_{j=1}^m \operatorname{expit}(\hat{\eta}_{ij})\) or we could use \(\hat{\pi}_i = \operatorname{expit}(\hat{\eta}_i)\): these are usually very similar."

So the first paper does suggest that you can transform the predicted probabilities before applying Rubin's rules, but notes that doing so often changes little. Your second article is a bit stricter (or less pragmatic, depending on how you wish to see things) and suggests that you should transform the predicted probabilities. Both are correct: the sampling distribution of the predicted probabilities will more often be close to normal after transformation, but in many applications the authors were involved in, the transformation and back-transformation did not matter much. I don't see a contradiction between these two statements: one is theoretical and the other is empirical. The theoretical statement does not tell you how wrong you are when you leave the transformation out; the empirical statement tells you: not very wrong.
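
As a rough illustration of "usually very similar", the sketch below (Python, with made-up linear predictors for a single observation across m = 5 imputed datasets) computes the point estimate both ways described in the quote: averaging the probabilities directly, and averaging the linear predictors before applying expit. It only compares point estimates; it says nothing about the standard errors, and the size of the difference in a real analysis will depend on how much the imputations disagree.

```python
import math

def expit(x):
    """Inverse of the logit function."""
    return 1 / (1 + math.exp(-x))

# Hypothetical linear predictors for one observation, one per imputed dataset
eta = [-0.30, -0.10, -0.20, 0.00, -0.40]
m = len(eta)

# Option 1: pool on the probability scale (average the probabilities)
p_prob_scale = sum(expit(e) for e in eta) / m

# Option 2: pool on the logit scale, then back-transform once
p_logit_scale = expit(sum(eta) / m)

diff = abs(p_prob_scale - p_logit_scale)
print(f"probability scale: {p_prob_scale:.4f}")
print(f"logit scale:       {p_logit_scale:.4f}")
print(f"difference:        {diff:.4f}")
```

The two answers differ because expit is nonlinear, so the average of transformed values is not the transform of the average; the gap grows as the linear predictors spread out across imputations or move toward the tails.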
