Help with a detailed specification using multiple imputation

Benji Smith

Join Date: Jan 2019

Posts: 1
#1

28 Jan 2019, 16:10

Hello,

I am attempting to estimate a model:

Code:

logit y x1 x2 x3

where x1 is a dummy variable, x2 and x3 are continuous variables with non-normal distributions, and x1, x2, and x3 all have missing values. To fill in the missing values, I am using multiple imputation. One possible specification would be:

Code:

mi set flong
mi register imputed x1 x2 x3
mi impute chained (logit) x1 (pmm, knn(10)) x2 x3 = y, add(5) burnin(10)

However, testing has shown that x2 and x3 have a non-linear relationship, so I would like the mi procedure to run a customized regression during the first step of predictive mean matching. For example, when imputing x2 during iteration 2, I would like x3 to be binned into the categorical variable b3, which has ten bins for different ranges of values of x3, and b3 to be included in the model as a factor variable. So, during the initial regression step when imputing x2, the code would theoretically be:

Code:

regress x2 x1 i.b3 y

instead of

Code:

regress x2 x1 x3 y

More generally, I would like every continuous variable in the model to stay continuous when it is the dependent variable, but to be binned into a specified number of automatically determined bins when it is an independent variable.

Any help or guidance is deeply appreciated! I've played around with Mata a few times before and think this may require defining a user-defined imputation method, but I'm hoping another approach has already been developed to handle this type of request. If not, any guidance as to where to find the right ado-files on my system to "borrow" code from would be appreciated (I'm running Stata 15.1 on a Windows server).
Tags: multiple imputation, predictive mean matching
daniel klein

Join Date: Mar 2014

Posts: 3847
#2

29 Jan 2019, 01:51

In general, I would look into the include() and omit() options of mi impute chained. However, since x2 and x3 have missing values, so will b2 and b3. Thus, you would need to impute those missing values too, perhaps using an ordered logistic model. If you do not impute the missing values in b2 and b3, you are likely to end up with missing imputed values.
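
A minimal, untested sketch of that idea follows. It assumes decile bins created with xtile; the names b2 and b3 and the choice of ten quantile bins are illustrative only. It also assumes that mi impute chained enters ologit-imputed variables into the other equations as factor variables, which is why they are listed as i.b2 and i.b3 in omit(). The dryrun option only prints the conditional models, so the prediction equations can be checked before anything is imputed.

```stata
* Assumption: b2 and b3 are decile-binned copies of x2 and x3.
* xtile leaves b2/b3 missing wherever x2/x3 are missing.
xtile b2 = x2, nquantiles(10)
xtile b3 = x3, nquantiles(10)

mi set flong
mi register imputed x1 x2 x3 b2 b3

* Each continuous variable is omitted as a predictor and replaced by its
* binned version; a variable's own bins are omitted from its own equation.
* dryrun shows the conditional specifications without imputing.
mi impute chained                          ///
    (logit, omit(x2 x3))          x1       ///
    (pmm, knn(10) omit(x3 i.b2))  x2       ///
    (pmm, knn(10) omit(x2 i.b3))  x3       ///
    (ologit, omit(x2 x3))         b2 b3    ///
    = y, add(5) burnin(10) dryrun
```

If the equations shown by dryrun look right, dropping that option runs the imputation. Note that b2 and b3 are imputed by their own models here, so an imputed b3 is not guaranteed to match the bin that the imputed x3 would fall into.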

I have no clear idea how a user-method would help here. As far as I understand it, a user-method defines a general model to impute missing values; it does not allow you to write models for specific predictors in your equation.

Best
Daniel

Last edited by daniel klein; 29 Jan 2019, 01:54.