  • Help with a detailed specification using multiple imputation

    Hello,

    I am attempting to estimate a model:

    Code:
    logit y x1 x2 x3
    where x1 is a dummy variable, x2 and x3 are continuous variables with non-normal distributions, and all of x1, x2, and x3 have missing values. To fill in the missing values, I am using multiple imputation. One possible specification would be:

    Code:
    mi set flong
    mi register imputed x1 x2 x3
    mi impute chained (logit) x1 (pmm, knn(10)) x2 x3 = y, add(5) burnin(10)
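For reference, a specification like this would normally be followed by fitting the logit from the top of the post in every imputed dataset and pooling the estimates with `mi estimate` (Rubin's rules). The sketch below is meant to run end to end; the simulated data (seed, sample size, data-generating formulas, roughly 10% missingness) are hypothetical stand-ins for the poster's dataset and are not part of the thread.

```stata
* Hypothetical simulated data standing in for the poster's dataset
clear
set seed 12345
set obs 1000
generate x1 = runiform() < 0.5                 // dummy
generate x2 = exp(rnormal())                   // skewed, non-normal
generate x3 = sqrt(x2) + rchi2(2)              // non-linear in x2
generate y  = runiform() < invlogit(-1 + 0.5*x1 + 0.2*x2 - 0.1*x3)
replace x1 = . if runiform() < 0.10            // roughly 10% missing in each predictor
replace x2 = . if runiform() < 0.10
replace x3 = . if runiform() < 0.10

* The imputation specification from the post
mi set flong
mi register imputed x1 x2 x3
mi register regular y
mi impute chained (logit) x1 (pmm, knn(10)) x2 x3 = y, add(5) burnin(10)

* Fit the analysis model in each of the 5 imputations and pool the results
mi estimate: logit y x1 x2 x3
```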
    However, testing has shown that x2 and x3 have a non-linear relationship, so I would like the mi procedure to run a customized regression in the first step of predictive mean matching. For example, when imputing x2 during iteration 2, I would like the model for x2 to replace x3 with b3, a categorical variable that bins x3 into ten ranges of values, entered as a factor variable. So, in the initial regression step when imputing x2, the code would in principle be:

    Code:
    regress x2 x1 i.b3 y
    instead of
    Code:
    regress x2 x1 x3 y
    More generally, I would like every continuous variable in the model to stay continuous when it is the dependent variable, but to be binned into a specified number of automatically determined bins when it is an independent variable.
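One way to build such bins automatically is `xtile` with a chosen number of quantile groups; using equal-frequency deciles is an assumption here, since the post does not say how the bin edges should be chosen. The sketch simulates hypothetical data, creates ten-bin versions b2 and b3, and fits the desired equation for x2 on complete cases only, outside mi, just to show the model form.

```stata
* Hypothetical simulated data standing in for the poster's dataset
clear
set seed 12345
set obs 1000
generate x1 = runiform() < 0.5
generate x2 = exp(rnormal())
generate x3 = sqrt(x2) + rchi2(2)
generate y  = runiform() < invlogit(-1 + 0.5*x1 + 0.2*x2 - 0.1*x3)
replace x2 = . if runiform() < 0.10
replace x3 = . if runiform() < 0.10

* Ten equal-frequency bins (deciles); b2/b3 are missing wherever x2/x3 are
xtile b2 = x2, nquantiles(10)
xtile b3 = x3, nquantiles(10)
tabulate b3, missing

* Complete-case version of the desired equation: x3 enters as i.b3
regress x2 x1 i.b3 y
```

Because `regress` drops observations with any missing value, this only illustrates the equation; inside the imputation the missing values in b2 and b3 still have to be dealt with.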

    Any help or guidance is deeply appreciated! I've played around with Mata a few times and think this may require defining a usermethod, but I'm hoping another approach has already been developed for this type of request. If not, any pointers to the right ado-files on my system to "borrow" code from would be appreciated (I'm running Stata 15.1 on a Windows server).

  • #2
    In general, I would look into the include() and omit() options of mi impute chained. However, because x2 and x3 have missing values, so will b2 and b3. You would therefore need to impute those missing values too, perhaps using an ordered logistic model. If you do not impute the missing values in b2 and b3, you are likely to end up with missing imputed values.
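A minimal sketch of that suggestion, assuming ten quantile bins built with `xtile` and the names b2 and b3 from the thread: the bins are created before `mi set`, registered as imputed, and imputed with `ologit`, while each equation's `omit()` list removes predictors that should not appear there. It relies on `mi impute chained` entering ologit-imputed variables as factor variables (i.b2, i.b3) by default. The exact omit() lists are my reading of the goal, not something confirmed in the thread, so it is worth checking the resulting equations (for example with the `dryrun` option) before relying on them. The data are simulated and hypothetical.

```stata
* Hypothetical simulated data standing in for the poster's dataset
clear
set seed 12345
set obs 1000
generate x1 = runiform() < 0.5
generate x2 = exp(rnormal())
generate x3 = sqrt(x2) + rchi2(2)
generate y  = runiform() < invlogit(-1 + 0.5*x1 + 0.2*x2 - 0.1*x3)
replace x1 = . if runiform() < 0.10
replace x2 = . if runiform() < 0.10
replace x3 = . if runiform() < 0.10

* Ten-bin versions of x2 and x3; missing wherever x2/x3 are missing
xtile b2 = x2, nquantiles(10)
xtile b3 = x3, nquantiles(10)

mi set flong
mi register imputed x1 x2 x3 b2 b3
mi register regular y

* By default each equation gets every other imputed variable as a predictor,
* with the ologit-imputed bins entering as i.b2 / i.b3. omit() removes:
*   - continuous x2/x3 wherever they would be predictors (bins are used instead)
*   - each variable's own bins (b2 in the x2 equation, b3 in the x3 equation)
* so the x2 equation becomes: x2 on x1 i.b3 y
mi impute chained                              ///
    (logit,        omit(x2 x3)) x1             ///
    (pmm, knn(10)  omit(x3 b2)) x2             ///
    (pmm, knn(10)  omit(x2 b3)) x3             ///
    (ologit,       omit(x2 x3)) b2 b3          ///
    = y, add(5) burnin(10)
```

One caveat: x2 and b2 are imputed by separate equations, so an imputed x2 need not fall inside its imputed bin b2; whether that matters depends on how the bins are used afterwards.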

    I have no clear idea how a user-method would help here. As far as I understand it, a user-method defines a general model to impute missing values; it does not allow you to write models for specific predictors in your equation.

    Best
    Daniel
    Last edited by daniel klein; 29 Jan 2019, 01:54.
