mi imputed chained equation got perfect prediction error even with augment option

Neo Zhou

#1

29 Jul 2019, 13:21

I was trying to fit a hybrid model using Richard Williams's method. Since a hybrid model cannot include factor variables, I created the dummy variables before imputation, as suggested, using:

Code:

tab oldvar, gen(newvar)

Then I registered the variables to be imputed and ran the imputation, adding the augment option to the logit models to deal with perfect prediction:

Code:

mi impute chained (logit, augment) varlist1 ///
    (pmm, knn(5)) varlist2 = varlist3,      ///
    add(2) rseed(19941122) dots noisily force

where varlist1 contains the dummy variables with missing values, varlist2 the continuous variables with missing values, and varlist3 the variables with no missing values.
I expected the augment option to take care of the perfect prediction problem, but I still got error messages saying that there are perfect prediction issues.
Generating dummy variables before imputation makes the imputation model more complicated and can easily cause perfect prediction. Although it is recommended to prepare variables properly before imputation, sometimes it is easier to do so afterwards. How can I properly create my 0-1 dummy variables after imputation instead? Thanks.
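A minimal Stata sketch of the approach the question asks about, under stated assumptions: impute the original multi-category variable once with mlogit (instead of imputing each dummy separately with logit), then build the dummies after imputation with mi passive. The data are simulated, and the names catvar, binvar, x, y, cat2 and cat3 are hypothetical stand-ins for varlist1/varlist2/varlist3; add(5) is only illustrative. Because the passive dummies are derived from one imputed variable, they stay mutually consistent in every imputation.

```stata
* Hypothetical example data: catvar (3 categories) and binvar (0/1) have
* missing values; x and y are complete. All names are illustrative.
clear
set seed 20190729
set obs 500
generate x = rnormal()
generate double u = runiform()
generate byte catvar = cond(u < 0.3, 1, cond(u < 0.7, 2, 3))
generate byte binvar = runiform() < invlogit(0.5*x)
generate y = 1 + 0.5*x + 0.3*binvar + 0.2*catvar + rnormal()
drop u
replace catvar = . if runiform() < 0.15
replace binvar = . if runiform() < 0.15

* Declare mi data and register the variables
mi set wide
mi register imputed catvar binvar
mi register regular x y

* Impute the categorical variable as a single variable with mlogit,
* and the binary variable with logit
mi impute chained (mlogit, augment) catvar (logit, augment) binvar = x y, ///
    add(5) rseed(19941122)

* Create the indicators after imputation; mi passive repeats each generate
* in the original data (m=0) and in every imputation
mi passive: generate byte cat2 = (catvar == 2) if !missing(catvar)
mi passive: generate byte cat3 = (catvar == 3) if !missing(catvar)
```

cat2 and cat3 (with category 1 as the base) can then be used wherever the hybrid model needs explicit indicator variables. This sketch does not address the separate issue of cluster means in the imputation model.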
daniel klein

#2

29 Jul 2019, 13:51

The details are not very clear, but there might be a couple of issues that you would have to address here.

First, using the hybrid method, i.e., including mean values to capture between effects, implies that these mean values must be part of the imputation model; otherwise, the correlations between the mean variables and the outcome will be biased towards zero. Unfortunately, the means cannot be computed accurately before the imputation if there are missing (at random) observations. Imputation for multilevel models is, in general, quite a challenge.

Second, creating dummy (indicator) variables before the imputation is fine, though completely unnecessary, if you have binary variables. Variables with more than two categories, hence represented by multiple indicator variables, might be more problematic, especially when you use logit to impute each indicator separately: the imputed values will most likely not add up to 1. At the same time, the observed indicators are collinear (they always sum to 1), which might result in perfect predictions.

Anyway, if you have only binary variables, you do not need to transform them at all: make sure that the values are coded 0 and 1 and impute using logit. Then plug the imputed variables into your regression model just as you do with continuous variables; factor variable notation is not relevant for binary variables during estimation. Post-estimation, especially margins, would require factor variable notation, but post-estimation after a hybrid model with imputed data probably requires a lot more than factor variable notation.
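A minimal Stata sketch of this advice, assuming a hypothetical binary variable rawbin that is stored as 1/2 and has missing values: recode it to 0/1, check the coding with assert, impute it with logit, and enter the imputed variable directly in the model without the i. prefix. The data are simulated and all variable names are illustrative.

```stata
* Hypothetical example data: rawbin is binary but coded 1/2
clear
set seed 20190729
set obs 500
generate x = rnormal()
generate byte rawbin = 1 + (runiform() < 0.5)
generate y = 1 + 0.5*x + 0.4*(rawbin == 2) + rnormal()
replace rawbin = . if runiform() < 0.15

* Recode to 0/1 so that logit can impute it and 0 is the base group
generate byte binvar = rawbin - 1
assert inlist(binvar, 0, 1, .)
drop rawbin

mi set wide
mi register imputed binvar
mi register regular x y
mi impute chained (logit) binvar = x y, add(5) rseed(19941122)

* The 0/1 variable enters the model like a continuous variable
mi estimate: regress y x binvar
```

The coefficient on binvar is the difference relative to the group coded 0, exactly as it would be with i.binvar.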

Best
Daniel
Neo Zhou

#3

29 Jul 2019, 14:20

Originally posted by daniel klein (#2)

Right. You remind me that for 0-1 variables, it is not necessary to specify them using i.varname: the group coded 0 automatically serves as the reference group. Thanks.
Marisol Kevelson

#4

20 Aug 2019, 10:23

Hello Neo,

I am wondering whether removing the dummy variables and including the original variables instead allowed the imputation to run without the perfect prediction error. I am encountering the same issue. I have tried using the augment option for the logit variables, which lets the imputation run, but some missing values are still not imputed.

Thanks,
Mari
Neo Zhou

#5

04 Dec 2019, 18:29

Originally posted by Marisol Kevelson (#4)

Hi, I got it working in the end but have forgotten the details of how I did it. I think you are right, though: the perfect prediction problem arises when the imputation model includes more dummy variables and some of those dummies perfectly predict the variable being imputed.