Problem regarding imputation of missing value

Neeraj Kumar

Join Date: Jul 2017

Posts: 98
#1


29 Apr 2019, 09:03

Hi
I am working on a self-reported cross-sectional dataset. Most of the variables are ordinal, and they contain some missing values. Apart from my dependent variable (which is ordinal), every independent variable has some missing values (10 to 15% per variable). I tried to impute them with mi using ologit. I am using M = 20 (as recommended in the Stata manual) and a random-number seed of 1234 (I am still a little confused about the use of the random-number seed). After imputing the missing values for one variable, my total number of observations increases: before imputation it is 1953, and after imputation it becomes 6133. If I repeat the same procedure for another variable, it increases further. One more important point: in the mi procedure, the imputation model for the variable being imputed contains only one independent variable, because if I include other independent variables, Stata shows an error, since those variables also have missing values. Why does the number of observations increase this much, and how can I solve this issue?

daniel klein

Join Date: Mar 2014
Posts: 3886
#2

29 Apr 2019, 09:12

Please show the exact code that you have used.

The code should look something like

Code:

set seed 42 // <- any number will do; used for reproducible results
mi set flong
mi register imputed varlist
mi impute chained (ologit) varlist = depvar [ varlist ] , add(20) noisily

Best
Daniel


Neeraj Kumar

Join Date: Jul 2017

Posts: 98
#3

29 Apr 2019, 09:33

Thanks for your prompt reply, Mr. Daniel Klein. I am using the following code:

Code:

mi set mlong
mi register imputed varlist
mi impute ologit V109 V22, add(20) rseed(1234)

I just used the command you mentioned in #2. The issue is still the same: my number of observations increased from 1953 to 41063.
daniel klein

Join Date: Mar 2014

Posts: 3886
#4

29 Apr 2019, 09:57

Concerning the code, make sure you include your dependent/outcome/response variable as a predictor in the imputation model. In fact, (at least) all variables that you later use in the analysis should also be in the imputation model.

The increase in the number of observations is expected and is documented in the manual entries on MI. Start by reading

Code:

help mi styles

which explains what is stored in the extra observations under each mi style.
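To make that bookkeeping concrete, here is a toy Python model of the row and file counts that `help mi styles` describes. It is a sketch of the arithmetic only, not of how Stata actually stores the data, and it assumes N original observations, M imputations, and (for mlong) a known count of incomplete observations. With the thread's N = 1953 and add(20), the jump to 6133 under mlong corresponds to (6133 - 1953) / 20 = 209 incomplete observations; that figure is inferred from the numbers, not reported by the poster.

```python
# Toy model of how each Stata mi style changes the data layout.
# Assumption: this mirrors only the counting rules described in `help mi styles`,
# not Stata's internal implementation.


def mi_layout(n_obs, n_incomplete, n_imputed_vars, m):
    """Return {style: (rows in main dataset, extra variables, extra files)}.

    n_obs          -- observations in the original data (m = 0)
    n_incomplete   -- observations with a missing value in a registered imputed variable
    n_imputed_vars -- number of registered imputed variables
    m              -- number of imputations, as in add(m)
    """
    if not 0 <= n_incomplete <= n_obs:
        raise ValueError("n_incomplete must lie between 0 and n_obs")
    return {
        # wide: one row per observation; each imputed variable gets m extra copies
        "wide": (n_obs, n_imputed_vars * m, 0),
        # mlong: original data plus m copies of the incomplete observations only
        "mlong": (n_obs + m * n_incomplete, 0, 0),
        # flong: original data plus m full copies of every observation
        "flong": (n_obs * (m + 1), 0, 0),
        # flongsep: main file unchanged; each imputation is stored in its own file
        "flongsep": (n_obs, 0, m),
    }


# Thread numbers: 1953 observations, add(20); 209 incomplete observations is inferred.
layout = mi_layout(n_obs=1953, n_incomplete=209, n_imputed_vars=1, m=20)
for style, (rows, extra_vars, extra_files) in layout.items():
    print(f"{style:9s} rows={rows:6d} extra_vars={extra_vars:3d} extra_files={extra_files}")
```

Under these rules flong always holds N × (M + 1) rows, which is why it grows much faster than mlong when only a minority of observations has missing values.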

Best
Daniel
Neeraj Kumar

Join Date: Jul 2017

Posts: 98
#5

29 Apr 2019, 11:00

Thanks for your response, Mr. Daniel. As shown in my code in #3, I am using my dependent variable, V22, as a predictor in the imputation model. If I add my other independent variables as predictors in the imputation model, Stata shows an error because those independent variables also have missing values.
daniel klein

Join Date: Mar 2014

Posts: 3886
#6

29 Apr 2019, 11:17

Originally posted by Neeraj Kumar

If I add my other independent variables as predictors in the imputation model, Stata shows an error because those independent variables also have missing values.

That is why I suggested you use

Code:

mi impute chained ...

where you list the variables with missing values on the left-hand side of the equals sign and all predictors with no missing values on the right-hand side.

Say, V109 and V110 were the only two (categorical) predictors with missing values; you would then impute those missing values as

Code:

mi impute chained (ologit) V109 V110 = V22 , add(20)

Do not run a series of single imputation models; instead, use a chained-equations approach and impute missing values in all variables at once.
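The chained-equations idea can be sketched in a few lines of Python. This is a simplified illustration under loud assumptions: all variables are treated as continuous and imputed by linear regression (not the ologit/mlogit/logit models used in this thread), the regression coefficients are not drawn from their posterior as proper multiple imputation requires, and the data are hypothetical. What it does show is the point above: each incomplete variable is imputed from all other variables, including the other incomplete ones, and the cycle is repeated, so no variable has to be left out of a model just because it has missing values.

```python
import random


def ols(X, y):
    """Least-squares coefficients for rows X (each row starts with 1.0) and outcome y,
    found by solving the normal equations (X'X) b = X'y with Gauss-Jordan elimination."""
    k = len(X[0])
    # augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [ar - f * ac for ar, ac in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def chained_impute(data, incomplete, complete, n_iter=10, seed=42):
    """Return one imputed copy of `data` (dict of equal-length lists, None = missing).
    Simplified: residual noise is drawn, but coefficients are not (improper imputation)."""
    rng = random.Random(seed)
    n = len(data[complete[0]])
    miss = {v: [x is None for x in data[v]] for v in incomplete}
    cur = {v: list(data[v]) for v in complete}
    # start values: a random observed value of the same variable
    for v in incomplete:
        observed = [x for x in data[v] if x is not None]
        cur[v] = [rng.choice(observed) if m else x for x, m in zip(data[v], miss[v])]
    for _ in range(n_iter):
        for v in incomplete:
            # every other variable, imputed or complete, predicts v
            preds = [w for w in incomplete + complete if w != v]
            rows = [[1.0] + [cur[w][i] for w in preds] for i in range(n)]
            fit = [i for i in range(n) if not miss[v][i]]  # fit on observed v only
            b = ols([rows[i] for i in fit], [cur[v][i] for i in fit])
            fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in rows]
            rss = sum((cur[v][i] - fitted[i]) ** 2 for i in fit)
            sd = (rss / (len(fit) - len(b))) ** 0.5
            for i in range(n):
                if miss[v][i]:
                    cur[v][i] = rng.gauss(fitted[i], sd)  # fitted value plus noise
    return cur


# Hypothetical toy data: y complete, x1 and x2 partly missing.
gen = random.Random(0)
y = [gen.gauss(0, 1) for _ in range(200)]
x1 = [yi + gen.gauss(0, 0.5) for yi in y]
x2 = [0.5 * yi + gen.gauss(0, 0.5) for yi in y]
for i in range(0, 200, 7):
    x1[i] = None
for i in range(3, 200, 11):
    x2[i] = None
data = {"y": y, "x1": x1, "x2": x2}
# M = 20 imputed datasets, one per seed, all variables imputed together
imputations = [chained_impute(data, ["x1", "x2"], ["y"], seed=s) for s in range(1, 21)]
```

Both incomplete variables enter each other's model, which is exactly what separate one-variable-at-a-time imputations cannot do.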

Best
Daniel
Neeraj Kumar

Join Date: Jul 2017

Posts: 98
#7

29 Apr 2019, 15:08

Thanks for your patience and prompt reply, Mr. Daniel. After doing what you suggested in #6, the issue is the same: my number of observations increases from 1953 to 41063. I do not understand whether this is right or wrong. I also noticed one more thing in the Data Browser: I still see some missing values (.).
daniel klein

Join Date: Mar 2014

Posts: 3886
#8

29 Apr 2019, 23:29

Please follow my advice in #4 and read

Code:

help mi styles

where you will find the explanation for the extra observations in your dataset after imputing missing values, along with an illustrative example. I do not know what else to add.

Best
Daniel

Last edited by daniel klein; 29 Apr 2019, 23:34.
Neeraj Kumar

Join Date: Jul 2017

Posts: 98
#9

01 May 2019, 03:50

Thanks for your advice, Mr. Daniel. Now I know why my number of observations was increasing: it is because I was using the mlong and flong styles. With the wide and flongsep styles, the number of observations stays the same. But now I am unsure which style to select. If I use the flongsep style with add(20), it generates 20 additional files, and if I use the wide style with add(20), it generates 20 new variables for each imputed variable. So which style should I choose? Can I reduce add(20) to add(2)? Once again, thanks for your helpful advice.
daniel klein

Join Date: Mar 2014

Posts: 3886
#10

01 May 2019, 04:06

Originally posted by Neeraj Kumar

So which style should I choose?

Choose whichever style is convenient. You will have to use mi estimate to run the analyses and the latter does not care which style you use.
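Why the style does not matter for the analysis: `mi estimate` fits the model in each imputed dataset and combines the M results with Rubin's rules, so only the M point estimates and their variances enter the combination. Below is a minimal Python sketch of that combining step. It covers the point estimate and total variance only; degrees of freedom and test statistics, which Stata also computes, are omitted, and the input numbers are hypothetical.

```python
import statistics


def rubin_combine(estimates, variances):
    """Combine M per-imputation estimates and their variances (squared SEs) with Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least 2 imputations and one variance per estimate")
    q_bar = statistics.fmean(estimates)        # combined point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (divisor M - 1)
    total = within + (1 + 1 / m) * between     # total variance of q_bar
    return q_bar, total


# Hypothetical coefficients and squared SEs from three imputed datasets.
q, t = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(f"estimate={q:.3f}  SE={t ** 0.5:.3f}")
```

Nothing in this calculation depends on whether the imputed datasets were stored as wide, mlong, flong, or flongsep.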

Originally posted by Neeraj Kumar

Can I reduce add(20) to add(2)?

You can, but why would you want to do that?

Best
Daniel
Neeraj Kumar

Join Date: Jul 2017

Posts: 98
#11

01 May 2019, 05:12

Thanks for your reply, Mr. Daniel. I don't have any issue with add(20); I just wanted to know whether I can reduce it or not. If I reduce it to add(2), what will the impact be? I want to ask one more thing. As I mentioned in #1, all my variables except one are categorical. A few of them are categorical but unordered, so I have to impute their missing values with mlogit. Earlier I used only ologit, because those variables have ordered categories. I am sharing my code; please let me know whether it is correct, because when I used it, Stata showed an error.

Code:

set seed 42
mi set wide
mi register imputed V108 V109 V110 V111 V112 V113 V126 M127 M128 M129 M130 M132 V133 V134 V135 V136 V137 V138 V139 V140 V141 V142 V143 reli Hth EM SC ED incomescale Sex Marital V237
mi register regular V22
mi impute chained (ologit)V108 V109 V110 V111 V112 V113 V126 M127 M128 M129 M130 M132 V133 V134 V135 V136 V137 V138 V139 V140 V141 V142 V143 (mlogit)reli Hth EM SC ED incomescale (logit) Sex Marital (regress) V237 = V22, add (20)

Thank you so much for your help.
daniel klein

Join Date: Mar 2014

Posts: 3886
#12

01 May 2019, 05:59

Originally posted by Neeraj Kumar

If I reduce it to add(2), what will the impact be?

You will only have 2 complete datasets. Part of the theory behind MI is based on asymptotics in M, i.e., the number of imputations. There are a couple of rules of thumb on how many imputations you need to get valid results; 2 is most certainly not enough.
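One way to quantify why 2 imputations are too few is Rubin's relative efficiency, 1 / (1 + γ/M), where γ is the fraction of missing information for a parameter. The sketch below assumes γ = 0.3 purely for illustration; in practice γ is estimated from the data (the `mi estimate` documentation describes a `vartable` option that reports such variance information; see `help mi estimate`). Efficiency is not the only criterion: a commonly cited rule of thumb sets M at least equal to the percentage of incomplete cases, which usually calls for more imputations than efficiency alone suggests.

```python
def relative_efficiency(m, fmi):
    """Rubin's large-sample efficiency of an M-imputation estimate relative to M = infinity.

    fmi is the fraction of missing information for the parameter (0 <= fmi < 1).
    """
    if m < 1 or not 0 <= fmi < 1:
        raise ValueError("need m >= 1 and 0 <= fmi < 1")
    return 1.0 / (1.0 + fmi / m)


# fmi = 0.3 is an assumed value for illustration only.
for m in (2, 5, 20, 50):
    eff = relative_efficiency(m, 0.3)
    # the sampling variance is inflated by 1/eff, so standard errors by sqrt(1/eff)
    print(f"M={m:3d}  RE={eff:.3f}  SE inflation={(1 / eff) ** 0.5:.3f}")
```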

Originally posted by Neeraj Kumar

I am sharing my code; please let me know whether it is correct, because when I used it, Stata showed an error.

Which error? What exactly does Stata do/respond when you issue that syntax?

Best
Daniel
Neeraj Kumar

Join Date: Jul 2017

Posts: 98
#13

01 May 2019, 08:10

Thanks, Mr. Daniel. It shows error 2000. Would it be OK to do the following instead of imputing everything together: first impute all the ologit variables; then use those variables to impute the mlogit variables; then use the ologit and mlogit variables to impute the logit variables; and finally use all of them to impute the continuous variable?
daniel klein

Join Date: Mar 2014

Posts: 3886
#14

01 May 2019, 08:37

No. As I mentioned earlier, you cannot run the models separately. The reason is that you cannot include predictors with missing values; however, you must include those predictors to account for their correlations with the variables that you are imputing. Also, specifying separate models will not help with the error message. The error means that there is some model where there is not a single observation with all non-missing values. You need to find out where that error comes from. Add the following two options to your mi impute command:

Code:

mi impute chained ... , ... noisily showcommand

noisily shows all models that are estimated; showcommand is not documented, but it will give you an idea of which model Stata is trying to run.
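Before rerunning the full command, it can also help to check which variables leave no usable observations. The Python sketch below is a toy version of that check, assuming missing values are coded as `None` and each model is described by a target variable plus a list of complete predictors; the data and the variable pairings are hypothetical. In Stata itself, `misstable summarize` or a line such as `count if !missing(V109, V22)` gives the same kind of information.

```python
def usable_rows(data, target, predictors):
    """Count rows in which the target and every listed predictor are non-missing (None = missing)."""
    columns = [data[target]] + [data[p] for p in predictors]
    return sum(all(value is not None for value in row) for row in zip(*columns))


def models_without_data(data, models):
    """Return the (target, predictors) pairs that have no usable observation at all."""
    return [(t, ps) for t, ps in models if usable_rows(data, t, ps) == 0]


# Hypothetical data: V22 complete, V109 partly missing, V110 missing everywhere.
data = {
    "V22": [1, 2, 3, 2, 1],
    "V109": [2, None, 3, 1, None],
    "V110": [None, None, None, None, None],
}
models = [("V109", ["V22"]), ("V110", ["V22"])]
for target, predictors in models:
    print(target, "usable rows:", usable_rows(data, target, predictors))
print("no usable observations:", models_without_data(data, models))
```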

Best
Daniel

Neeraj Kumar

Join Date: Jul 2017
Posts: 98

#15

01 May 2019, 09:23

I used the following code, but it still shows error 498: perfect predictor detected.

Code:

set seed 42
mi set wide
mi register imputed V105 V106 V107 V108 V109 V110 V111 V112 V113 V126 V127 V128 V129 V130 V132 V133 V134 V135 V136 V137 V138 V139 V140 V141 V142 V143 V237 religion health em sclass ed incomescale marriage gd
mi register regular V22
mi impute chained (ologit) V105 V106 V107 V108 V109 V110 V111 V112 V113 V126 V127 V128 V129 V130 V132 V133 V134 V135 V136 V137 V138 V139 V140 V141 V142 V143  health sclass incomescale (mlogit) religion em  ed (logit) marriage gd (regress) V237 = V22, add(20) noisily showcommand

Thank you so much for your prompt reply.
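The error message points to perfect prediction: some level of a predictor occurs with only one outcome value, so the corresponding logit coefficient diverges. The Python sketch below checks only the simplest case, a binary outcome and a single categorical predictor, with hypothetical data and `None` for missing; separation caused by combinations of predictors is not detected. Stata's `mi impute` documentation describes an `augment` option for the `logit`, `ologit`, and `mlogit` methods aimed at this situation; check `help mi impute chained` before relying on it.

```python
from collections import defaultdict


def perfectly_predicting_levels(outcome, predictor):
    """Levels of a categorical predictor at which a binary outcome never varies.

    Rows where either value is None (missing) are skipped.
    """
    outcomes_by_level = defaultdict(set)
    for y, x in zip(outcome, predictor):
        if y is not None and x is not None:
            outcomes_by_level[x].add(y)
    return sorted(level for level, ys in outcomes_by_level.items() if len(ys) == 1)


# Hypothetical data: a 0/1 variable (as imputed with logit) and a three-level predictor.
outcome = [0, 1, 0, 1, 1, 1, None]
predictor = [1, 1, 2, 2, 3, 3, 3]
print(perfectly_predicting_levels(outcome, predictor))  # -> [3]
```

Level 3 only ever occurs with outcome 1 among the observed rows, which is the pattern that makes a logit model report perfect prediction.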
