Imputation

Soumya Upadhyay

Join Date: Oct 2016

Posts: 43
#1

24 Aug 2018, 16:46

Is it okay to impute missing data for independent variables? I have read that usually missing covariates are imputed.
Also, please guide me on how to impute a categorical IV with 3 categories (0 = low, 1 = medium, 2 = robust) in Stata.
Muhammad Rashid

Join Date: Aug 2018

Posts: 38
#2

24 Aug 2018, 16:55

There is good evidence now that all variables should be imputed, including the outcomes. For the ordinal variable, you can use an ologit model.
Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#3

24 Aug 2018, 16:58

From a mathematical and statistical perspective there is no difference between an independent variable and a covariate. The difference is only in whether we care directly about their effects or whether we include them in our models only to reduce their distortion of the effects of the variables we do care about. So yes, it is as OK to impute an independent variable as it is to impute any covariate.

With a three-level covariate, if you are using multiple imputation with chained equations, you would probably use a multinomial logistic model to impute this one.

Added: Crossed with #2. If the three categories, low, medium, and robust, can be considered as ordered, then, yes, it would be better to use an ordinal logistic model to impute it; these converge much more easily than multinomial logistic models. But if they aren't really ordered (I don't know how "robust" relates to "low" and "medium" without more context), then you can't use ordinal logistic modeling.
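
A minimal sketch of the two options, for illustration only. The variable names score3, x1, and x2 are hypothetical placeholders, and the sketch assumes the data have not yet been mi set and that x1 and x2 are complete:

```stata
* Hypothetical names: score3 is the three-level variable with missing values,
* x1 and x2 are complete predictors.
mi set wide
mi register imputed score3

* If the categories are ordered, impute with an ordinal logistic model:
mi impute ologit score3 x1 x2, add(20) rseed(1234)

* If the categories are NOT ordered, use a multinomial logistic model instead
* (run one or the other, not both, or the imputations will be added twice):
* mi impute mlogit score3 x1 x2, add(20) rseed(1234)
```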

Last edited by Clyde Schechter; 24 Aug 2018, 17:01.
Soumya Upadhyay

Join Date: Oct 2016

Posts: 43
#4

27 Aug 2018, 16:55

I did an ordinal logistic model to impute it.
mi impute ologit ccscorecatnew1 pos_mgmt bedcode TEACH, add(20) rseed(1234)

where ccscorecatnew1 is the cultural competency score that needs to be imputed, pos_mgmt is positive score for management support which is otherwise the outcome variable in the main model, bedcode is hospital size, TEACH is teaching status. ccscorecatnew1 is ordered in 3 categories: 1 robust, 2 medium, and 3 low. There were 77 out of 283 missing values for ccscorecatnew1.

The output shows that 77 values were incomplete and were imputed. However, when I do a tab, I see the total as 206 instead of 283.
Why are results from tab not showing the total of 283? I went to the data editor as well and there are missing values still present despite the output window showing that they have been imputed.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#5

27 Aug 2018, 17:20

Maybe you could present what you typed afterwards. You are supposed to get the imputed values for the regression analysis, provided the commands were typed accordingly.
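
As a rough illustration of what could be typed afterwards: depending on the mi style, the imputed values are stored in extra variables or extra observations, so a plain tab of the original variable can still report only the 206 observed values. The sketch below assumes the variable names from #4 and, purely as an assumption, that the main model is a linear regression of pos_mgmt:

```stata
* Show the mi style and how many imputations the dataset holds
mi describe

* Tabulate the variable in the original data (m=0) and in imputation 1;
* m=0 should still show the missing values, m=1 should be complete
mi xeq 0 1: tabulate ccscorecatnew1, missing

* Fit the analysis model on all imputations and combine the results
* (regress is an assumption; substitute the actual main model)
mi estimate: regress pos_mgmt i.ccscorecatnew1 bedcode TEACH
```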

Best regards,

Marcos
Soumya Upadhyay

Join Date: Oct 2016

Posts: 43
#6

27 Aug 2018, 22:48

Ok, so it wouldn't show in the actual dataset under 'data editor'?
Soumya Upadhyay

Join Date: Oct 2016

Posts: 43
#7

28 Aug 2018, 12:56

In my dataset, the main IV, cultural competency score, has missing data. Other control variables, such as the organization's location and ownership status, also have missing data. The DV doesn't have any missing data. When I performed multiple imputation, I did it only on my main IV, cultural competency score. Since it has rank-ordered categories (1, 2, 3), I used an ordinal logistic regression, with my main model's DV and some other non-missing control variables as predictors. I was able to impute my main IV, cultural competency score, successfully. Then I ran my main regression model. Since some other control variables have missing values, the sample size was reduced again.

Given the above issue, how can I impute all of the variables that have missing values? I am using the Multiple Imputation control panel under Statistics, and I followed Stata's multiple imputation video on YouTube. Thank you for your help.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#8

28 Aug 2018, 15:12

I was able to impute my main IV, cultural competency score successfully. Then, I ran my main regression model. Since some other control variables are missing, the sample size reduced again.

I believe you should perform the multiple imputation for all variables with missing values (that are needed for the model) at once.

Best regards,

Marcos
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#9

28 Aug 2018, 15:54

Originally posted by Soumya Upadhyay:

In my dataset, the main IV, cultural competency score, has missing data. Other control variables, such as the organization's location and ownership status, also have missing data. The DV doesn't have any missing data. When I performed multiple imputation, I did it only on my main IV, cultural competency score. Since it has rank-ordered categories (1, 2, 3), I used an ordinal logistic regression, with my main model's DV and some other non-missing control variables as predictors. I was able to impute my main IV, cultural competency score, successfully. Then I ran my main regression model. Since some other control variables have missing values, the sample size was reduced again.

Given the above issue, how can I impute all of the variables that have missing values? I am using the Multiple Imputation control panel under Statistics, and I followed Stata's multiple imputation video on YouTube. Thank you for your help.

What Marcos told you is correct. Note also that you can use several imputation models simultaneously, one per variable. For example, the code below treats beds as continuous and teaching status as binary, and it adds the independent variables x1-x3, which I assume are complete:

Code:

mi impute chained (ologit) ccscorecatnew1 (regress) beds (logit) TEACH = pos_mgmt x1 x2 x3, add(20) rseed(1234)

Best practice is that you want the imputation model to have all the variables in your final model.
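
A sketch of how the whole workflow might look when every incomplete variable is imputed at once. The names location and ownership are hypothetical stand-ins for the control variables with missing values mentioned in #7; treating location as unordered categorical, ownership as binary (0/1), bedcode and TEACH as complete, and the main model as a linear regression are all assumptions. It also assumes you start from the original data, before any imputations were added:

```stata
* Declare the mi style and register variables by role
mi set mlong
mi register imputed ccscorecatnew1 location ownership
mi register regular pos_mgmt bedcode TEACH

* Impute all incomplete variables together with chained equations;
* complete variables, including the outcome, go to the right of the = sign
mi impute chained (ologit) ccscorecatnew1 (mlogit) location (logit) ownership ///
    = pos_mgmt bedcode TEACH, add(20) rseed(1234)

* Fit the final model on the imputed data; every variable in it
* also appears in the imputation model above
mi estimate: regress pos_mgmt i.ccscorecatnew1 i.location i.ownership bedcode TEACH
```

Because all variables in the final model are either imputed or complete, the analysis should use the full sample rather than dropping observations again.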

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.