Missing Data in my imputed results

roxy kallsen

Join Date: Oct 2018

Posts: 37
#1

11 Apr 2019, 14:24

Hello,

I am having some problems with the final results of my imputation. I administered the Beck Anxiety Inventory (BAI), a 21-item screening test designed to measure the occurrence and severity of anxiety symptoms (BAI_SCORE), at baseline, month 3, and month 6. Total raw scores were obtained by summing the 21 item scores, with possible scores ranging from 0 to 63. Data were missing on the BAI outcome as well as on other IVs (INTSCORE_M6 = a family contact intensity scale) and covariates (PWAS_M_2 = mean score for parental warmth; FAM_M3 = whether or not family therapy took place; OYAS = recidivism risk). I used variables with complete data, as well as other auxiliary variables, to impute the variables with missing data.

HTML Code:

set seed 832016
mi set wide
mi register imputed INF_M6a INTSCORE_M6 BAI_SCORE_1 BAI_SCORE_2 PWAS_M_1 PWAS_M_2 FAM_M3 FAM_M6 OYAS_FINAL_C
mi register regular BAI_SCORE_0 BDI_SCORE_0 INTSCORE_M3 PWAS_M_0 VIC_M3a AGE_AD MILES LENGTH2 DUMMY_RACE EXITR
mi impute chained (nbreg) INF_M6a (poisson) BAI_SCORE_1 BAI_SCORE_2 INTSCORE_M6 (regress) PWAS_M_1 PWAS_M_2 (logit) FAM_M3 FAM_M6 (ologit) OYAS_FINAL_C = BAI_SCORE_0 BDI_SCORE_0 INTSCORE_M3 PWAS_M_0 VIC_M3a AGE_AD MILES LENGTH2

When I checked the imputation table, I noticed that the number of imputed observations did not match the number of incomplete observations, but the total row shows the number of observations in the complete data. Also, I noticed that the BAI in the imputed data had values that exceeded the possible range (0-63).

Any recommendations or suggestions are highly appreciated.

Best,

Roxy
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

11 Apr 2019, 17:42

Focusing only on the issue of the imputed scores being out of range: you used Poisson regression to impute the BAI scores. The support of a Poisson distribution is the nonnegative integers, with no upper bound. So, naturally, some people will be imputed scores above the maximum possible score. This need not be a killer for the analysis if there aren't many of them, and there probably aren't.
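To see how many imputed totals actually fall above the maximum, something like the following could be run after the imputation step. This is only a sketch: it assumes the variable names from #1 and the maximum total of 63 stated there.

```stata
* Sketch only: run after -mi impute chained- has added the imputations.
* mi xeq runs the command on the original data (m=0) and on every imputation.

* Count month-3 totals above the maximum possible score of 63
mi xeq: count if BAI_SCORE_1 > 63 & !missing(BAI_SCORE_1)

* Same check for the month-6 totals
mi xeq: count if BAI_SCORE_2 > 63 & !missing(BAI_SCORE_2)
```

If the counts are small relative to the sample size, that would be consistent with the point above that the out-of-range values need not be a serious problem.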

Now, the BAI score is the sum of 7 questions. You could have imputed at the question level using ordered logistic regression, then use mi passive to generate a sum score, e.g.

Code:

mi impute chained (nbreg) INF_M6a (ologit) BAI_q1 BAI_q2 BAI_q3 ... INTSCORE_M6 (regress) PWAS_M_1 PWAS_M_2 (logit) FAM_M3 FAM_M6 (ologit) OYAS_FINAL_C = BAI_SCORE_0 BDI_SCORE_0 INTSCORE_M3 PWAS_M_0 VIC_M3a AGE_AD MILES LENGTH2, augment
mi passive: egen BAI = rowtotal(BAI_q1-BAI_q7)

I have always included the augment option when doing ordered logit imputation.

I am very puzzled by the issue of missing imputed values. I thought that Stata would return an error message if it had imputed missing values; this can happen when some predictors contain missing values but they weren't imputed themselves.

Last edited by Weiwen Ng; 11 Apr 2019, 17:45.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
roxy kallsen

Join Date: Oct 2018

Posts: 37
#3

11 Apr 2019, 20:47

Hi Weiwen,

That's a great suggestion. I will try running it again at the question level. Just to be sure: the BAI survey that I am using has 21 items, so I will simply use BAI_q1-BAI_q21.

Thank you for your help,

Roxy

roxy kallsen

Join Date: Oct 2018
Posts: 37
#4

11 Apr 2019, 21:47

Hi Weiwen,

I reran the mi command with your suggestion but received an error message. I am not sure why it did not work. Is there another way that I can use the total score but restrict the values to the range 0 to 63?

HTML Code:

 set seed 832016
mi set wide 
mi register imputed INTSCORE_M6 BAI_1_1 BAI_2_1 BAI_3_1 BAI_4_1 BAI_5_1 BAI_6_1 BAI_7_1 BAI_8_1 BAI_9_1 BAI_10_1 BAI_11_1 BAI_12_1 BAI_13_1 BAI_14_1 BAI_15_1 BAI_16_1 BAI_17_1 BAI_18_1 BAI_19_1 BAI_20_1 BAI_21_1 BAI_1_2 BAI_2_2 BAI_3_2 BAI_4_2 BAI_5_2 BAI_6_2 BAI_7_2 BAI_8_2 BAI_9_2 BAI_10_2 BAI_11_2 BAI_12_2 BAI_13_2 BAI_14_2 BAI_15_2 BAI_16_2 BAI_17_2 BAI_18_2 BAI_19_2 BAI_20_2 BAI_21_2 PWAS_M_1 PWAS_M_2 FAM_M3 FAM_M6 OYAS_FINAL_C 
mi impute chained (poisson) INTSCORE_M6 (regress) PWAS_M_1 PWAS_M_2 (logit) FAM_M3 FAM_M6 (ologit) BAI_1_1 BAI_2_1 BAI_3_1 BAI_4_1 BAI_5_1 BAI_6_1 BAI_7_1 BAI_8_1 BAI_9_1 BAI_10_1 BAI_11_1 BAI_12_1 BAI_13_1 BAI_14_1 BAI_15_1 BAI_16_1 BAI_17_1 BAI_18_1 BAI_19_1 BAI_20_1 BAI_21_1 BAI_1_2 BAI_2_2 BAI_3_2 BAI_4_2 BAI_5_2 BAI_6_2 BAI_7_2 BAI_8_2 BAI_9_2 BAI_10_2 BAI_11_2 BAI_12_2 BAI_13_2 BAI_14_2 BAI_15_2 BAI_16_2 BAI_17_2 BAI_18_2 BAI_19_2 BAI_20_2 BAI_21_2 OYAS_FINAL_C = BAI_SCORE_0 BDI_SCORE_0 INTSCORE_M3 PWAS_M_0 VIC_M3a AGE_AD MILES LENGTH2 DUMMY_RACE, add(25) force dots augment
mi passive: egen BAI_score1 = rowtotal(BAI_1_1-BAI_21_1)
mi passive: egen BAI_score2 = rowtotal(BAI_1_2-BAI_21_2)

Thank you again for your help,


daniel klein

Join Date: Mar 2014

Posts: 3886
#5

11 Apr 2019, 23:06

Apparently, the code shown in #1 is not posted as typed. As Weiwen correctly points out, Stata will issue an error when missing values are imputed. I therefore assume that Roxy has added the force option, as in the code in #4. I have repeatedly seen people specifying force by default: please do not! The option means what it says: you force Stata to do something it would not otherwise do, and there are probably good reasons not to do it. Instead of forcing Stata to do something, fix the underlying problem.

Drop the force option and instead add the noisily and (undocumented) showcommand options to see where the problems (imputed missing values as well as non-convergence) appear, and work from there to fix them. Also, you should include your regular variables without missing values in the imputation model.

Best
Daniel

roxy kallsen

Join Date: Oct 2018
Posts: 37
#6

12 Apr 2019, 06:16

Hi Daniel,

Thank you for the advice. Just to be clear, is this how you suggest I edit the syntax?

HTML Code:

set seed 832016
mi set wide
mi register imputed INF_M6a INTSCORE_M6 BAI_SCORE_1 BAI_SCORE_2 PWAS_M_1 PWAS_M_2 FAM_M3 FAM_M6 OYAS_FINAL_C
mi register regular BAI_SCORE_0 BDI_SCORE_0 INTSCORE_M3 PWAS_M_0 VIC_M3a AGE_AD MILES LENGTH2 DUMMY_RACE EXITR
mi impute chained (nbreg) INF_M6a (poisson) BAI_SCORE_1 BAI_SCORE_2 INTSCORE_M6 (regress) PWAS_M_1 PWAS_M_2 (logit) FAM_M3 FAM_M6 (ologit) OYAS_FINAL_C = BAI_SCORE_0 BDI_SCORE_0 INTSCORE_M3 PWAS_M_0 VIC_M3a AGE_AD MILES LENGTH2, add(25) augment noisily showcommand

Also, all the variables on the right of the equal sign are the ones that are being used to impute the variables on the left side. The right-side variables do not have missing data.

Thank you,

Roxy


daniel klein

Join Date: Mar 2014

Posts: 3886
#7

12 Apr 2019, 06:59

I am sorry, I have missed the equals sign in your syntax. I would double check to see whether there are really no missing values in these predictors. In my experience, this is usually the reason for missing imputed values.
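One way to double check is sketched below. It assumes the predictors are exactly the right-hand-side variables listed in #6 and that all of them are numeric.

```stata
* Sketch only: the variable list is copied from the right-hand side in #6.
* misstable lists only variables that have missing values;
* if none do, it reports that the variables are nonmissing.
misstable summarize BAI_SCORE_0 BDI_SCORE_0 INTSCORE_M3 PWAS_M_0 VIC_M3a AGE_AD MILES LENGTH2

* Count observations with a missing value in at least one predictor
count if missing(BAI_SCORE_0, BDI_SCORE_0, INTSCORE_M3, PWAS_M_0, VIC_M3a, AGE_AD, MILES, LENGTH2)
```

A nonzero count would point to predictors that need to be imputed themselves or dropped from the right-hand side.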

The options look good now. As Weiwen has suggested, imputing the single items and then creating the scores is the recommended method. Depending on the number of levels in the items and their distribution, I would not feel too bad about switching to pmm if the ordered logistic models failed.

Best
Daniel
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#8

12 Apr 2019, 09:18

Roxy, the FAQ does ask us not to post screenshots, as they're not always legible. You can post results, tables, and error messages in code delimiters, and they'll format correctly for most devices.

I missed the fact that you said that the Beck Anxiety Inventory has 21 questions. If the items are binary, I wonder whether you have tried switching to logit when imputing, e.g.

Code:

mi impute chained (poisson) INTSCORE_M6 (regress) PWAS_M_1 PWAS_M_2 (logit) FAM_M3 FAM_M6 (logit) BAI_1_1 BAI_2_1 BAI_3_1 BAI_4_1 BAI_5_1 BAI_6_1 BAI_7_1 BAI_8_1 BAI_9_1 BAI_10_1 BAI_11_1 BAI_12_1 BAI_13_1 BAI_14_1 BAI_15_1 BAI_16_1 BAI_17_1 BAI_18_1 BAI_19_1 BAI_20_1 BAI_21_1 BAI_1_2 BAI_2_2 BAI_3_2 BAI_4_2 BAI_5_2 BAI_6_2 BAI_7_2 BAI_8_2 BAI_9_2 BAI_10_2 BAI_11_2 BAI_12_2 BAI_13_2 BAI_14_2 BAI_15_2 BAI_16_2 BAI_17_2 BAI_18_2 BAI_19_2 BAI_20_2 BAI_21_2 OYAS_FINAL_C = BAI_SCORE_0 BDI_SCORE_0 INTSCORE_M3 PWAS_M_0 VIC_M3a AGE_AD MILES LENGTH2 DUMMY_RACE, add(25) force dots augment

I failed to explain myself earlier, but if you are missing BAI scores because people fail to answer single items rather than the whole questionnaire, you should definitely impute at the question level and calculate a sum score. If you simply impute the full score, then you are dropping the information that your participants provided.

If the issue is that people are missing the entire instrument, then I think you could be OK imputing the full score.
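To tell the two situations apart, the number of unanswered items per respondent can be tabulated. The sketch below assumes the month-3 items are named BAI_1_1 through BAI_21_1 as in #4, and that it is run on the original data before mi set; n_item_miss1 is a hypothetical variable name.

```stata
* Sketch only: item names follow the BAI_<item>_<wave> pattern used in #4.
* Build the item list explicitly, because a wildcard such as BAI_*_1
* would also match the total score BAI_SCORE_1.
local items1
forvalues i = 1/21 {
    local items1 `items1' BAI_`i'_1
}

* Number of unanswered month-3 items per respondent (hypothetical name)
egen byte n_item_miss1 = rowmiss(`items1')

* 0 = complete, 21 = whole instrument missing, 1-20 = item-level nonresponse
tabulate n_item_miss1
```

If nearly all incomplete cases sit at 21, imputing the total score loses little; a spread across 1 to 20 would favor item-level imputation.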

I had forgotten about the predictive mean matching option. This is a sort of blend of regression-based imputation and nearest-neighbor matching. It will respect the characteristics of the score distribution, i.e. it will impute integers within the observed range, because it imputes by random draws from the observed scores of similar observations. Some info here. Note the recommendation at the link to change the default number of nearest neighbors to at least 5 (i.e. impute by random draw from each observation's 5 nearest neighbors; default is 1 nearest neighbor).

roxy kallsen

Join Date: Oct 2018

Posts: 37
#9

12 Apr 2019, 12:09

Hi Weiwen,

I chose to impute the sum score rather than at the question level given that I am missing the entire instrument at months 3 and 6. I added pmm and knn(5) but for some reason I keep getting the following error.

HTML Code:

mi impute chained (pmm) BAI_SCORE_1 BAI_SCORE_2 (poisson) INTSCORE_M6 (regress) PWAS_M_1 PWAS_M_2 (logit) FAM_M3 FAM_M6 (ologit) OYAS_FINAL_C = BAI_SCORE_0 BDI_SCORE_0 INTSCORE_M3 PWAS_M_0 VIC_M3a AGE_AD MILES LENGTH2 DUMMY_RACE, add(25) augment replace knn(5)
option knn() not allowed
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#10

12 Apr 2019, 12:30

Originally posted by roxy kallsen

...

HTML Code:

mi impute chained (pmm, knn(5)) BAI_SCORE_1 BAI_SCORE_2 (poisson) INTSCORE_M6 (regress) PWAS_M_1 PWAS_M_2 (logit) FAM_M3 FAM_M6 (ologit) OYAS_FINAL_C = BAI_SCORE_0 BDI_SCORE_0 INTSCORE_M3 PWAS_M_0 VIC_M3a AGE_AD MILES LENGTH2 DUMMY_RACE, add(25) replace

The knn() option is specific to the pmm method, so it goes inside the parentheses with pmm rather than among the global options after the comma, as corrected above. The augment option may no longer be necessary if you aren't imputing a bunch of ordered or regular logistic items, but it probably won't hurt to include it.
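After a pmm-based run, a quick check along these lines could confirm that the imputed totals behave as expected. It is a sketch that assumes the variable names used in this thread and integer-valued observed scores.

```stata
* Sketch only: run after the pmm-based -mi impute chained- call.
* Range of each score in the original data (m=0) and in every imputation;
* with pmm the imputed min and max should stay within the observed range.
mi xeq: summarize BAI_SCORE_1 BAI_SCORE_2

* Count non-integer values; expected to be 0 because pmm reuses observed scores
mi xeq: count if BAI_SCORE_1 != floor(BAI_SCORE_1) & !missing(BAI_SCORE_1)
mi xeq: count if BAI_SCORE_2 != floor(BAI_SCORE_2) & !missing(BAI_SCORE_2)
```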

roxy kallsen

Join Date: Oct 2018

Posts: 37
#11

13 Apr 2019, 12:00

Weiwen,

Thank you so much for correcting my syntax. It worked.