Residuals plot, mi

Eva Windisch

Join Date: Apr 2016

Posts: 9
#1


25 Apr 2016, 07:47

Dear all,

How can I plot the residuals against each predictor in my model when I have a multiply imputed dataset and weights (survey data)?

Thank you in advance.
Tim Morris

Join Date: Apr 2014

Posts: 92
#2

25 Apr 2016, 08:21

Hi Eva,

Assuming it's just the MI causing problems rather than the weights, here are a couple of possibilities:
Most obviously, just do your plot by imputed dataset (indexed by _mi_m if you're using Stata's MI system). If you have used a large number of imputations, doing it for the first five or so should give you a feel. See for example figure 3 of 'White IR, Royston P, Wood AM. Multiple imputation using chained equations: issues and guidance for practice. Statistics in Medicine 2011; 30(4):377–399.' Assuming you're doing this for model checking and not a publication plot, this should be fine.

If you don't want as many plots as the above creates, you could plot observed values using one symbol and, for each missing value, plot all the imputed values on the same plot. You may want to use a line to connect the imputes for one missing value.

What not to do is average the imputed data for each missing value and then do one plot.

Hope that helps, Tim
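
A minimal sketch of the first suggestion (one plot per imputed dataset), not Tim's own code. It assumes Stata's example dataset mheart1s20 (fetched with -webuse-, so it needs internet access) and a plain probit without survey weights; the variables attack, smokes, age, bmi, female and hsgrad belong to that dataset, not to Eva's data. -mi extract- pulls out one completed dataset at a time so that ordinary -predict- can be used:

```stata
* Example mi data (assumption: stands in for the real survey data)
webuse mheart1s20, clear

* Residual-vs-predictor plot for each of the first five imputations
forvalues m = 1/5 {
    preserve
    * Replace the data in memory with completed dataset m (a non-mi dataset)
    mi extract `m', clear
    * Refit the model on this imputation only
    quietly probit attack smokes age bmi female hsgrad
    * Fitted probability and raw "residual" (observed minus predicted)
    predict double phat, pr
    generate double residual = attack - phat
    * Keep each plot in memory under its own name
    twoway scatter residual bmi, yline(0) title("Imputation `m'") name(res_m`m', replace)
    restore
}
```

For survey data, the extracted dataset may need to be -svyset- again and the model refitted with the svy: prefix before predicting; that step is left out here.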
Eva Windisch

Join Date: Apr 2016

Posts: 9
#3

25 Apr 2016, 08:58

Hi Tim,

Thank you for your reply.

I have tried

Code:

mi estimate, cmdok vceok esampvaryok: svy: probit depvar indepvar(list)

Code:

rvfplot ln_inc if _mi_m==1

but it returns:

last estimates not found
r(301);

Should I somehow save the intermediate results? If yes, how?

Thank you in advance!
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#4

25 Apr 2016, 10:38

Leaving aside any of the issues of results from multiple imputation, -rvfplot- only works after -regress-. So you will have to use -mi predict- to calculate both the predicted values and the "residuals" (I don't think that term is ordinarily used in connection with -probit-, but it's easy enough to just calculate observed minus predicted). Then use -graph twoway scatter- to plot them.
Eva Windisch

Join Date: Apr 2016

Posts: 9
#5

25 Apr 2016, 11:40

Dear Clyde, thank you for the guidance.
I tried:

Code:

mi estimate, noisily saving(miest,replace) cmdok vceok esampvaryok: probit (...)

Code:

mi predictnl resid using miest

Code:

graph twoway scatter resid ln_inc

and after the mi predictnl command I get:

'resid' found where resid = pnl_exp expected
r(198);

What is the syntax error?
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#6

25 Apr 2016, 12:02

You're not following the syntax for -mi predict/predictnl-. -mi predictnl- doesn't know what "resid" means. It thinks resid is a variable name. And unlike ordinary -predict-, it does not have a -resid- option either. You need to do it this way:

Code:

mi predictnl predicted_probability = normal(xb()) using miest
mi xeq 0: gen residual = outcome - predicted_probability // USE YOUR ACTUAL OUTCOME VARIABLE
mi xeq 0: graph twoway scatter residual predicted_probability
Eva Windisch

Join Date: Apr 2016

Posts: 9
#7

25 Apr 2016, 12:44

And can I use this syntax to plot the residuals against each model predictor by replacing 'outcome', or is it incorrect?
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#8

25 Apr 2016, 12:49

To get residual vs. predictor plots all you have to change is the -graph twoway scatter- command:

Code:

mi xeq 0: graph twoway scatter residual predictor_variable

(And there is no need to re-run the first two commands in #6 either.)
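
Looping over the predictors saves typing the -graph- command once per variable. The sketch below is only an illustration: it assumes Stata's example dataset mheart1s20 (downloaded by -webuse-) in place of the real data, uses that dataset's variable names (attack as the outcome), and ignores survey weights. It repeats the steps from #6 and then draws one plot per predictor:

```stata
* Example mi data (assumption: stands in for the real data)
webuse mheart1s20, clear

* Fit the probit across imputations and save the estimates, as in #5
mi estimate, saving(miest, replace): probit attack smokes age bmi female hsgrad

* MI prediction of Pr(attack = 1), then observed minus predicted in m = 0
mi predictnl phat = normal(xb()) using miest
mi xeq 0: generate double residual = attack - phat

* One residual-vs-predictor scatter per covariate, each kept as a named graph
foreach v of varlist smokes age bmi female hsgrad {
    mi xeq 0: graph twoway scatter residual `v', yline(0) name(res_`v', replace)
}
```

Because the plots are drawn in m = 0, observations whose predictor value is missing in the original data do not appear in that predictor's plot.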
Eva Windisch

Join Date: Apr 2016

Posts: 9
#9

25 Apr 2016, 14:22

thank you very much!
Ethan Schoolman

Join Date: May 2016

Posts: 31
#10

15 Feb 2017, 04:43

Hi Clyde, Tim, Eva, and all observers,

I’m following up on this discussion, which was about post-estimation calculation of residuals for data with missing values. I’m interested in this for the purpose of model-checking, e.g. creating plots of residuals vs. fitted values to assess whether residuals are normally distributed, conducting related statistical tests, etc. I have three separate but overlapping questions on this topic…

First, Tim Morris, above, following White, et al. 2010 (see below for references), pp. 390-391 and esp. Figure 3, suggests computing fitted values and residuals for each imputed dataset, and then, for each imputed dataset, plotting these residuals against the fitted values.

I believe that I am doing this correctly, but wanted to submit my code for checking by other members of the list:

Code:

**Setup
use "${file_location}/Data/${model_purpose}_${model_venue}_${model_name}_${data_type}.dta", clear
mi xtset fips_num year
mi convert flong
**Generating fitted values and residuals
mi predict yhat using "${file_location}/Estimates/${model_purpose}_${model_venue}_${model_name}_${interaction_terms}_${data_type}.ster", storecompleted
mi xeq: generate residual = ${depvar} - yhat
**Diagnostic tests
mi xeq 1: kdensity residual, normal
mi xeq 1: graph twoway scatter residual yhat

Second, previously in this thread, Clyde Schechter suggested a different approach to assessing normality of residuals for a regression with imputed data. Specifically, he suggests first calculating fitted values for the entire dataset (using all imputations), then calculating residuals just for the observed data, and then creating an rvf plot, etc., for the observed data. Adopting his code, I do the following:

Code:

**Setup
use "${file_location}/Data/${model_purpose}_${model_venue}_${model_name}_${data_type}.dta", clear
mi xtset fips_num year
**Generating fitted values and residuals
mi predict yhat using "${file_location}/Estimates/${model_purpose}_${model_venue}_${model_name}_${interaction_terms}_${data_type}.ster"
mi xeq 0: gen residual = ${depvar} - yhat
**Diagnostic tests
mi xeq 0: kdensity residual, normal
mi xeq 0: iqr residual
mi xeq 0: graph twoway scatter residual yhat

This is certainly another approach, and it appears to me statistically valid (i.e. does not violate any of Rubin’s rules, etc.) but it is not the one that White, Miles 2016, etc., are suggesting. It does, however, give results that are very similar to the results for separate imputations, which is not surprising. Are there any theoretical reasons for preferring the White/Miles approach to the one that Schechter is proposing?

Finally, I would be interested to hear if anyone has ideas on whether there is a statistically sound way, under Rubin’s rules, to combine residuals from each imputation, much like mi predict combines fitted values. This explicitly violates the directions of White et al., but it’s not clear to me why, if residuals are completely determined by model coefficients and fitted values, then residuals could not also be combined.

Thank you,
Ethan

References

White, Royston, Wood, 2010, “Multiple imputation using chained equations: Issues and guidance for practice,” Statistics in Medicine

Miles, 2016, "Obtaining predictions from models fit to multiply imputed data," Sociological Methods & Research
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#11

15 Feb 2017, 11:58

I am familiar with neither the White, Royston, Wood, nor the Miles paper. My approach is simply motivated by the working of the -mi- commands. Although -mi estimate- uses multiple imputed data sets, its ultimate product is a single set of regression coefficients that result from combining the coefficients fitted to the several imputations using Rubin's rules. Emphasis on single set of regression coefficients. -mi predict- uses this single set of regression coefficients created by -mi estimate- and calculates a single set of predicted values. (It does store them in each of the imputed data sets, but they are identical across imputations.)

The usual purpose for plotting residuals vs fitted values is to assess the fit of the model and visually appraise whether the residuals are homoscedastic. It seems to me that we are still in the context of having a single model (set of regression coefficients) and a single set of residuals and a single set of predicted values calculated from them. So the direct way to achieve these goals is to plot that single set of residuals against the single set of predicted values.

I have not delved deeply enough into the theory of multiple imputation to really comment on the other methods even if I were familiar with those articles. (I use multiple imputation only occasionally, and typically under duress, as the MAR assumption is almost never plausible in the context of my work.)
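
One way to connect Ethan's third question with the single-residual approach is sketched below. It rests on two assumptions: that -mi predict- forms its combined prediction as the simple average of the M completed-data predictions (the point-estimate part of Rubin's rules), and that the outcome y_i is observed, so it does not vary across imputations. The symbols are introduced here for illustration only and do not appear elsewhere in the thread.

```latex
\bar{r}_i
  = \frac{1}{M}\sum_{m=1}^{M} \bigl( y_i - \hat{y}_i^{(m)} \bigr)
  = y_i - \frac{1}{M}\sum_{m=1}^{M} \hat{y}_i^{(m)}
  = y_i - \bar{\hat{y}}_i
```

Under those assumptions, averaging the per-imputation residuals gives the same number as subtracting the combined prediction from the observed outcome, which is what the -mi xeq 0: gen residual- line computes. When the outcome itself is imputed, it is missing in m = 0 and this shortcut is not available as written.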
Tista Kundu

Join Date: Jan 2018

Posts: 10
#12

08 Mar 2018, 04:39

Hello discussants,

I was following the discussion here on predicting residuals for Stata's mi command.
I was applying the suggestion by Clyde Schechter as -

mi xeq 0: gen residual = outcome - predicted_probability // USE YOUR ACTUAL OUTCOME VARIABLE

But in Stata's manual for MI it is instructed not to do so although I don't understand why!
However I found residuals are comparatively larger calculated this way (e.g. mean is around 0.07, is this ok?).
Is this the only way to get residuals for a multiply imputed dataset? Has anybody here used any other way to get residuals?

Thank you all.
