  • Residuals plot, mi

    Dear all,

    How can I plot the residuals against each predictor in my model when I have a multiply-imputed dataset and weights (survey data)?

    Thank you in advance.

  • #2
    Hi Eva,

    Assuming it's just the MI causing problems rather than the weights, here are a couple of possibilities:
    1. Most obviously, just do your plot by imputed dataset (indexed by _mi_m if you're using Stata's MI system). If you have used a large number of imputations, doing it for the first five or so should give you a feel. See for example figure 3 of 'White IR, Royston P, Wood AM. Multiple imputation using chained equations: issues and guidance for practice. Statistics in Medicine 2011; 30(4):377–399.' Assuming you're doing this for model checking and not a publication plot, this should be fine.
    2. If you don't want as many plots as the above creates, you could plot observed values using one symbol and, for each missing value, plot all the imputed values on the same plot. You may want to use a line to connect the imputes for one missing value.
    What not to do is average the imputed data for each missing value and then do one plot.
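
    A minimal sketch of option 1, under assumptions not stated in the thread: the data are mi set in flong style (so every imputation holds all observations and _mi_m indexes them), the model is an unweighted linear regression, and y, x1 and x2 are placeholder variable names. Survey weights are ignored here.
    Code:
    * Assumptions: flong style; y, x1, x2 are placeholder names; no weights
    forvalues m = 1/5 {
        tempvar res
        * Fit the model on imputation m only
        quietly regress y x1 x2 if _mi_m == `m'
        * Residuals for the observations used in that fit
        quietly predict double `res' if e(sample), residuals
        * One residual-vs-predictor plot per imputation
        twoway scatter `res' x1, yline(0) name(resid_m`m', replace) title("Imputation `m'")
    }
    Each imputation gets its own named graph, so the plots can be compared side by side rather than averaged into one.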

    Hope that helps, Tim


    • #3
      Hi Tim,

      Thank you for your reply.

      I have tried
      Code:
      mi estimate, cmdok vceok esampvaryok: svy: probit depvar indepvar(list)
      Code:
      rvfplot ln_inc if _mi_m==1
      but it returns:
      last estimates not found
      r(301);
      Should I somehow save the intermediate results? And if so, how?

      Thank you in advance!


      • #4
        Leaving aside any of the issues of results from multiple imputation, -rvfplot- only works after -regress-. So you will have to use -mi predict- to calculate both the predicted values and the "residuals" (I don't think that term is ordinarily used in connection with -probit-, but it's easy enough to just calculate observed minus predicted). Then use -graph twoway scatter- to plot them.


        • #5
          Dear Clyde, thank you for the guidance.
          I tried:
          Code:
          mi estimate, noisily saving(miest,replace) cmdok vceok esampvaryok: probit (...)
          Code:
          mi predictnl resid using miest
          Code:
          graph twoway scatter resid ln_inc
          and after the mi predictnl command I get:
          'resid' found where resid = pnl_exp expected
          r(198);
          What is the syntax error?


          • #6
            You're not following the syntax for -mi predict/predictnl-. -mi predictnl- doesn't know what "resid" means. It thinks resid is a variable name. And unlike ordinary -predict-, it does not have a -resid- option either. You need to do it this way:

            Code:
            mi predictnl predicted_probability = normal(xb()) using miest
            mi xeq 0: gen residual = outcome - predicted_probability // USE YOUR ACTUAL OUTCOME VARIABLE
            mi xeq 0: graph twoway scatter residual predicted_probability


            • #7
              And can I use this syntax to plot the residuals against each model predictor by replacing 'outcome', or is it incorrect?


              • #8
                To get residual vs. predictor plots all you have to change is the -graph twoway scatter- command:

                Code:
                mi xeq 0: graph twoway scatter residual predictor_variable
                (And there is no need to re-run the first two commands in #6 either.)
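
                A short sketch for producing these plots for several predictors in one go. The predictor list is a placeholder (only ln_inc appears earlier in the thread; age and educ are hypothetical), and the variable residual is assumed to exist already from the commands in #6.
                Code:
                * Placeholder predictor names; replace with the model's own
                foreach v in ln_inc age educ {
                    * One named graph per predictor, drawn on the original data (m=0)
                    mi xeq 0: graph twoway scatter residual `v', yline(0) name(res_`v', replace)
                }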


                • #9
                  Thank you very much!


                  • #10
                    Hi Clyde, Tim, Eva, and all observers,

                    I’m following up on this discussion, which was about post-estimation calculation of residuals for data with missing values. I’m interested in this for the purpose of model-checking, e.g. creating plots of residuals vs. fitted values to assess whether residuals are normally distributed, conducting related statistical tests, etc. I have three separate but overlapping questions on this topic…

                    First, Tim Morris, above, following White, et al. 2010 (see below for references), pp. 390-391 and esp. Figure 3, suggests computing fitted values and residuals for each imputed dataset, and then, for each imputed dataset, plotting these residuals against the fitted values.

                    I believe that I am doing this correctly, but wanted to submit my code for checking by other members of the list:

                    Code:
                    **Setup
                    use "${file_location}/Data/${model_purpose}_${model_venue}_${model_name}_${data_type}.dta", clear
                    mi xtset fips_num year
                    mi convert flong
                    
                    **Generating fitted values and residuals
                    mi predict yhat using "${file_location}/Estimates/${model_purpose}_${model_venue}_${model_name}_${interaction_terms}_${data_type}.ster", storecompleted
                    mi xeq: generate residual = ${depvar} - yhat
                    
                    **Diagnostic tests
                    mi xeq 1: kdensity residual, normal
                    mi xeq 1: graph twoway scatter residual yhat
                    Second, previously in this thread, Clyde Schechter suggested a different approach to assessing normality of residuals for a regression with imputed data. Specifically, he suggests first calculating fitted values for the entire dataset (using all imputations), then calculating residuals just for the observed data, and then creating an rvf plot, etc., for the observed data. Adopting his code, I do the following:

                    Code:
                    **Setup
                    use     "${file_location}/Data/${model_purpose}_${model_venue}_${model_name}_${data_type}.dta", clear
                    mi xtset fips_num year
                    
                    **Generating fitted values and residuals
                    mi predict yhat using "${file_location}/Estimates/${model_purpose}_${model_venue}_${model_name}_${interaction_terms}_${data_type}.ster"
                    mi xeq 0: gen residual = ${depvar} - yhat
                    
                    **Diagnostic tests
                    mi xeq 0: kdensity residual, normal
                    mi xeq 0: iqr residual
                    mi xeq 0: graph twoway scatter residual yhat
                    This is certainly another approach, and it appears to me statistically valid (i.e. does not violate any of Rubin’s rules, etc.) but it is not the one that White, Miles 2016, etc., are suggesting. It does, however, give results that are very similar to the results for separate imputations, which is not surprising. Are there any theoretical reasons for preferring the White/Miles approach to the one that Schechter is proposing?

                    Finally, I would be interested to hear whether anyone knows of a statistically sound way, under Rubin’s rules, to combine residuals from each imputation, much as mi predict combines fitted values. This explicitly goes against the directions of White et al., but it is not clear to me why residuals could not also be combined, given that they are completely determined by the model coefficients and fitted values.

                    Thank you,
                    Ethan

                    References

                    White, Royston, Wood, 2010, “Multiple imputation using chained equations: Issues and guidance for practice,” Statistics in Medicine

                    Miles, 2016, "Obtaining predictions from models fit to multiply imputed data," Sociological Methods & Research


                    • #11
                      I am familiar with neither the White, Royston, Wood, nor the Miles paper. My approach is simply motivated by the working of the -mi- commands. Although -mi estimate- uses multiple imputed data sets, its ultimate product is a single set of regression coefficients that result from combining the coefficients fitted to the several imputations using Rubin's rules. Emphasis on single set of regression coefficients. -mi predict- uses this single set of regression coefficients created by -mi estimate- and calculates a single set of predicted values. (It does store them in each of the imputed data sets, but they are identical across imputations.)

                      The usual purpose for plotting residuals vs fitted values is to assess the fit of the model and visually appraise whether the residuals are homoscedastic. It seems to me that we are still in the context of having a single model (set of regression coefficients) and a single set of residuals and a single set of predicted values calculated from them. So the direct way to achieve these goals is to plot that single set of residuals against the single set of predicted values.

                      I have not delved deeply enough into the theory of multiple imputation to really comment on the other methods even if I were familiar with those articles. (I use multiple imputation only occasionally, and typically under duress, as the MAR assumption is almost never plausible in the context of my work.)


                      • #12
                        Hello discussants,

                        I was following the discussion here on predicting residuals for Stata's mi command.
                        I was applying the suggestion by Clyde Schechter as -

                        mi xeq 0: gen residual = outcome - predicted_probability // USE YOUR ACTUAL OUTCOME VARIABLE

                        However, Stata's manual for MI advises against doing this, and I don't understand why.
                        I also found that residuals calculated this way are comparatively large (e.g. the mean is around 0.07; is this OK?).
                        Is this the only way to get residuals for a multiply imputed dataset? Has anybody here used any other way to get residuals?
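
                        One way to see how far the residuals sit from zero is to summarize them in the original data. This is only a sketch; it assumes the variable is named residual, as in the quoted line.
                        Code:
                        * Mean, percentiles and spread of the residuals in the original data (m=0)
                        mi xeq 0: summarize residual, detail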

                        Thank you all.
