  • Checking homoscedasticity and linearity assumptions when using mi estimate: and svy:

    I am currently working on a client's project that uses pweights and multiple imputation. I am having trouble checking the homoscedasticity assumption and the normality of residuals for a multiple regression fit with both mi estimate: and svy:, because the commands I would normally use to check these assumptions (predict and rvfplot) return an error saying that no estimates are available. I was planning to check the assumptions separately in each imputed dataset, but is there a way to obtain standardized estimates and standardized residuals for checking homoscedasticity when the model is fit with both mi estimate: and svy:? I haven't been able to find anything so far, but I may be missing the needle in the haystack.

    Thank you ahead of time, I greatly appreciate any assistance provided.

  • #2
    The assumptions of constant residual standard deviation and normality of residuals are not required for the validity of survey standard errors and tests. See the manual entry on "Variance estimation for survey data". Therefore the checks you mention are not needed.
    On the other hand, tests of model specification are important: presence of interactions, nonlinearity, prediction accuracy. One omnibus postestimation test is linktest (or mi estimate: linktest). See also Richard Valliant's 2010 presentation, "Linear Regression Diagnostics for Survey Data". I would examine several of the imputed data sets separately.
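
    As a rough sketch of examining several imputed data sets separately: the loop below extracts one imputation at a time, refits the model with the probability weights, plots residuals against fitted values, and runs linktest. The names y, x1, x2, and pw are hypothetical placeholders, and the choice of the first three imputations is arbitrary. It uses regress with pweights rather than svy: regress; the coefficients, and therefore the fitted values and residuals, should be the same for the same weights, but the standard errors that linktest reports will not reflect clustering or stratification.

```stata
* Sketch only: y, x1, x2, and pw are hypothetical names for the outcome,
* predictors, and pweight variable; substitute your own.
* Assumes the mi data are in memory with at least three imputations.
forvalues m = 1/3 {
    preserve
    mi extract `m', clear              // keep only imputation m; data are no longer mi set
    regress y x1 x2 [pweight = pw]     // weighted fit on this imputation
    predict double fit, xb             // fitted values
    predict double res, residuals      // raw residuals
    scatter res fit, yline(0) name(rvf`m', replace)   // residual-versus-fitted plot
    linktest                           // a significant _hatsq suggests misspecification
    restore                            // bring back the full mi data
}
```

    Each plot is stored under its own graph name (rvf1, rvf2, rvf3), so the imputations can be compared side by side.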

    One serious problem is robustness to outliers and high-leverage observations. To be safe, I suggest that you ignore the design and run the very robust mmregress package by Verardi and Croux (net describe st0173_1, from(http://www.stata-journal.com/software/sj10-2)). See the accompanying Stata Journal article, which is available for download.

    Also try median regression with qreg2 by Joao Santos Silva (SSC). Although qreg2 is not "svy" aware, it can incorporate the main design features: clustering and probability weighting. If you have covariates that describe the strata, you can add those variables to your model. Otherwise, ignoring the strata will only inflate the standard errors slightly.

    Reference:
    Verardi, Vincenzo, and Christophe Croux. 2009. Robust regression in Stata. Stata Journal 9, no. 3: 439-453.
    Last edited by Steve Samuels; 24 May 2015, 18:10.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      I will definitely look into this further. Thank you for pointing me in the right direction; I truly appreciate it.



      • #4
        Dear Steve,

        I am also working with a svy regression and want to find outliers with high leverage. I read some posts where you proposed using the mmregress command. When I request the graph, it cannot be shown; the error is "Robust_distance not found". My code looks roughly like this:

        . xi: mmregress pmsindex8_std i.ownership i.grpsize i.practype i.region paymentindex_std incentivesindex_std , dummies(ownership grpsize practype region) outlier graph label(organization)

        I use dummy variables as well as standardized continuous variables as predictors.
        The code works without the graph option.

        Is there a way to list the outlier organizations in a table (rather than graphically), given that the command works without the graph option?
        Do you have any idea why I get this error message and how I can work around it?

        I have not worked with complex sample designs and svy before, so any tip or reference would be much appreciated.

        Best,
        Alex
