  • Multiple Imputation Results

    Hi Stata Users,

    First, let me apologize if this is not strictly a Stata question. However, I believe I can still benefit from the vast knowledge of the members of this group.
    I am performing multiple imputation using the code below
    Code:
    mi set wide
    mi register imputed pr_attend
    mi impute chained (regress) pr_attend, add(20) by(age)
    mi estimate: regress pr_attend hhsize hh_head_no_educ clust_literacy num_children hh_member_formal_empl dep_ratio hh_orphan i.hv024
    and the attached dataset.
    I then perform some robustness checks and find that, whereas the mean of the imputed distribution is accurate (we know this because we have the population estimate), a Kolmogorov-Smirnov test of equality of distributions for the original and imputed variables shows that they differ. Visual exploration with kdensity shows the same.
    [Attached image: eys.jpg]

    I notice that, whereas the means of the distributions are similar (dotted green line superimposed on the continuous red line), the standard deviations seem a bit different, and I am wondering whether there is a way to address this.
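
    For reference, a check of this kind can be sketched as follows. This is only a sketch: it assumes the wide style set above, in which the completed values of the first imputation are stored in _1_pr_attend while pr_attend keeps the original values. The legend labels are illustrative.
    Code:
    * compare the observed values (m = 0) with the first completed variable (m = 1)
    summarize pr_attend _1_pr_attend
    twoway (kdensity pr_attend) (kdensity _1_pr_attend), legend(order(1 "Observed" 2 "Completed, m = 1"))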

    Thanks in advance!
    Last edited by Stephen Okiya; 16 Sep 2022, 02:25.

  • #2
    I am reluctant to open binary attachments. I can tell that this code

    Code:
    mi impute chained (regress) pr_attend, add(20) by(age)
    is most likely not what you want. Here, the imputed values depend only on age. Thus, you are badly underestimating the relationship between pr_attend and all other predictors, which probably explains the underestimated variance. You want to include all variables of your analysis model in the imputation model.

    Moreover, if you are using a linear model, then imputing only the outcome is not really necessary; if missingness of the outcome depends only on the predictors, then the linear model remains consistent. You might lose a bit of power with a complete-case analysis, but you are also likely to add unnecessary noise during imputation.
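
    A minimal sketch of the complete-case alternative, assuming the variable names from your analysis model and that only pr_attend has missing values. Run on the original (unimputed) data, regress drops incomplete observations on its own, so no imputation step is involved.
    Code:
    * complete-case analysis: observations with missing pr_attend are dropped automatically
    regress pr_attend hhsize hh_head_no_educ clust_literacy num_children hh_member_formal_empl dep_ratio hh_orphan i.hv024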


    • #3
      Thanks daniel klein for the great insights. The reasons for performing imputation by age are:
      1. Conceptually, the estimates should be age specific
      2. Enrollment patterns differ across ages
      Is there a way I can perform the imputations with this in mind?

      Thanks in advance


      • #4
        Ignoring my point about imputation probably being unnecessary here, there is nothing wrong with imputing by age. It is, however, unlikely that you want to impute only by age. What you probably want is something like

        Code:
        mi impute regress pr_attend hhsize hh_head_no_educ clust_literacy num_children hh_member_formal_empl dep_ratio hh_orphan i.hv024 , add(20) by(age)
        If the predictors also have missing values, you want chained equations (or a multivariate normal approach) and impute those missing values, too.
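
        A rough sketch of the chained-equations variant, purely for illustration. It assumes, hypothetically, that hhsize and dep_ratio are the predictors with missing values and that both can be treated as continuous; substitute whichever variables are actually incomplete and choose a suitable method for each. The seed in rseed() is arbitrary.
        Code:
        mi set wide
        * register every variable that is to be imputed (hhsize and dep_ratio are assumed here)
        mi register imputed pr_attend hhsize dep_ratio
        * incomplete variables go to the left of "=", complete predictors to the right
        mi impute chained (regress) pr_attend hhsize dep_ratio = hh_head_no_educ clust_literacy num_children hh_member_formal_empl hh_orphan i.hv024, add(20) by(age) rseed(12345)
        mi estimate: regress pr_attend hhsize hh_head_no_educ clust_literacy num_children hh_member_formal_empl dep_ratio hh_orphan i.hv024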


        • #5
          daniel klein thanks a bunch for your insights. They are indeed helpful!!


          • #6
            daniel klein Could you kindly guide me on how to implement chained equations (or a multivariate normal approach)?

            I believe that would resolve the error below
            Code:
            pr_attend: missing imputed values produced
                This may occur when imputation variables are used as independent variables or when independent variables contain missing values. You can
                specify option force if you wish to proceed anyway.
             -- above applies to age = 6
            Thanks in advance!


            • #7
              In general, I think the rule of thumb is that the imputation model should contain at least all variables of the analysis model, and possibly more (auxiliary variables). The error you receive usually occurs when a variable in the imputation model contains extended missing values (.a, .b, and so on), because these are never imputed; only system missing values (.) are. I would check all variables carefully and either replace the extended missing values with system missing or remove the affected cases from the data.
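
              A minimal sketch of such a check, assuming the variable names from the analysis model above, that all of them are numeric, and that it is run on the original data before mi set. misstable summarize reports system missings (Obs=.) and extended missings (Obs>.) in separate columns; the recoding loop is just one way to do the replacement.
              Code:
              * count system and extended missings, here for the age group named in the error message
              misstable summarize pr_attend hhsize hh_head_no_educ clust_literacy num_children hh_member_formal_empl dep_ratio hh_orphan hv024 if age == 6
              * recode extended missings (.a to .z) to system missing (.)
              foreach v of varlist pr_attend hhsize hh_head_no_educ clust_literacy num_children hh_member_formal_empl dep_ratio hh_orphan hv024 {
                  replace `v' = . if `v' > .
              }
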
              Best wishes

              (Stata 16.1 MP)


              • #8
                Felix Bittmann Thanks so much! A closer look reveals that there is an issue with my code. None of the explanatory variables should be missing, since that information is available.
