
  • estimation sample varies error with mi data

    I keep running into variations of the following error:

    estimation sample varies between m=1 and m=21; click here for details
    no results will be saved
    r(459);


    I have tried the following commands, all of which return the error:

    mi estimate: svy linearized: melogit fully i.income1 || course:

    mi estimate: svy linearized: melogit success i.fully##i.income1 || course:

    mi estimate: svy linearized: melogit success i.income1 || course: if fully==1

    The following model does not return this error:
    mi estimate: svy linearized: melogit success i.income1 || course: if fully==0

    The variables fully and course are regular with no missing values. The variable income1 is imputed. I have this same problem with another imputed independent variable as well, but not with any of the other independent variables. I have read the text in the "click here for details" part of Stata (see below), but since my data are set to wide, I'm not sure how to proceed to diagnose the problem. I don't see how the issue could be the first situation that the error mentions, since fully is a regular variable and is not imputed, so it does not vary across imputations. I have to admit that I don't really understand situations 2 and 3 listed below, and googling them for the last two days hasn't gotten me anywhere. Could someone here help point me towards a way to diagnose the problem?



    Estimation sample varies across imputations

    There is something about the specified model that causes the estimation
    sample to be different between imputations. Here are several situations
    when this can happen:

    1. You are fitting a model on a subsample that changes from one
    imputation to another. For example, you specified the if expression
    containing imputed variables.

    2. Variables used by model-specific estimators contain values varying
    across imputations. This results in different sets of observations
    being used for completed-data analysis.

    3. Variables used in the model (specified directly or used indirectly by
    the estimator) contain missing values in sets of observations that
    vary among imputations. Verify that your mi data are proper and, if
    necessary, use mi update to update them.

    A varying estimation sample can lead to biased or less efficient
    estimates. We recommend that you evaluate the differences in records
    leading to a varying estimation sample before continuing your analysis.
    To identify the sets of observations varying across imputations, you can
    specify the esampvaryok option and save the estimation sample as an extra
    variable in your data (in the flong or flongsep styles only) by using mi
    estimate's esample() option.
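
    As a rough sketch of that last suggestion, the data could be converted from wide to flong and the estimation sample saved for inspection. The names insample and n_in are made up for illustration, the model line simply repeats the second command above, and none of this has been checked against the actual data. The // and /// comments assume the lines are run from a do-file.

    ```stata
    * Sketch only: insample and n_in are hypothetical variable names.
    mi convert flong, clear          // esample() needs flong or flongsep, not wide

    * Fit the model despite the varying sample and mark who was used in each imputation
    mi estimate, esampvaryok esample(insample): ///
        svy linearized: melogit success i.fully##i.income1 || course:

    * Number of observations used in each imputation (m > 0)
    tabulate _mi_m insample if _mi_m > 0

    * Observations used in some imputations but not in all of them
    * (assumes at least one observation is used in every imputation)
    bysort _mi_id: egen n_in = total(insample) if _mi_m > 0
    quietly summarize n_in
    list _mi_id _mi_m fully income1 success if _mi_m > 0 & n_in < r(max), sepby(_mi_id)
    ```

    The listed observations are the ones to compare across imputations, for example to see whether income1 is missing in some copies and not in others.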


  • #2
    Another possibility, which comes up fairly often when using multiple imputation with a logistic regression model, is that in some of the imputed samples you end up with perfect prediction of your outcome by one (or more) of your variables. This leads to observations being dropped when -melogit- is run on those samples, but not on others, and hence to the message you are getting. Typically, in the other imputed samples you will have something close to perfect prediction by a variable, but not enough to cause Stata to react to it.

    Try re-running this with the -noisily- option of -mi estimate- so you can see if this is what's happening. Each affected run of -melogit- will give you a message indicating that it is dropping observations due to perfect prediction and will also describe the particulars of the perfect prediction itself.
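
    Something along these lines should do it, as a sketch using the interaction model from your post. The cross-tabulation is an optional extra, and imputations 1 and 21 are picked only because they are the two named in the error message.

    ```stata
    * noisily is an option of -mi estimate-, so it goes before the colon
    mi estimate, noisily: svy linearized: melogit success i.fully##i.income1 || course:

    * Optional cross-check: look for empty cells in two of the imputations
    mi xeq 1 21: tabulate income1 success if fully == 1
    ```

    With -noisily- the output of every completed-data fit is shown, so expect a long log.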

    If this is the cause, then you may need to look into whether you should change your model in some way to avoid perfect predictions. That usually entails eliminating a variable that has an extremely strong association with the outcome. If the offending variable(s) are indispensable to your model in theory and you cannot change the model and retain face validity, this might be a justification for using the -esampvaryok- option and accepting that limitation.

    Hope this helps.



    • #3
      Thanks for your reply, Clyde. I spent some time following your recommendation and could find no instance of perfect prediction. Then I finally realized that I was confusing imputed and passive variables (and the passive variables had not been redefined after doing further imputation). I fixed the issue and now the models seem to be running fine. Thanks again for your help!
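
      A sketch of what redefining a passive variable after further imputation can look like. The variable income_cont and the cutpoints are invented for illustration and are not taken from the dataset in this thread, and the sketch assumes income1 is registered as passive.

      ```stata
      * Hypothetical: income1 is a passive variable derived from the imputed income_cont.
      mi describe                      // check what is registered as imputed, passive, regular

      * Recompute the passive variable in every imputation
      mi passive: replace income1 = 1 + (income_cont >= 20000) + (income_cont >= 50000) ///
          if !missing(income_cont)

      mi update                        // make the mi data consistent again

      * The number of missing values should now be the same in each imputation
      mi xeq 1 21: count if missing(income1)
      ```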



      • #4
        Thanks for closing the thread!



        • #5
          Hi,

          I came across this post because I am dealing with this same error. I imputed my data using the following command

          set seed 832016
          mi set wide
          mi register imputed infractions6 contact6 pwas1 pwas2 therapy3 therapy6 oyas
          mi impute chained (nbreg) infractions6 (regress) contact6 pwas1 pwas2 (logit) therapy3 therapy6 (ologit) oyas = infractions3 contact3 pwas0 age miles length race exitr, add(15) force augment savetrace("trace", replace)

          and now I am trying to run some descriptives on the imputed data to see how they differ from the observed data, using the following commands:

          mi estimate: mean infractions6 if missing_infractions6==1
          mi estimate: mean contact6 if missing_contact6==1
          mi estimate: mean pwas1 if missing_pwas1==1
          mi estimate: mean pwas2 if missing_pwas2==1
          mi estimate: prop therapy3 if missing_therapy3==1
          mi estimate: prop therapy6 if missing_therapy6==1
          mi estimate: prop oyas if missing_oyas==1

          Prior to imputing the data, I flagged all the missing cases with this command:

          gen missing = missing(conduct_bin, harsh, sex, SEP, hardship, distress)
          
          foreach var of varlist infractions6 contact6 pwas1 pwas2 therapy3 therapy6 oyas {
              gen missing_`var' = missing(`var')
          }

          But every time I try running it I get the "estimation sample varies" error. I am still learning how to use Stata and the order of the commands. I am not sure how to incorporate noisily into the command to diagnose whether perfect prediction is what may be causing this problem. Any help you can provide is greatly appreciated.
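
          For reference, a sketch of where noisily would go, plus a quick count of values still missing after imputation. Whether either one reveals the cause here is an assumption. The range 1/15 follows from add(15) in the imputation command, and the check is included because the force option lets imputation proceed even when some imputed values come out missing.

          ```stata
          * noisily is an option of -mi estimate- and is placed before the colon
          mi estimate, noisily: mean infractions6 if missing_infractions6==1

          * Count values still missing in each imputation; counts that differ
          * across imputations would make the estimation sample vary
          mi xeq 1/15: count if missing(infractions6)
          ```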

