  • Missing data imputation problem with syntax

    Hi everyone,

    I am trying to run a missing data multiple imputation procedure. However, I get the following message:

    "mi impute mlogit B5 = DV1_Successor_Party A5 B4 A3 Gender Age_Groups EmployedDummy Household_Income Residence Region Language N_Scale, add(10) rseed(1234)

    mi impute mlogit: perfect predictor(s) detected
    Variables that perfectly predict an outcome were detected when mlogit executed on the observed data. First, specify mi impute's option noisily to identify the problem covariates. Then either remove perfect predictors from the model or specify mi impute mlogit's option augment to perform augmented regression; see The issue of perfect prediction during imputation of categorical data in [MI] mi impute for details.
    r(498);"

    So, I want to take the first route and specify noisily in order to identify the problematic covariates. However, when I insert noisily into the command, it does not work: I have tried it in every position I could think of, and each time Stata returns a variation of the message above. What is the correct syntax in this case?

    Thank you,
    Ion

  • #2
    Welcome to Statalist, Ion.

    I assume you have added the noisily option somewhere after the comma in your mi impute command.

    Is it the case that the error message always immediately follows the mi impute command? I expect that with the "noisily" option specified, the mi impute command will display the results of the mlogit command run on the observed data, and then it will once again display the error message and halt. So getting the same error message is not a surprise; I would be surprised only if mi impute failed to give additional output.

    From the mlogit output you are supposed to be able to identify the problem covariates, and then either fix your data or change your model specification, then rerun the mi impute command with the corrected data or model specification, at which point the error message will no longer occur.
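
    As a sketch of what that looks like with the command from post #1 (written for a do-file, where /// continues a line), the noisily option goes after the comma, alongside add() and rseed(). The second command is the other route named in the error message, the augment option; whether augmented regression suits these data is an assumption that would need checking against [MI] mi impute.

    ```stata
    * Route 1: add noisily as an option so that the mlogit output
    * on the observed data is displayed before the error message
    mi impute mlogit B5 = DV1_Successor_Party A5 B4 A3 Gender Age_Groups ///
        EmployedDummy Household_Income Residence Region Language N_Scale, ///
        add(10) rseed(1234) noisily

    * Route 2 (named in the error message): keep all covariates and
    * request augmented regression instead
    mi impute mlogit B5 = DV1_Successor_Party A5 B4 A3 Gender Age_Groups ///
        EmployedDummy Household_Income Residence Region Language N_Scale, ///
        add(10) rseed(1234) augment
    ```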

    If this doesn't address your question, please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

    Section 12.1 is particularly pertinent

    12.1 What to say about your commands and your problem

    Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!
    It would be useful if you were to copy, from your results window, the mi impute command, with the noisily option, and all its output up to and including the error message, and then paste that into your Statalist post using CODE delimiters as described in the FAQ. For example, the following:

    [code]
    . sysuse auto, clear
    (1978 Automobile Data)

    . describe make price

    storage display value
    variable name type format label variable label
    -----------------------------------------------------------------
    make str18 %-18s Make and Model
    price int %8.0gc Price
    [/code]

    will be presented in the post as the following:
    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . describe make price
    
                  storage   display    value
    variable name   type    format     label      variable label
    -----------------------------------------------------------------
    make            str18   %-18s                 Make and Model
    price           int     %8.0gc                Price


    • #3
      Thank you for your well-thought-out response, William. Indeed, I was wrongly assuming that, much like in the 'quietly regress' syntax, 'noisily' should be placed at the beginning of the command. I followed your advice and it worked well. I ended up removing two problematic covariates and the imputation procedure ran smoothly.

      Still, I encountered another issue. It seems that after running the following syntax:

      'mi estimate, ni (9) dots or: logit Wish1 A3 A5 B4 B5 Gender ib1.Age_Groups ib1.Education EmployedDummy ib1.Household_Income Residence ib1.Region Language [iweight=WEIGHT_FINAL]'

      The output produced by Stata 15 when running 'mi estimate: logit' does not display the pseudo-R2, as it would if the model had been run on the observed data. Are you aware of any way to have the pseudo-R2 in the output? Is it even possible to compute it in Stata using mi estimate?

      [Screenshot of the mi estimate output: Untitled.jpg]



      • #4
        From a more theoretical perspective, it seems that you are using a univariate imputation model for your response/outcome/dependent variable. This is usually not a good idea unless the imputation model has more information (i.e. variables) than the model to be used for analyses.

        Judging from the very little information I gather from the cryptic variable names, you may have missing values on covariates, e.g. Household_Income, too. If this is the case, you want a multivariate imputation approach that imputes values in all of your variables.
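
        A minimal sketch of such a multivariate approach, using chained equations. Everything about the data here is an assumption made for illustration: that the data are already mi set, that B5 and Household_Income are the variables with missing values, that Household_Income is ordinal (hence ologit), and that the remaining variables are complete.

        ```stata
        * Assumption: data are already mi set; only B5 and Household_Income
        * have missing values, and all right-hand-side variables are complete
        mi register imputed B5 Household_Income

        * Each imputed variable gets its own method; the variables after
        * the equals sign are the complete predictors
        mi impute chained (mlogit) B5 (ologit) Household_Income ///
            = DV1_Successor_Party A5 B4 A3 Gender Age_Groups EmployedDummy ///
              Residence Region Language N_Scale, add(10) rseed(1234)
        ```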

        Edit:

        From the output you show, I am even more sure that your imputation approach might not be quite correct. This is because the imputed variable named before does not even appear in the model, which has a binary outcome. Note that you can usually not impute different variables with a series of univariate imputation models.

        Pseudo R-squares are not likely to be appropriate candidates for combination with Rubin rules. You may be interested in this anyway.

        Edit 2:

        Using iweights is usually not a very good idea, either, unless you know what you are doing. These will usually not give you correct variance estimates, and since MI is essentially about getting the correct variances, it seems your approach is a bit inconsistent here, too.

        Edit 3:

        Please do not use screenshots, use code delimiters as William has already explained.

        Best
        Daniel
        Last edited by daniel klein; 04 Jul 2017, 23:32.


        • #5
          A comment on the mi impute syntax: the "noisily" option on mi impute is distinct from the identically-named "noisily" command. Stata tried to hint in that direction by referring to the "noisily option" in the error message; a glance at help mi impute shows the details. The "quietly" and "noisily" commands (or command prefixes as I tend to think of them) are most definitely not command options - although of course their use is optional.
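
          To illustrate the difference, here is a small sketch on the auto dataset shipped with Stata (the choice of rep78 as the variable to impute and of price and mpg as predictors is arbitrary and only for illustration). The prefixes precede a command; the option follows the comma.

          ```stata
          sysuse auto, clear

          * quietly and noisily as command prefixes: they go before the command
          quietly {
              regress price mpg weight
              noisily display "R-squared = " %5.3f e(r2)
          }

          * noisily as an option of mi impute: it goes after the comma
          mi set wide
          mi register imputed rep78
          mi impute regress rep78 price mpg, add(5) rseed(1234) noisily
          ```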


          • #6
            Daniel, thanks, I see your points. The imputation model (post 1) and the estimated model (in post 2) are parts of two distinct models with two different outcomes. Posting them here under the same discussion thread might have caused some confusion regarding what was imputed and how it was used during the estimation stage.

            Hence,
            From the output you show, I am even more sure that your imputation approach might not be quite correct. This is because the imputed variable named before does not even appear in the model, which has a binary outcome. Note that you can usually not impute different variables with a series of univariate imputation models.
            Regarding Edit 1: Indeed, I used a multivariate imputation model at a later stage. On a related note, it seems that imputing missing values for the dependent variable is not generally recommended. In all, I imputed the missing values for all the variables, except for the outcome variables (DV1 in post#1 and Regret1 in post#2). Thanks for the link on point estimates.

            Regarding Edit 2 on using weights when doing mi estimate logit: still, not sure what your suggestion is. Should I refrain from using the weights at this stage of the analysis?

            Re Edit 3: Will do, once I learn more about it.

            @William, got it.


            • #7
              Originally posted by Ion Marandici:
              Regarding Edit 2 on using weights when doing mi estimate logit: still, not sure what your suggestion is. Should I refrain from using the weights at this stage of the analysis?
              I was not questioning the use of weights in general, but the use of iweights. You probably want pweights instead.
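
              As a sketch, and assuming WEIGHT_FINAL really is a sampling (inverse-probability) weight, which only the data provider's documentation can confirm, the model from post #3 with pweights would read as follows (the ni() option is left out here for brevity):

              ```stata
              * Assumption: WEIGHT_FINAL is a sampling weight, so pweight is
              * the matching weight type; logit then reports robust variances
              mi estimate, dots or: logit Wish1 A3 A5 B4 B5 Gender ///
                  ib1.Age_Groups ib1.Education EmployedDummy ///
                  ib1.Household_Income Residence ib1.Region Language ///
                  [pweight=WEIGHT_FINAL]
              ```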

              Best
              Daniel


              • #8
                Sure. It's just that I was running into the 'non-integer problem' and was experimenting much like those posting here: https://www.stata.com/statalist/arch.../msg00417.html
                The examples above are just initial attempts to experiment with SEM and should not be taken too seriously. So far I did not come up with a good SE Model. Thanks again.


                • #9
                  There is no such thing as a "non-integer problem". The only weights that are supposed to be integers are frequency weights, which makes perfect sense. Which weights to use has little to do with experimenting, and I warn against ending up using whichever type of weights happens to be supported by the command. You need to know which weights you want in advance. Start with help weight.

                  Best
                  Daniel


                  • #10
                    I might be wrong, but I suppose that in theory there might exist a 'non-integer problem'. For instance, when one needs to use fweights, but the frequency weights provided in a dataset are non-integer and Stata refuses to execute the command. What would be the solution in that case?


                    • #11
                      In that case the provided weights are not frequency weights in the sense in which they are defined in Stata. You need to find out the provider's definition of the weights, then use the appropriate weight type in Stata.

                      Unfortunately not all data providers (nor all statistical software packages) are aware of the differences between the types of weights and they are not usually very specific about the weights they use.
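
                      A quick way to see what you have been given is to inspect the weight variable before choosing a weight type. This is only a sketch, using the variable name WEIGHT_FINAL from earlier in the thread; it shows whether the values are integers, not what the provider meant them to be.

                      ```stata
                      * Number of observations with a non-integer weight; anything
                      * above zero means these cannot be Stata frequency weights
                      count if WEIGHT_FINAL != floor(WEIGHT_FINAL) & !missing(WEIGHT_FINAL)

                      * Range and distribution of the weights
                      summarize WEIGHT_FINAL, detail

                      * Stata's definitions of fweight, pweight, aweight, and iweight
                      help weight
                      ```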

                      Best
                      Daniel


                      • #12
                        Thanks!
