  • Imputation

    Is it okay to impute missing data for independent variables? I have read that usually missing covariates are imputed.
    Also, could you guide me on how to impute a categorical IV with 3 categories (0 = low, 1 = medium, 2 = robust) in Stata?

  • #2
    There is now good evidence that all variables should be imputed, including the outcome. For the ordinal variable you can use an ordinal logistic (ologit) model.



    • #3
      From a mathematical and statistical perspective there is no difference between an independent variable and a covariate. The difference is only in whether we care directly about their effects or whether we include them in our models only to reduce their distortion of the effects of the variables we do care about. So yes, it is as OK to impute an independent variable as it is to impute any covariate.

      With a three-level covariate, if you are using multiple imputation with chained equations, you would probably use a multinomial logistic model to impute this one.

      Added: Crossed with #2. If the three categories, low, medium, and robust, can be considered ordered, then yes, it would be better to use an ordinal logistic model to impute it; these converge much more easily than multinomial logistic models. But if they aren't really ordered (I don't know how "robust" relates to "low" and "medium" without more context), then you can't use ordinal logistic modeling.
      Last edited by Clyde Schechter; 24 Aug 2018, 17:01.



      • #4
        I used an ordinal logistic model to impute it:
        Code:
        mi impute ologit ccscorecatnew1 pos_mgmt bedcode TEACH, add(20) rseed(1234)

        where ccscorecatnew1 is the cultural competency score to be imputed, pos_mgmt is the positive score for management support (the outcome variable in the main model), bedcode is hospital size, and TEACH is teaching status. ccscorecatnew1 has 3 ordered categories: 1 = robust, 2 = medium, and 3 = low. 77 of the 283 values of ccscorecatnew1 were missing.

        The output shows that the 77 incomplete values were imputed. However, when I run tab, the total is 206 instead of 283.
        Why doesn't tab show all 283 observations? The Data Editor also still shows missing values, even though the output window reports that they have been imputed.



        • #5
          Perhaps you could show what you typed afterwards. The imputed values are supposed to be used in the regression analysis, provided the commands are typed accordingly.
          Best regards,

          Marcos
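Assuming the data were mi set in wide style (which would be consistent with a tab total of 206 = 283 - 77), the minimal sketch below shows why the missing values seem to persist: mi impute does not overwrite the original variable. The observed data, missing values included, remain the m = 0 data that a plain tab and the Data Editor display, while wide style stores each imputation m in a new variable named _m_ccscorecatnew1. The simulated values and the seed are hypothetical stand-ins that only borrow the thread's variable names; they are not the poster's data.

```stata
* Hypothetical simulated stand-in data: 283 observations, 77 missing scores
clear
set obs 283
set seed 1234
generate double pos_mgmt = rnormal()
generate byte bedcode = 1 + floor(3*runiform())
generate byte TEACH = runiform() < 0.4
generate double latent = 0.8*pos_mgmt + 0.5*TEACH + rnormal()
* ordered score: 1 = robust, 2 = medium, 3 = low
generate byte ccscorecatnew1 = 3 - (latent > -0.5) - (latent > 0.5)
replace ccscorecatnew1 = . in 1/77
drop latent

mi set wide
mi register imputed ccscorecatnew1
mi impute ologit ccscorecatnew1 pos_mgmt bedcode TEACH, add(20) rseed(1234)

* Plain tab shows the original (m = 0) data: 206 nonmissing values
tab ccscorecatnew1
* Wide style stores imputation 1 in _1_ccscorecatnew1: 283 nonmissing values
tab _1_ccscorecatnew1
* Or run a command on imputation m = 1 directly
mi xeq 1: tab ccscorecatnew1
* Fit the main model on all 20 imputations and combine the results
mi estimate: regress pos_mgmt i.ccscorecatnew1 bedcode TEACH
```

In mlong or flong style the imputations are stored as extra observations indexed by _mi_m rather than as new variables, so a plain tab would count differently; in every style, the mi estimate: prefix is what fits the model on each imputation and pools the results.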



          • #6
            OK, so the imputed values won't show up in the actual dataset in the Data Editor?



            • #7
              In my dataset, the main IV, cultural competency score, has missing data. Other control variables, such as the organization's location and ownership status, also have missing data. The DV has no missing data. When I perform multiple imputation, I impute only my main IV, the cultural competency score. Since it has rank-ordered categories (1, 2, 3), I used an ordinal logistic regression with my main model's DV and some other non-missing control variables as predictors. I was able to impute my main IV, the cultural competency score, successfully. However, when I then ran my main regression model, the sample size shrank again because some other control variables have missing values.

              Given this issue, how can I impute all of the variables that have missing values? I am using the multiple imputation control panel under Statistics, and I followed Stata's multiple imputation video on YouTube. Thank you for your help.



              • #8


                I was able to impute my main IV, cultural competency score successfully. Then, I ran my main regression model. Since some other control variables are missing, the sample size reduced again.
                I believe you should impute all of the variables with missing data that the model needs at once.
                Best regards,

                Marcos



                • #9
                  Originally posted by Soumya Upadhyay (#7):
                  Given the above issue, how can I impute all of the variables that are missing?
                  What Marcos told you is correct. Note also that you can use different imputation models for different variables at the same time. For example, the code below treats beds as continuous and teaching status as binary, and it adds the independent variables x1-x3, which I assume are complete:

                  Code:
                  * complete variables, including the outcome pos_mgmt, go after the equals sign
                  mi impute chained (ologit) ccscorecatnew1 (regress) beds (logit) TEACH = pos_mgmt x1 x2 x3, add(20) rseed(1234)
                  Best practice is for the imputation model to include all the variables in your final model.
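A minimal sketch of imputing every incomplete variable at once with mi impute chained and then fitting the analysis model, on simulated stand-in data. location (binary) and ownership (three unordered categories) are hypothetical names for the location and ownership-status controls mentioned in #7, and the missing-data rates are made up. Each incomplete variable gets its own method (ologit for the ordered score, logit for the binary, mlogit for the unordered one), and the complete variables, including the outcome pos_mgmt, go after the equals sign.

```stata
* Hypothetical simulated data: complete outcome and predictors
clear
set obs 283
set seed 1234
generate double pos_mgmt = rnormal()
generate byte bedcode = 1 + floor(3*runiform())
generate byte TEACH = runiform() < 0.4
generate double latent = 0.8*pos_mgmt + 0.5*TEACH + rnormal()
generate byte ccscorecatnew1 = 3 - (latent > -0.5) - (latent > 0.5)
generate byte location = runiform() < 0.5
generate byte ownership = 1 + floor(3*runiform())
drop latent
* Scatter missing values over three covariates (made-up rates)
replace ccscorecatnew1 = . if runiform() < 0.25
replace location = . if runiform() < 0.10
replace ownership = . if runiform() < 0.10

mi set mlong
mi register imputed ccscorecatnew1 location ownership
mi register regular pos_mgmt bedcode TEACH

* One imputation model per incomplete variable, all in one call
mi impute chained (ologit) ccscorecatnew1 (logit) location ///
    (mlogit) ownership = pos_mgmt bedcode TEACH, add(20) rseed(1234)

* Fit the final model on all 20 imputations; mi estimate pools them
* with Rubin's combination rules
mi estimate: regress pos_mgmt i.ccscorecatnew1 i.location i.ownership ///
    bedcode TEACH
```

Because every variable of the final regression appears in the imputation call, the imputation model contains the analysis model, and no observations are dropped for missing covariates. The mlong style is used here only for compactness; wide style would work the same way.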
                  Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                  When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

