  • mi: multiple imputation; handling missing values

    Hello everyone, I am working with multiple imputations in Stata. I have a continuous variable (Segr_v) where I would like to replace some missing values by estimating them through a regression that uses these predictors (ATECO_2digit NUMERO_COMPLESSIVO SEDE_PROVINCIA_label F_share).

    The command I am using is the following:

    Code:
    mi set wide
    mi register imputed Segr_v
    mi impute regress Segr_v ATECO_2digit NUMERO_COMPLESSIVO SEDE_PROVINCIA_label F_share if TODROP_overall_final != 1 & NACF != 1 & SizeOver50 == 1 & duplicates_drop != 1 & ImpresaFEM != 1 & ImpresaM != 1, add(1) rseed(1234)

    The issue is that the command returns imputed values ranging from -0.76 to 1.18, while the logical range of my variable is [0, 1]. Can the command be adjusted to respect this range, or would you recommend replacing the out-of-range values with the lower and upper bounds? Additionally, could you explain "add(1)" in more detail? Currently it adds one extra variable (because I have set "wide"), but what would be the benefit of specifying a larger value (e.g., add(20))?


    Moreover, I am not familiar with strategies for imputing missing values. Do you have further suggestions or alternative code for reaching my goal?

    Many thanks in advance for your time.
    Wishing you a great week ahead.
    Last edited by Chiara Tasselli; 14 Dec 2023, 05:58.

  • #2
    regress will almost always do this. Theoretically this is not a problem but, like you, many people object; use "pmm" instead of regress. See
    Code:
    h mi impute
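    A minimal sketch of what that switch could look like, reusing the variable names from post #1 and assuming you start from the original data (not yet mi set). The if condition from post #1 is omitted for brevity, and knn(5) is only an illustrative value:
    Code:
    * sketch only: variable names come from post #1; knn(5) is an assumed value
    mi set wide
    mi register imputed Segr_v
    mi impute pmm Segr_v ATECO_2digit NUMERO_COMPLESSIVO SEDE_PROVINCIA_label F_share, add(1) knn(5) rseed(1234)
    * in wide style the first imputation is stored in _1_Segr_v
    summarize Segr_v _1_Segr_v
    Because pmm fills each missing value with an observed value borrowed from a "donor" observation, the imputed values cannot fall outside the observed range of Segr_v.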


    • #3
      Originally posted by Rich Goldstein
      regress will almost always do this. Theoretically this is not a problem but, like you, many people object; use "pmm" instead of regress. See
      Code:
      h mi impute
      Thank you very much. I just tried it, and the results seem much more reasonable.
      Once again, many thanks for your help.


      • #4
        Glad it worked out, but note that there can be problems with pmm. In particular, some observations may be "chosen" too often; to help guard against this, make sure that the number of "nearest neighbors" being drawn from is at least 5 (and 10 as a minimum may well be better). Here I am referring to the number you placed in the "knn(#)" option.
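        As a rough sketch of that advice, assuming the variable names from post #1 and data that have not yet been mi set. The if condition from post #1 is omitted, add(20) is the value asked about in post #1, and the analysis model in the last line is purely hypothetical:
        Code:
        * sketch only: knn(10) follows the suggestion above of drawing from about 10 neighbors
        mi set wide
        mi register imputed Segr_v
        mi impute pmm Segr_v ATECO_2digit NUMERO_COMPLESSIVO SEDE_PROVINCIA_label F_share, add(20) knn(10) rseed(1234)
        * hypothetical analysis model; mi estimate pools the results across the 20 imputations
        mi estimate: regress Segr_v F_share
        On the add() question in post #1: add(#) sets how many imputed datasets are created. With several imputations, mi estimate can combine the results using Rubin's rules, so the standard errors reflect the uncertainty about the imputed values, which a single imputation cannot do.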


        • #5
          Originally posted by Rich Goldstein
          Glad it worked out, but note that there can be problems with pmm. In particular, some observations may be "chosen" too often; to help guard against this, make sure that the number of "nearest neighbors" being drawn from is at least 5 (and 10 as a minimum may well be better). Here I am referring to the number you placed in the "knn(#)" option.
          Once again many thanks for your excellent suggestions.
