  • Multiple imputation: an issue with the range of variables

    Hi,

    I am performing my first multiple imputation in Stata. So far, I have been able to write general code, and it works. However, I need to adjust my results: after imputation, the Big Five variables (bf_open_m bf_consc_m bf_extra_m bf_neuro_m bf_agree_m), which are defined to range between 1 and 7, exceed the maximum. Any advice on how to handle this issue?

    Code:
    *declaring the data to be mi data in marginal long style (mlong)
    mi set mlong

    *registering variables
    mi register imputed bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m intact_family_m educ_m learn_gp educ_gp
    mi register regular gpa female migrant num_sib north east south west poor_health_m poor_mental_m poor_pcs_p25_m poor_pcs_p10_m poor_mcs_p25_m poor_mcs_p10_m poor_health_dv_m poor_health_dv_p75_m poor_health_dv_p90_m age_birth learn_m only_child_m oldest_m num_sib_m

    *imputation: big five
    mi impute chained (regress) bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m = gpa female migrant num_sib north east west poor_health_m poor_mental_m poor_pcs_p25_m poor_pcs_p10_m poor_mcs_p25_m poor_mcs_p10_m poor_health_dv_m poor_health_dv_p75_m poor_health_dv_p90_m age_birth learn_m only_child_m oldest_m num_sib_m if syear >= 2005, add(20) rseed(1234)
    *descriptive statistics
    mi xeq 0 1 20: sum bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m

    Results from last command:
    m=0 data:
    -> sum bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m

        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
       bf_open_m |        546    4.749084    1.141884          2          7
      bf_consc_m |        909    5.986799    .9183887          3          7
      bf_extra_m |        908    4.204846    .7273964          2          7
      bf_agree_m |        900    3.858889    .6860106          2          6
    bf_emostab_m |        912    4.463816    .8025795          2          7

    m=1 data:
    -> sum bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m

        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
       bf_open_m |      1,176    4.774927    1.152129   1.228258   8.220566
      bf_consc_m |      1,176    6.005426    .9323996          3   8.558269
      bf_extra_m |      1,176    4.204863    .7173529          2          7
      bf_agree_m |      1,176    3.859672    .7029196   1.179049          6
    bf_emostab_m |      1,176    4.486219    .8030745          2          7

    m=20 data:
    -> sum bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m

        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
       bf_open_m |      1,176    4.798201    1.164106   1.221478   8.784744
      bf_consc_m |      1,176    6.014238    .9368743          3   8.917006
      bf_extra_m |      1,176    4.215043    .7343134          2          7
      bf_agree_m |      1,176    3.867506    .7041849          2          6
    bf_emostab_m |      1,176    4.480635     .807035          2          7
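A quick way to quantify the problem is to count, in every imputation, how many values fall outside the 1 to 7 scale. This is a sketch that assumes the mi data from the `mi impute chained (regress)` call above are still in memory with imputations m = 1 to 20; the loop and the display message are illustrative additions, not part of the original code.

```stata
* Hedged sketch: count out-of-range values of each Big Five variable
* in every imputation. Assumes the mi data from the mi impute chained
* (regress) call above are in memory with imputations m = 1 to 20.
foreach v of varlist bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m {
    display as text _n "`v': values outside 1-7 in each imputation"
    * inrange() returns 0 for missing values, so missings are excluded explicitly
    mi xeq 1/20: count if !inrange(`v', 1, 7) & !missing(`v')
}
```

Nonzero counts in m = 1 to 20, combined with zero in the observed m=0 data, show that the out-of-range values come from the linear-regression imputation model rather than from the data.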


    I would be grateful for any advice! Thanks in advance.

    Best,
    Vera

  • #2
    If I understand you correctly, then (1) values outside the range are not a problem from the standpoint of statistical theory (though some of your readers may be bothered by them); and (2) to keep imputed values within the range, you can use pmm (predictive mean matching) as the imputation method if your N is large enough. If you use pmm, be sure your "knn(#)" is at least 5, and higher if your N is in the thousands.
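Applied to the Big Five imputation from the first post, this suggestion could look like the sketch below. It assumes the data have been `mi set` and the variables registered as in that post, and that no imputations exist yet (otherwise `add(20)` adds a further 20). The local macro `preds` is a hypothetical name used only to shorten the command, and `knn(10)` is an illustrative value above the minimum of 5 mentioned here, not a tuned choice.

```stata
* Hedged sketch: the Big Five imputation from the first post with the
* regress method swapped for predictive mean matching (pmm).
* Assumes mi set + mi register as in the first post and no existing imputations.
* "preds" is a hypothetical macro name; knn(10) is illustrative.
local preds gpa female migrant num_sib north east west ///
    poor_health_m poor_mental_m poor_pcs_p25_m poor_pcs_p10_m ///
    poor_mcs_p25_m poor_mcs_p10_m poor_health_dv_m ///
    poor_health_dv_p75_m poor_health_dv_p90_m ///
    age_birth learn_m only_child_m oldest_m num_sib_m

mi impute chained (pmm, knn(10)) bf_open_m bf_consc_m bf_extra_m ///
    bf_agree_m bf_emostab_m = `preds' if syear >= 2005, ///
    add(20) rseed(1234)

* Compare observed (m=0) and imputed (m=1, m=20) ranges
mi xeq 0 1 20: summarize bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m
```

Because PMM fills each missing value with an observed value borrowed from one of the k nearest donors, the imputed values cannot fall outside the range observed in the m=0 data.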



    • #3
      I would recommend avoiding imputed values outside the valid scale range. PMM, as Rich Goldstein suggests, is a very good choice. Another alternative is mi impute truncreg, where you can define the limits of the scale. Personally, I would prefer PMM, though.
      Best wishes

      Stata 18.0 MP | ORCID | Google Scholar
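The truncreg alternative can be written as a method inside the same chained call, with the scale limits passed as `ll()` and `ul()`. This is a sketch under the same assumptions as the first post (mi set, variables registered, no existing imputations); `preds` is again a hypothetical macro name. One caveat to check in the [MI] mi impute truncreg and [R] truncreg manual entries before relying on it: several of these scales have observed values of exactly 7, so it should be confirmed how observations lying at a truncation limit are treated.

```stata
* Hedged sketch: Big Five imputation using truncated regression with the
* scale limits 1 and 7. Assumes mi set + mi register as in the first post
* and no existing imputations. "preds" is a hypothetical macro name.
* Check the manual on how observed values exactly at ll()/ul() are handled.
local preds gpa female migrant num_sib north east west ///
    poor_health_m poor_mental_m poor_pcs_p25_m poor_pcs_p10_m ///
    poor_mcs_p25_m poor_mcs_p10_m poor_health_dv_m ///
    poor_health_dv_p75_m poor_health_dv_p90_m ///
    age_birth learn_m only_child_m oldest_m num_sib_m

mi impute chained (truncreg, ll(1) ul(7)) bf_open_m bf_consc_m bf_extra_m ///
    bf_agree_m bf_emostab_m = `preds' if syear >= 2005, ///
    add(20) rseed(1234)

mi xeq 0 1 20: summarize bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m
```

Unlike PMM, this draws continuous values from a truncated model, so imputations are bounded by the scale limits but are not restricted to values that actually occur in the data.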



      • #4
        Originally posted by Rich Goldstein
        If I understand you correctly, then (1) values outside the range are not a problem from the standpoint of statistical theory (though some of your readers may be bothered by them); and (2) to keep imputed values within the range, you can use pmm (predictive mean matching) as the imputation method if your N is large enough. If you use pmm, be sure your "knn(#)" is at least 5, and higher if your N is in the thousands.
        Thank you so much!



        • #5
          Originally posted by Felix Bittmann
          I would recommend avoiding imputed values outside the valid scale range. PMM, as Rich Goldstein suggests, is a very good choice. Another alternative is mi impute truncreg, where you can define the limits of the scale. Personally, I would prefer PMM, though.
          Thank you so much!



          • #6
            Another question has come up: how can imputations covering different time periods be combined? The Big Five variables were collected from 2005 onwards, while the other variables cover 1993 to 2019.

            Thank you in advance!

            Best,
            Vera

            Code:
            *declaring the data to be mi data in marginal long style (mlong)
            mi set mlong

            *registering variables
            mi register imputed bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m intact_family_m educ_m learn_gp educ_gp
            mi register regular gpa female migrant num_sib north east south west poor_health_m poor_mental_m poor_pcs_p25_m poor_pcs_p10_m poor_mcs_p25_m poor_mcs_p10_m poor_health_dv_m poor_health_dv_p75_m poor_health_dv_p90_m age_birth learn_m only_child_m oldest_m num_sib_m

            mi impute chained (pmm, knn(5)) educ_m learn_gp educ_gp ///
            (logit) intact_family_m = ///
            gpa female migrant num_sib north east west ///
            age_birth learn_m only_child_m oldest_m num_sib_m, ///
            add(20) rseed(1234)
            mi xeq 0 1 20: sum educ_m learn_gp educ_gp intact_family_m

            *imputation: big five
            mi impute chained (pmm, knn(5)) bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m = gpa female migrant num_sib north east west poor_health_m poor_mental_m poor_pcs_p25_m poor_pcs_p10_m poor_mcs_p25_m poor_mcs_p10_m poor_health_dv_m poor_health_dv_p75_m poor_health_dv_p90_m age_birth learn_m only_child_m oldest_m num_sib_m if syear >= 2005, add(20) rseed(1234)
            *descriptive statistics
            mi xeq 0 1 20: sum bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m
            Last edited by Vera Schmidt; 11 Jun 2025, 08:16.
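Before combining anything, it may be worth checking what the two calls above actually produced. `add(20)` creates new imputations rather than filling existing ones, so after both calls the data should have M = 40, and each block of variables has presumably been imputed in a different set of 20 imputations. The sketch below is a hedged diagnostic for that; picking m = 1 (created by the first call) and m = 21 (created by the second) assumes the two calls were run in this order on the same data.

```stata
* Hedged diagnostic, assuming both mi impute chained calls above were run
* in sequence on the same mi data.
* mi query reports the mi style and the number of imputations M.
mi query

* Missingness of a first-call variable (educ_m) and a second-call variable
* (bf_open_m, within its imputation sample) in m = 1 versus m = 21.
mi xeq 1 21: count if missing(educ_m); count if missing(bf_open_m) & syear >= 2005
```

If the counts differ between m = 1 and m = 21 in this way, the `replace` option of `mi impute` (which imputes into existing imputations instead of adding new ones) is one option worth reading up on in the manual; whether it suits this design is an assumption to verify.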
