Multiple imputation: An issue with the range of variables

Vera Schmidt

Join Date: Aug 2023

Posts: 15
#1

10 Jun 2025, 10:24

Hi,

I am performing my first multiple imputation in Stata. So far, I have been able to write general code, and it runs. But I need to adjust my results: the Big Five variables (bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m), which are defined to range between 1 and 7, exceed the maximum after imputation (bf_open_m and bf_consc_m reach values above 8). Any advice on how to handle this issue?

Code:
*declaring the data to be mi data in marginal long style (mlong)
mi set mlong

*registering variables
mi register imputed bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m intact_family_m educ_m learn_gp educ_gp
mi register regular gpa female migrant num_sib north east south west poor_health_m poor_mental_m poor_pcs_p25_m poor_pcs_p10_m poor_mcs_p25_m poor_mcs_p10_m poor_health_dv_m poor_health_dv_p75_m poor_health_dv_p90_m age_birth learn_m only_child_m oldest_m num_sib_m

*imputation: big five
mi impute chained (regress) bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m = ///
gpa female migrant num_sib north east west ///
poor_health_m poor_mental_m poor_pcs_p25_m poor_pcs_p10_m poor_mcs_p25_m poor_mcs_p10_m ///
poor_health_dv_m poor_health_dv_p75_m poor_health_dv_p90_m ///
age_birth learn_m only_child_m oldest_m num_sib_m ///
if syear >= 2005, add(20) rseed(1234)
*descriptive statistics
mi xeq 0 1 20: sum bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m

Results from last command:
m=0 data:
-> sum bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   bf_open_m |        546    4.749084     1.141884         2          7
  bf_consc_m |        909    5.986799     .9183887         3          7
  bf_extra_m |        908    4.204846     .7273964         2          7
  bf_agree_m |        900    3.858889     .6860106         2          6
bf_emostab_m |        912    4.463816     .8025795         2          7

m=1 data:
-> sum bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   bf_open_m |      1,176    4.774927     1.152129  1.228258   8.220566
  bf_consc_m |      1,176    6.005426     .9323996         3   8.558269
  bf_extra_m |      1,176    4.204863     .7173529         2          7
  bf_agree_m |      1,176    3.859672     .7029196  1.179049          6
bf_emostab_m |      1,176    4.486219     .8030745         2          7

m=20 data:
-> sum bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   bf_open_m |      1,176    4.798201     1.164106  1.221478   8.784744
  bf_consc_m |      1,176    6.014238     .9368743         3   8.917006
  bf_extra_m |      1,176    4.215043     .7343134         2          7
  bf_agree_m |      1,176    3.867506     .7041849         2          6
bf_emostab_m |      1,176    4.480635      .807035         2          7
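
The summaries above show only m=1 and m=20. As a rough check of how widespread the problem is, the following sketch counts out-of-range values in every imputation. It assumes the 20 imputations created by the command above and the 1 to 7 scale stated in the question; the loop itself is illustrative and not part of the original code.

```stata
* Sketch: count values outside the 1-7 range in each of the 20 imputations.
* inrange() returns 0 for missing values, so missing() is excluded explicitly.
foreach v of varlist bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m {
    mi xeq 1/20: count if !inrange(`v', 1, 7) & !missing(`v')
}
```

A count of zero in every imputation for a given variable would mean that variable needs no range adjustment.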

I would be very grateful for any advice! Thanks in advance.

Best,
Vera
Tags: imputation, mi estimate, mi xeq, missing values, multiple imputation
Rich Goldstein

Join Date: Mar 2014

Posts: 4462
#2

10 Jun 2025, 11:28

If I understand you correctly, then (1) imputed values outside the range are not a problem from the standpoint of statistical theory (though some of your readers may be bothered); (2) to keep imputed values within range, you can use pmm as the imputation method if your N is large enough. If you use pmm, be sure your "knn(#)" is at least 5, and higher if your N is in the thousands.
Felix Bittmann

Join Date: Aug 2018

Posts: 691
#3

10 Jun 2025, 13:57

I would recommend avoiding imputed values outside the valid scale range. PMM, as Rich Goldstein suggests, is a very good choice. An alternative is mi impute truncreg, where you can define the limits of the scale. Personally, I would prefer PMM though.
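
As a rough illustration of the two options, here is a minimal sketch. The variable names come from the original post; the shortened predictor list in the local macro preds is only for readability and is an assumption, as is the choice of 1 and 7 as truncation limits (taken from the stated scale range). The two commands are alternatives, so only one of them should be run.

```stata
* Sketch only: predictor list shortened for readability (assumption);
* the full list from the original post would be used in practice.
local preds gpa female migrant num_sib age_birth

* Option 1: predictive mean matching. Imputed values are borrowed from
* observed donors, so they cannot leave the observed range.
mi impute chained (pmm, knn(5)) bf_open_m bf_consc_m bf_extra_m ///
    bf_agree_m bf_emostab_m = `preds' if syear >= 2005, add(20) rseed(1234)

* Option 2 (instead of option 1, not in addition): truncated regression
* with the scale limits given explicitly via ll() and ul().
mi impute chained (truncreg, ll(1) ul(7)) bf_open_m bf_consc_m bf_extra_m ///
    bf_agree_m bf_emostab_m = `preds' if syear >= 2005, add(20) rseed(1234)
```

With PMM the imputed values are always values that were actually observed; truncreg can produce any value between the limits.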

Best wishes

Stata 18.0 MP | ORCID | Google Scholar
Vera Schmidt

Join Date: Aug 2023

Posts: 15
#4

11 Jun 2025, 08:05

Originally posted by Rich Goldstein

If I understand you correctly, then (1) imputed values outside the range are not a problem from the standpoint of statistical theory (though some of your readers may be bothered); (2) to keep imputed values within range, you can use pmm as the imputation method if your N is large enough. If you use pmm, be sure your "knn(#)" is at least 5, and higher if your N is in the thousands.

Thank you so much!
Vera Schmidt

Join Date: Aug 2023

Posts: 15
#5

11 Jun 2025, 08:06

Originally posted by Felix Bittmann

I would recommend avoiding imputed values outside the valid scale range. PMM, as Rich Goldstein suggests, is a very good choice. An alternative is mi impute truncreg, where you can define the limits of the scale. Personally, I would prefer PMM though.

Thank you so much!
Vera Schmidt

Join Date: Aug 2023

Posts: 15
#6

11 Jun 2025, 08:12

Another question has come up: how can imputations with different time horizons be combined? The Big Five variables were collected from 2005 onwards; the other variables were collected from 1993 to 2019.

Thank you in advance!

Best,
Vera

Code:
*declaring the data to be mi data in marginal long style (mlong)
mi set mlong

*registering variables
mi register imputed bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m intact_family_m educ_m learn_gp educ_gp
mi register regular gpa female migrant num_sib north east south west poor_health_m poor_mental_m poor_pcs_p25_m poor_pcs_p10_m poor_mcs_p25_m poor_mcs_p10_m poor_health_dv_m poor_health_dv_p75_m poor_health_dv_p90_m age_birth learn_m only_child_m oldest_m num_sib_m

mi impute chained (pmm, knn(5)) educ_m learn_gp educ_gp ///
(logit) intact_family_m = ///
gpa female migrant num_sib north east west ///
age_birth learn_m only_child_m oldest_m num_sib_m, ///
add(20) rseed(1234)
mi xeq 0 1 20: sum educ_m learn_gp educ_gp intact_family_m

*imputation: big five
mi impute chained (pmm, knn(5)) bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m = ///
gpa female migrant num_sib north east west ///
poor_health_m poor_mental_m poor_pcs_p25_m poor_pcs_p10_m poor_mcs_p25_m poor_mcs_p10_m ///
poor_health_dv_m poor_health_dv_p75_m poor_health_dv_p90_m ///
age_birth learn_m only_child_m oldest_m num_sib_m ///
if syear >= 2005, add(20) rseed(1234)
*descriptive statistics
mi xeq 0 1 20: sum bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m

Last edited by Vera Schmidt; 11 Jun 2025, 08:16.