Hi,
I’m having difficulties with multiple imputation because my variables cover different time horizons, and I don’t want to impute variables for time periods in which they weren’t actually collected.
In my case, variables such as maternal education (educ_m), whether the mother lives in an intact family (intact_family_m), and the grandparents’ logged income and education are available from 1993 to 2019. However, the Big Five personality traits (bf_*) were only collected between 2005 and 2019.
I tried everything I could think of to build the time restriction into a single Stata mi impute command, always without success.
As a workaround, I split the imputation into two steps and saved the data in between. That is, I first imputed educ_m, intact_family_m, learn_gp, and educ_gp over the full time span to increase the number of observations to N=2015. Then, I saved the datasets (m=0 to m=20).
In the next step, I accessed those data and applied imputation to the Big Five variables, but only for the period 2005–2019.
Unfortunately, this approach prevents me from using the mi estimate command in Stata, and I’m now supposed to calculate Rubin’s variance manually.
Can someone help me with that? I would really appreciate any support!
Comment: There might be an additional problem with how I save m=0 to m=20: when I open the saved file, I have no access to the individual imputed datasets, because it isn’t saved as an mi dataset. And to derive Rubin’s variance I need access to them.
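For reference, Rubin's rules for pooling by hand are usually written as below. This is a general statement of the combining rules, not something taken from the code in this post; the symbols are introduced here for illustration: M is the number of imputations (20 in this case), Q_m is the coefficient estimated in imputed dataset m, and U_m is its squared standard error.

```latex
\begin{align*}
\bar{Q} &= \frac{1}{M}\sum_{m=1}^{M} \hat{Q}_m
  && \text{(pooled point estimate)} \\
\bar{U} &= \frac{1}{M}\sum_{m=1}^{M} U_m
  && \text{(within-imputation variance)} \\
B &= \frac{1}{M-1}\sum_{m=1}^{M} \bigl(\hat{Q}_m - \bar{Q}\bigr)^2
  && \text{(between-imputation variance)} \\
T &= \bar{U} + \Bigl(1 + \frac{1}{M}\Bigr) B
  && \text{(total variance)} \\
\nu &= (M-1)\Bigl(1 + \frac{\bar{U}}{(1 + 1/M)\,B}\Bigr)^{2}
  && \text{(large-sample degrees of freedom)}
\end{align*}
```

With M = 20 the total variance is T = \bar{U} + 1.05 B, and the pooled standard error is the square root of T.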
Best,
Vera
My current code:
*******************************
*mother sample*****************
*******************************
*imputation 1: educ_m learn_gp educ_gp intact_family_m**************************
use "$MY_OUT\data.dta", clear
misstable sum educ_m learn_gp educ_gp intact_family_m
*declaring the data to be mi data in marginal long style (mlong)
mi set mlong
*registering variables
mi register imputed educ_m intact_family_m learn_gp educ_gp
mi register regular gpa female migrant num_sib north east west age_birth learn_m ///
only_child_m oldest_m num_sib_m age_11 age_12 age_13 age_14
mi impute chained (pmm, knn(5)) educ_m learn_gp educ_gp ///
(logit) intact_family_m = ///
gpa female migrant num_sib north east west ///
age_birth learn_m only_child_m oldest_m num_sib_m ///
age_11 age_12 age_13 age_14, ///
add(20) rseed(1234)
mi xeq 0 1 20: sum educ_m learn_gp educ_gp intact_family_m
*educ_gp does not increase consistently => make use of reordering
mi xeq: bysort pid (syear): replace educ_gp = educ_gp[_n-1] if educ_gp < educ_gp[_n-1] & !missing(educ_gp[_n-1]) & imputed_educ_gp == 1
*checking order => is working
bysort pid (syear): gen test = .
bysort pid (syear): replace test = 1 if educ_gp <= educ_gp[_n+1]
tab test
drop test
mi xeq: save "$MY_OUT\data_imputed_1_gpa", replace
*imputation 2: big five*********************************************************
use "$MY_OUT\data_imputed_1_gpa", clear
rename _mi_id _mi_id_im1
drop gpa_alt test_scores
misstable sum bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m if syear >= 2005
*declaring the data to be mi data in marginal long style (mlong)
mi set mlong
*registering variables
mi register imputed bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m
mi register regular gpa female migrant num_sib north east south west ///
age_birth learn_m only_child_m oldest_m ///
num_sib_m age_11 age_12 age_13 age_14
mi impute chained (pmm, knn(5)) bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m = ///
gpa female migrant num_sib north east west ///
age_birth learn_m only_child_m ///
oldest_m num_sib_m age_11 age_12 age_13 age_14 ///
if syear >= 2005, add(20) rseed(1234)
*descriptive statistics
mi xeq 0 1 20: sum bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m
mi xeq: save "$MY_OUT\data_gpa", replace
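One possible way to keep mi estimate available is sketched below. It is untested and rests on several assumptions: that the first imputation step from the code above has just been run and its mi data (M = 20) are still in memory, that the replace option of mi impute fills in values within the existing imputations rather than adding new ones, and that a plain save (without the mi xeq: prefix) writes m=0 to m=20 to one file. The regression at the end is a purely hypothetical analysis model, since the post does not state one.

```stata
* Sketch only, continuing right after imputation 1 with the mi data in memory.

* Save the complete mi dataset once. Without the "mi xeq:" prefix, save is
* run a single time on the whole mi dataset instead of once per imputation
* on the same file name.
save "$MY_OUT\data_imputed_1_gpa", replace

* Stay in the same mi session: no second "mi set", just register the
* Big Five variables as additional imputed variables.
mi register imputed bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m

* Impute the Big Five within the existing 20 imputations (replace, no add),
* restricted to the years in which they were collected.
mi impute chained (pmm, knn(5)) bf_open_m bf_consc_m bf_extra_m ///
    bf_agree_m bf_emostab_m = ///
    gpa female migrant num_sib north east west ///
    age_birth learn_m only_child_m ///
    oldest_m num_sib_m age_11 age_12 age_13 age_14 ///
    if syear >= 2005, replace rseed(1234)

* Save again as one mi dataset.
save "$MY_OUT\data_gpa", replace

* Hypothetical analysis model, for illustration only: mi estimate pools
* the M = 20 results with Rubin's rules.
mi estimate: regress gpa educ_m bf_open_m bf_consc_m if syear >= 2005
```

If this works as assumed, the file data_gpa stays a regular mi dataset, so mi estimate and mi xeq can still reach each imputation and no manual pooling is needed.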