Rubin's Variance (Multiple Imputation)

Vera Schmidt

Join Date: Aug 2023

Posts: 21
#1


06 Jul 2025, 08:25

Hi,

I’ve been having difficulties with imputation because I had to impute across different time horizons. Why? Because I didn’t want to impute variables for time periods in which they weren’t actually collected.

In my case, variables such as maternal education (educ_m), whether the mother lives in an intact family (intact_family_m), and the grandparents’ logged income (learn_gp) and education (educ_gp) are available from 1993 to 2019. However, the Big Five personality traits (bf_*) were only collected between 2005 and 2019.

I tried everything imaginable to incorporate the time component into the general Stata imputation command — unfortunately, always without success.

As a workaround, I split and saved the datasets in between steps. That is, I first imputed educ_m, intact_family_m, learn_gp, and educ_gp over the full time span to increase the number of observations to N=2015. Then, I saved the datasets (m=0 to m=20).

In the next step, I accessed those data and applied imputation to the Big Five variables, but only for the period 2005–2019.

Unfortunately, this approach prevents me from using the mi estimate command in Stata, and I’m now supposed to calculate Rubin’s variance manually.

Can someone help me with that? I would really appreciate any support!

Comment: There might be an additional problem with saving m=0 to m=20: when I use the saved data set, I have no access to the individual imputed data sets, since it is not saved as an mi data set. And to derive Rubin's variance I need access to them.
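
For reference, Rubin's combining rules, which mi estimate applies, can be written as follows for a single coefficient. The notation is an assumption chosen for illustration and does not come from the thread: \hat{Q}_m is the estimate and U_m its squared standard error in the m-th completed data set, and M is the number of imputations (M = 20 here).

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m
  && \text{(pooled point estimate)}\\
\bar{U} &= \frac{1}{M}\sum_{m=1}^{M}U_m
  && \text{(within-imputation variance)}\\
B &= \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m-\bar{Q}\bigr)^2
  && \text{(between-imputation variance)}\\
T &= \bar{U} + \Bigl(1+\frac{1}{M}\Bigr)B
  && \text{(total variance)}
\end{aligned}
```

The pooled standard error is \sqrt{T}. Computing it by hand therefore requires the estimate and its standard error from each of the M completed data sets separately, which is why access to every m matters.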

Best,
Vera

My current Code:

*******************************
*mother sample*****************
*******************************

*imputation 1: educ_m learn_gp educ_gp intact_family_m**************************
use "$MY_OUT\data.dta", clear
misstable sum educ_m learn_gp educ_gp intact_family_m

*declaring the data to be mi data in marginal long style (mlong)
mi set mlong

*registering variables
mi register imputed educ_m intact_family_m learn_gp educ_gp
mi register regular gpa female migrant num_sib north east west age_birth learn_m ///
only_child_m oldest_m num_sib_m age_11 age_12 age_13 age_14

mi impute chained (pmm, knn(5)) educ_m learn_gp educ_gp ///
(logit) intact_family_m = ///
gpa female migrant num_sib north east west ///
age_birth learn_m only_child_m oldest_m num_sib_m ///
age_11 age_12 age_13 age_14, ///
add(20) rseed(1234)
mi xeq 0 1 20: sum educ_m learn_gp educ_gp intact_family_m

*educ_gp does not increase consistently => make use of reordering
mi xeq: bysort pid (syear): replace educ_gp = educ_gp[_n-1] if educ_gp < educ_gp[_n-1] & !missing(educ_gp[_n-1]) & imputed_educ_gp == 1
*checking order => is working
bysort pid (syear): gen test = .
bysort pid (syear): replace test = 1 if educ_gp <= educ_gp[_n+1] | educ_gp == educ_gp[_n+1]
tab test
drop test

mi xeq: save "$MY_OUT\data_imputed_1_gpa", replace

*imputation 2: big five*********************************************************
use "$MY_OUT\data_imputed_1_gpa", clear
rename _mi_id _mi_id_im1
drop gpa_alt test_scores
misstable sum bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m if syear >= 2005

*declaring the data to be mi data in marginal long style (mlong)
mi set mlong

*registering variables
mi register imputed bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m
mi register regular gpa female migrant num_sib north east south west ///
age_birth learn_m only_child_m oldest_m ///
num_sib_m age_11 age_12 age_13 age_14

mi impute chained (pmm, knn(5)) bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m = ///
gpa female migrant num_sib north east west ///
age_birth learn_m only_child_m ///
oldest_m num_sib_m age_11 age_12 age_13 age_14 ///
if syear >= 2005, add(20) rseed(1234)
*descriptive statistics
mi xeq 0 1 20: sum bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m
mi xeq: save "$MY_OUT\data_gpa", replace
Tags: manual coding, mi estimate, missing, multiple imputation, rubins variance

Felix Bittmann

Join Date: Aug 2018
Posts: 727

06 Jul 2025, 09:25

I don't think your approach is ideal, and I don't understand why different time "horizons" are a problem. Once you reshape to wide, you simply select only the relevant years for imputation. Let me give an example:

Code:

clear all
***Creating example data ***
set seed 123
webuse nlswork
xtset, clear
keep idcode year race grade south ln_wage
keep if year > 79
replace ln_wage = . if year < 85
tabstat grade south ln_wage, by(year) stats(mean N)
foreach VAR of varlist race grade south ln_wage {
    replace `VAR' = . if runiform() < 0.05    //Creating missing values to impute
}


*** Imputing data
reshape wide race grade south ln_wage, i(idcode) j(year)


mi set flong
mi register imputed grade* south* ln_wage85 ln_wage87 ln_wage88
mi impute chained (pmm, knn(5)) grade* south* ln_wage85 ln_wage87 ln_wage88 ///
    , add(3) rseed(123) dots
    
    
mi reshape long grade south ln_wage, i(idcode) j(year)
tabstat grade south ln_wage, by(year) stats(mean N)


*** Analysis ***
mi xtset idcode year
mi estimate: xtreg ln_wage grade south, fe

Here, grade and south are observed from 1980 to 1988, while ln_wage is only observed from 1985 onwards. The tabstat command shows clearly which data are available. The key is to convert to wide and select only the relevant years in the imputation command. That is, only ln_wage85, ln_wage87 and ln_wage88 are included; the other years are simply left out. After the imputation, the data are reshaped back to long. Missing data are accounted for, but only in the correct years.
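
With many waves, typing every wide variable name by hand is error-prone. The following is a minimal sketch of how the list of year-specific variables could be built in a loop instead. It assumes the wide data from the example above (after reshape wide, before mi set); the local macro name wagevars is hypothetical.

```stata
* Sketch: collect the wide ln_wage variables for the relevant years only.
* Assumes the wide example data from above; wagevars is a hypothetical name.
local wagevars
forvalues y = 85/88 {
    * keep a year only if the wide variable actually exists (no abbreviations)
    capture confirm variable ln_wage`y', exact
    if !_rc {
        local wagevars `wagevars' ln_wage`y'
    }
}
display "`wagevars'"

* The macro can then replace the hand-typed list, for example:
* mi register imputed grade* south* `wagevars'
```

The same pattern should carry over to longer panels, for example looping over 2005/2016 for the bf_* variables, so that years without data collection never enter the imputation model.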

Best wishes


Vera Schmidt

Join Date: Aug 2023

Posts: 21
#3

07 Jul 2025, 06:40

Dear Felix,

thank you so much! Now I get what you mean by using the wide format... I tried to apply your example to my data. It did not work well; maybe you can have a look and give some feedback?

use "$MY_OUT\data.dta", clear
tabstat gpa educ_m intact_family_m learn_gp educ_gp bf_open_m bf_consc_m bf_extra_m bf_agree_m bf_emostab_m, by(syear) stats(mean N)

Summary statistics: Mean, N
Group variable: syear (Survey Year)

syear: 1993-2016

gpa: 1993-2016 (total: 2015) => no missings
educ_m: 1993-2016 (total: 1990)
intact~m: 1995-2016 (total: 1680)
learn_gp: 1995-2016 (total: 1570)
educ_gp: 1995-2016 (total: 1680)
bf_ope~m: 2008-2016 (total: 546)
bf_con~m: 2005-2016 (total: 909)
bf_ext~m: 2005-2016 (total: 908)
bf_agr~m: 2005-2016 (total: 900)
bf_emo~m: 2005-2016 (total: 912)

reshape wide *all variables*, i(pid) j(syear)

mi set flong
mi register imputed educ_m* intact_family_m2016 intact_family_m2015 intact_family_m2014 intact_family_m2013 intact_family_m2012 intact_family_m2011 intact_family_m2010 intact_family_m2009 intact_family_m2008 intact_family_m2007 intact_family_m2006 intact_family_m2005 intact_family_m2004 intact_family_m2003 intact_family_m2002 intact_family_m2001 intact_family_m2000 intact_family_m1999 intact_family_m1998 intact_family_m1997 intact_family_m1996 intact_family_m1995 learn_gp2016 learn_gp2015 learn_gp2014 learn_gp2013 learn_gp2012 learn_gp2011 learn_gp2010 learn_gp2009 learn_gp2008 learn_gp2007 learn_gp2006 learn_gp2005 learn_gp2004 learn_gp2003 learn_gp2002 learn_gp2001 learn_gp2000 learn_gp1999 learn_gp1998 learn_gp1997 learn_gp1996 learn_gp1995 educ_gp1995 educ_gp1996 educ_gp1997 educ_gp1998 educ_gp1999 educ_gp2000 educ_gp2001 educ_gp2002 educ_gp2003 educ_gp2004 educ_gp2005 educ_gp2006 educ_gp2007 educ_gp2008 educ_gp2009 educ_gp2010 educ_gp2011 educ_gp2012 educ_gp2013 educ_gp2014 educ_gp2015 educ_gp2016 bf_open_m2008 bf_open_m2009 bf_open_m2010 bf_open_m2011 bf_open_m2012 bf_open_m2013 bf_open_m2014 bf_open_m2015 bf_open_m2016 bf_consc_m2005 bf_consc_m2006 bf_consc_m2007 bf_consc_m2008 bf_consc_m2009 bf_consc_m2010 bf_consc_m2011 bf_consc_m2012 bf_consc_m2013 bf_consc_m2014 bf_consc_m2015 bf_consc_m2016 bf_extra_m2016 bf_extra_m2015 bf_extra_m2014 bf_extra_m2013 bf_extra_m2012 bf_extra_m2011 bf_extra_m2010 bf_extra_m2009 bf_extra_m2008 bf_extra_m2007 bf_extra_m2006 bf_extra_m2005 bf_agree_m2005 bf_agree_m2006 bf_agree_m2007 bf_agree_m2008 bf_agree_m2009 bf_agree_m2010 bf_agree_m2011 bf_agree_m2012 bf_agree_m2013 bf_agree_m2014 bf_agree_m2015 bf_agree_m2016 bf_emostab_m2005 bf_emostab_m2006 bf_emostab_m2007 bf_emostab_m2008 bf_emostab_m2009 bf_emostab_m2010 bf_emostab_m2011 bf_emostab_m2012 bf_emostab_m2013 bf_emostab_m2014 bf_emostab_m2015 bf_emostab_m2016

mi impute chained (logit) intact_family_m2016 intact_family_m2015 intact_family_m2014 intact_family_m2013 intact_family_m2012 intact_family_m2011 intact_family_m2010 intact_family_m2009 intact_family_m2008 intact_family_m2007 intact_family_m2006 intact_family_m2005 intact_family_m2004 intact_family_m2003 intact_family_m2002 intact_family_m2001 intact_family_m2000 intact_family_m1999 intact_family_m1998 intact_family_m1997 intact_family_m1996 intact_family_m1995 (pmm, knn(5)) educ_m* learn_gp2016 learn_gp2015 learn_gp2014 learn_gp2013 learn_gp2012 learn_gp2011 learn_gp2010 learn_gp2009 learn_gp2008 learn_gp2007 learn_gp2006 learn_gp2005 learn_gp2004 learn_gp2003 learn_gp2002 learn_gp2001 learn_gp2000 learn_gp1999 learn_gp1998 learn_gp1997 learn_gp1996 learn_gp1995 educ_gp1995 educ_gp1996 educ_gp1997 educ_gp1998 educ_gp1999 educ_gp2000 educ_gp2001 educ_gp2002 educ_gp2003 educ_gp2004 educ_gp2005 educ_gp2006 educ_gp2007 educ_gp2008 educ_gp2009 educ_gp2010 educ_gp2011 educ_gp2012 educ_gp2013 educ_gp2014 educ_gp2015 educ_gp2016 bf_open_m2008 bf_open_m2009 bf_open_m2010 bf_open_m2011 bf_open_m2012 bf_open_m2013 bf_open_m2014 bf_open_m2015 bf_open_m2016 bf_consc_m2005 bf_consc_m2006 bf_consc_m2007 bf_consc_m2008 bf_consc_m2009 bf_consc_m2010 bf_consc_m2011 bf_consc_m2012 bf_consc_m2013 bf_consc_m2014 bf_consc_m2015 bf_consc_m2016 bf_extra_m2016 bf_extra_m2015 bf_extra_m2014 bf_extra_m2013 bf_extra_m2012 bf_extra_m2011 bf_extra_m2010 bf_extra_m2009 bf_extra_m2008 bf_extra_m2007 bf_extra_m2006 bf_extra_m2005 bf_agree_m2005 bf_agree_m2006 bf_agree_m2007 bf_agree_m2008 bf_agree_m2009 bf_agree_m2010 bf_agree_m2011 bf_agree_m2012 bf_agree_m2013 bf_agree_m2014 bf_agree_m2015 bf_agree_m2016 bf_emostab_m2005 bf_emostab_m2006 bf_emostab_m2007 bf_emostab_m2008 bf_emostab_m2009 bf_emostab_m2010 bf_emostab_m2011 bf_emostab_m2012 bf_emostab_m2013 bf_emostab_m2014 bf_emostab_m2015 bf_emostab_m2016, add(3) rseed(123) dots

Performing chained iterations:
imputing m=1 through m=3
mi impute: VCE is not positive definite
The posterior distribution from which mi impute drew the imputations for intact_family_m2005 is not proper when the VCE estimated from the observed data is
not positive definite. This may happen, for example, when the number of parameters exceeds the number of observations. Choose an alternate imputation
model.
error occurred during imputation of intact_family_m2016 intact_family_m2015 intact_family_m2014 intact_family_m2013 intact_family_m2012 intact_family_m2011
intact_family_m2010 intact_family_m2009 intact_family_m2008 intact_family_m2007 intact_family_m2006 intact_family_m2005 intact_family_m2004 intact_family_m2003
intact_family_m2002 intact_family_m2001 intact_family_m2000 intact_family_m1999 intact_family_m1998 intact_family_m1997 intact_family_m1996 intact_family_m1995
educ_m1993 educ_m1994 educ_m1995 educ_m1996 educ_m1997 educ_m1998 educ_m1999 educ_m2000 educ_m2001 educ_m2002 educ_m2003 educ_m2004 educ_m2005 educ_m2006
educ_m2007 educ_m2008 educ_m2009 educ_m2010 educ_m2011 educ_m2012 educ_m2013 educ_m2014 educ_m2015 educ_m2016 learn_gp2016 learn_gp2015 learn_gp2014
learn_gp2013 learn_gp2012 learn_gp2011 learn_gp2010 learn_gp2009 learn_gp2008 learn_gp2007 learn_gp2006 learn_gp2005 learn_gp2004 learn_gp2003 learn_gp2002
learn_gp2001 learn_gp2000 learn_gp1999 learn_gp1998 learn_gp1997 learn_gp1996 learn_gp1995 educ_gp1995 educ_gp1996 educ_gp1997 educ_gp1998 educ_gp1999
educ_gp2000 educ_gp2001 educ_gp2002 educ_gp2003 educ_gp2004 educ_gp2005 educ_gp2006 educ_gp2007 educ_gp2008 educ_gp2009 educ_gp2010 educ_gp2011 educ_gp2012
educ_gp2013 educ_gp2014 educ_gp2015 educ_gp2016 bf_open_m2008 bf_open_m2009 bf_open_m2010 bf_open_m2011 bf_open_m2012 bf_open_m2013 bf_open_m2014 bf_open_m2015
bf_open_m2016 bf_consc_m2005 bf_consc_m2006 bf_consc_m2007 bf_consc_m2008 bf_consc_m2009 bf_consc_m2010 bf_consc_m2011 bf_consc_m2012 bf_consc_m2013
bf_consc_m2014 bf_consc_m2015 bf_consc_m2016 bf_extra_m2016 bf_extra_m2015 bf_extra_m2014 bf_extra_m2013 bf_extra_m2012 bf_extra_m2011 bf_extra_m2010
bf_extra_m2009 bf_extra_m2008 bf_extra_m2007 bf_extra_m2006 bf_extra_m2005 bf_agree_m2005 bf_agree_m2006 bf_agree_m2007 bf_agree_m2008 bf_agree_m2009
bf_agree_m2010 bf_agree_m2011 bf_agree_m2012 bf_agree_m2013 bf_agree_m2014 bf_agree_m2015 bf_agree_m2016 bf_emostab_m2005 bf_emostab_m2006 bf_emostab_m2007
bf_emostab_m2008 bf_emostab_m2009 bf_emostab_m2010 bf_emostab_m2011 bf_emostab_m2012 bf_emostab_m2013 bf_emostab_m2014 bf_emostab_m2015 bf_emostab_m2016 on m =
1
r(498);

I always get this error message and have no idea how to solve it. I have tried a lot.

Besides this, I have some comments/questions:
- I did not use "xtset, clear" since I have a cross-sectional data set and not a panel data set.
- I would also like to impute intact_family_m, educ_gp and learn_gp for the "non-relevant" years (1993-1994), in which they were not collected. Is that problematic, given that no observed data are available for those years?
- Why do we not need to specify "mi register regular" and list these variables after the imputed variables (on the right-hand side)?
- Last but not least, what should I do when the VCE is not positive definite?

I appreciate your help so much! Thank you very much.

Best wishes!
Vera Schmidt

Join Date: Aug 2023

Posts: 21
#4

07 Jul 2025, 07:57

Even after adding "mi register regular" and a right-hand side to the "mi impute" equation, I run into an error:

mi set flong
mi register imputed educ_m* intact_family_m2016 intact_family_m2015 intact_family_m2014 intact_family_m2013 intact_family_m2012 intact_family_m2011 intact_family_m2010 intact_family_m2009 intact_family_m2008 intact_family_m2007 intact_family_m2006 intact_family_m2005 intact_family_m2004 intact_family_m2003 intact_family_m2002 intact_family_m2001 intact_family_m2000 intact_family_m1999 intact_family_m1998 intact_family_m1997 intact_family_m1996 intact_family_m1995 learn_gp2016 learn_gp2015 learn_gp2014 learn_gp2013 learn_gp2012 learn_gp2011 learn_gp2010 learn_gp2009 learn_gp2008 learn_gp2007 learn_gp2006 learn_gp2005 learn_gp2004 learn_gp2003 learn_gp2002 learn_gp2001 learn_gp2000 learn_gp1999 learn_gp1998 learn_gp1997 learn_gp1996 learn_gp1995 educ_gp1995 educ_gp1996 educ_gp1997 educ_gp1998 educ_gp1999 educ_gp2000 educ_gp2001 educ_gp2002 educ_gp2003 educ_gp2004 educ_gp2005 educ_gp2006 educ_gp2007 educ_gp2008 educ_gp2009 educ_gp2010 educ_gp2011 educ_gp2012 educ_gp2013 educ_gp2014 educ_gp2015 educ_gp2016 bf_open_m2008 bf_open_m2009 bf_open_m2010 bf_open_m2011 bf_open_m2012 bf_open_m2013 bf_open_m2014 bf_open_m2015 bf_open_m2016 bf_consc_m2005 bf_consc_m2006 bf_consc_m2007 bf_consc_m2008 bf_consc_m2009 bf_consc_m2010 bf_consc_m2011 bf_consc_m2012 bf_consc_m2013 bf_consc_m2014 bf_consc_m2015 bf_consc_m2016 bf_extra_m2016 bf_extra_m2015 bf_extra_m2014 bf_extra_m2013 bf_extra_m2012 bf_extra_m2011 bf_extra_m2010 bf_extra_m2009 bf_extra_m2008 bf_extra_m2007 bf_extra_m2006 bf_extra_m2005 bf_agree_m2005 bf_agree_m2006 bf_agree_m2007 bf_agree_m2008 bf_agree_m2009 bf_agree_m2010 bf_agree_m2011 bf_agree_m2012 bf_agree_m2013 bf_agree_m2014 bf_agree_m2015 bf_agree_m2016 bf_emostab_m2005 bf_emostab_m2006 bf_emostab_m2007 bf_emostab_m2008 bf_emostab_m2009 bf_emostab_m2010 bf_emostab_m2011 bf_emostab_m2012 bf_emostab_m2013 bf_emostab_m2014 bf_emostab_m2015 bf_emostab_m2016

mi register regular gpa* female* migrant* north* east* west* age_birth* learn_m* only_child_m* oldest_m* num_sib_m* age_11* age_12* age_13* age_14*

mi impute chained (logit) intact_family_m2016 intact_family_m2015 intact_family_m2014 intact_family_m2013 intact_family_m2012 intact_family_m2011 intact_family_m2010 intact_family_m2009 intact_family_m2008 intact_family_m2007 intact_family_m2006 intact_family_m2005 intact_family_m2004 intact_family_m2003 intact_family_m2002 intact_family_m2001 intact_family_m2000 intact_family_m1999 intact_family_m1998 intact_family_m1997 intact_family_m1996 intact_family_m1995 (pmm, knn(5)) educ_m* learn_gp2016 learn_gp2015 learn_gp2014 learn_gp2013 learn_gp2012 learn_gp2011 learn_gp2010 learn_gp2009 learn_gp2008 learn_gp2007 learn_gp2006 learn_gp2005 learn_gp2004 learn_gp2003 learn_gp2002 learn_gp2001 learn_gp2000 learn_gp1999 learn_gp1998 learn_gp1997 learn_gp1996 learn_gp1995 educ_gp1995 educ_gp1996 educ_gp1997 educ_gp1998 educ_gp1999 educ_gp2000 educ_gp2001 educ_gp2002 educ_gp2003 educ_gp2004 educ_gp2005 educ_gp2006 educ_gp2007 educ_gp2008 educ_gp2009 educ_gp2010 educ_gp2011 educ_gp2012 educ_gp2013 educ_gp2014 educ_gp2015 educ_gp2016 bf_open_m2008 bf_open_m2009 bf_open_m2010 bf_open_m2011 bf_open_m2012 bf_open_m2013 bf_open_m2014 bf_open_m2015 bf_open_m2016 bf_consc_m2005 bf_consc_m2006 bf_consc_m2007 bf_consc_m2008 bf_consc_m2009 bf_consc_m2010 bf_consc_m2011 bf_consc_m2012 bf_consc_m2013 bf_consc_m2014 bf_consc_m2015 bf_consc_m2016 bf_extra_m2016 bf_extra_m2015 bf_extra_m2014 bf_extra_m2013 bf_extra_m2012 bf_extra_m2011 bf_extra_m2010 bf_extra_m2009 bf_extra_m2008 bf_extra_m2007 bf_extra_m2006 bf_extra_m2005 bf_agree_m2005 bf_agree_m2006 bf_agree_m2007 bf_agree_m2008 bf_agree_m2009 bf_agree_m2010 bf_agree_m2011 bf_agree_m2012 bf_agree_m2013 bf_agree_m2014 bf_agree_m2015 bf_agree_m2016 bf_emostab_m2005 bf_emostab_m2006 bf_emostab_m2007 bf_emostab_m2008 bf_emostab_m2009 bf_emostab_m2010 bf_emostab_m2011 bf_emostab_m2012 bf_emostab_m2013 bf_emostab_m2014 bf_emostab_m2015 bf_emostab_m2016 = gpa* female* migrant* num_sib* north* east* west* age_birth* learn_m* only_child_m* 
oldest_m* num_sib_m* age_11* age_12* age_13* age_14*, add(3) rseed(123) dots

Performing chained iterations:
imputing m=1 through m=3
no observations
error occurred during imputation of intact_family_m2016 intact_family_m2015 intact_family_m2014 intact_family_m2013 intact_family_m2012 intact_family_m2011
intact_family_m2010 intact_family_m2009 intact_family_m2008 intact_family_m2007 intact_family_m2006 intact_family_m2005 intact_family_m2004 intact_family_m2003
intact_family_m2002 intact_family_m2001 intact_family_m2000 intact_family_m1999 intact_family_m1998 intact_family_m1997 intact_family_m1996 intact_family_m1995
educ_m1993 educ_m1994 educ_m1995 educ_m1996 educ_m1997 educ_m1998 educ_m1999 educ_m2000 educ_m2001 educ_m2002 educ_m2003 educ_m2004 educ_m2005 educ_m2006
educ_m2007 educ_m2008 educ_m2009 educ_m2010 educ_m2011 educ_m2012 educ_m2013 educ_m2014 educ_m2015 educ_m2016 learn_gp2016 learn_gp2015 learn_gp2014
learn_gp2013 learn_gp2012 learn_gp2011 learn_gp2010 learn_gp2009 learn_gp2008 learn_gp2007 learn_gp2006 learn_gp2005 learn_gp2004 learn_gp2003 learn_gp2002
learn_gp2001 learn_gp2000 learn_gp1999 learn_gp1998 learn_gp1997 learn_gp1996 learn_gp1995 educ_gp1995 educ_gp1996 educ_gp1997 educ_gp1998 educ_gp1999
educ_gp2000 educ_gp2001 educ_gp2002 educ_gp2003 educ_gp2004 educ_gp2005 educ_gp2006 educ_gp2007 educ_gp2008 educ_gp2009 educ_gp2010 educ_gp2011 educ_gp2012
educ_gp2013 educ_gp2014 educ_gp2015 educ_gp2016 bf_open_m2008 bf_open_m2009 bf_open_m2010 bf_open_m2011 bf_open_m2012 bf_open_m2013 bf_open_m2014 bf_open_m2015
bf_open_m2016 bf_consc_m2005 bf_consc_m2006 bf_consc_m2007 bf_consc_m2008 bf_consc_m2009 bf_consc_m2010 bf_consc_m2011 bf_consc_m2012 bf_consc_m2013
bf_consc_m2014 bf_consc_m2015 bf_consc_m2016 bf_extra_m2016 bf_extra_m2015 bf_extra_m2014 bf_extra_m2013 bf_extra_m2012 bf_extra_m2011 bf_extra_m2010
bf_extra_m2009 bf_extra_m2008 bf_extra_m2007 bf_extra_m2006 bf_extra_m2005 bf_agree_m2005 bf_agree_m2006 bf_agree_m2007 bf_agree_m2008 bf_agree_m2009
bf_agree_m2010 bf_agree_m2011 bf_agree_m2012 bf_agree_m2013 bf_agree_m2014 bf_agree_m2015 bf_agree_m2016 bf_emostab_m2005 bf_emostab_m2006 bf_emostab_m2007
bf_emostab_m2008 bf_emostab_m2009 bf_emostab_m2010 bf_emostab_m2011 bf_emostab_m2012 bf_emostab_m2013 bf_emostab_m2014 bf_emostab_m2015 bf_emostab_m2016 on m =
1
r(2000);

I am really stuck.
Felix Bittmann

Join Date: Aug 2018

Posts: 727
#5

07 Jul 2025, 12:50

I think your general approach is fine. The problem now is the data. Because of the large number of waves, you have built an extremely large imputation model, and I fear it will not converge. The main reason is that adjacent waves often have identical or very similar values, since many variables are nearly constant over time. I would try to simplify the model. Do you need all waves? What about taking only every second or third year? You could also first fill in some values manually (e.g. last observation carried forward) and then impute with MICE, at least for some variables and waves. Finally, you can use PMM instead of logit for binary variables, which can make things faster. Some variables MUST be time constant by nature, such as gender. These should be identical over all waves and hence need no year suffix.
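
As an illustration of the manual step mentioned above, last observation carried forward could look like the following minimal sketch. It assumes the data are still in long format with the identifiers pid and syear used in the thread; the flag variable locf_educ_m is a hypothetical name, and educ_m is only an example of a variable that might be treated this way.

```stata
* Sketch: last observation carried forward (LOCF) within person, long format.
* Assumes pid/syear as in the thread; locf_educ_m is a hypothetical flag name.

* mark values that were originally missing, so filled-in values stay traceable
generate byte locf_educ_m = missing(educ_m)

* replace works through the observations in order, so runs of consecutive
* missing years are filled from the last observed value
bysort pid (syear): replace educ_m = educ_m[_n-1] if missing(educ_m) & _n > 1
```

Values that are missing before a person's first observed wave stay missing and would still have to be handled by the imputation model.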
Imputing a variable for years in which it was never collected (zero observed cases) is not a good idea. In that case I would interpolate before the MICE imputation.
The xtset command was only relevant in this specific data example as these data were already declared panel.
Regarding regular variables, personally, I just throw every variable under "imputed", even complete ones. This should not cause any issues.
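
Before simplifying the model, it may help to see which wide variables have no observed values or no variation at all, since such variables are plausible candidates for the "no observations" and "VCE is not positive definite" errors. The following is a minimal sketch, assuming the wide data set and the variable stubs shown earlier in the thread, and assuming all of these variables are numeric.

```stata
* Sketch: flag wide variables that are empty or constant (run before mi set).
* Variable stubs are taken from the thread; numeric variables are assumed.
foreach v of varlist intact_family_m* educ_gp* learn_gp* bf_* {
    quietly summarize `v'
    if r(N) == 0 {
        display as error "`v': no observed values"
    }
    else if r(sd) == 0 {
        display as text "`v': constant among observed values (N = " r(N) ")"
    }
}
```

Variables flagged here could be dropped from the imputation model or handled separately.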

Best wishes
