Hi there,
I'm trying to run a multivariate multiple imputation on a panel dataset that was originally in long format. I'm following the instructions found here: https://www.stata.com/support/faqs/s...and-mi-impute/. I understand I need to reshape to wide format to perform the imputations, and then convert back to long. I can get as far as running the imputations, but I hit errors when trying to return to long format to run a -mixed- model.
These errors are:
stubs may not be of the form _#_name
specify the names without the _#_ prefix
and
variable time0 contains all missing values
I'm not sure what the bug is, or how to fix it. Can anyone suggest a solution?
My code is below. It was difficult to find an example dataset that approximated mine.
Many thanks,
Steve
* ### Step 3: Reshape the dataset from long to wide.
* &&& Sub-step a: Define a global varlist for all time-varying vars in the stacked panel dataset. The nominated 'time' variable will be dropped by reshape.
global xlist studentmasterid nsnid time year_level /*token id_token time0 school school_123 school_123_bird year_level_same sex ethnicity*/ maths_overall asTTle_natl_maths_mean0 asTTle_cluster_maths_mean0 asTTle_natl_maths_mean asTTle_cluster_maths_mean reading_overall asTTle_natl_reading_mean0 asTTle_cluster_reading_mean0 asTTle_natl_reading_mean asTTle_cluster_reading_mean mms_scale mms_natl_mean0 mms_natl_mean rm_scale rm_natl_mean0 rm_natl_mean
* &&& Sub-step b: Give each var in variable list a '_' suffix, to create period tags (_0, _1, _2, etc), for ease of transforming data from long to wide and back.
rename ($xlist) (=_) //, dryrun
* &&& Sub-step c: Define a new global variable list for the renamed vars.
global ylist studentmasterid_ nsnid_ time_ year_level_ /*token id_token time0 school school_123 school_123_bird year_level_same sex ethnicity*/ maths_overall_ asTTle_natl_maths_mean0_ asTTle_cluster_maths_mean0_ asTTle_natl_maths_mean_ asTTle_cluster_maths_mean_ reading_overall_ asTTle_natl_reading_mean0_ asTTle_cluster_reading_mean0_ asTTle_natl_reading_mean_ asTTle_cluster_reading_mean_ mms_scale_ mms_natl_mean0_ mms_natl_mean_ rm_scale_ rm_natl_mean0_ rm_natl_mean_
* &&&& Sub-step d: Set variable abbreviations off, as a precaution, to prevent ambiguous var. name abbreviations from stopping the transformation.
* Then run the reshape wide command, and turn abbreviations back on.
set varabbrev off
reshape wide $ylist, i(token id_token) j(time0)
set varabbrev on
* 1. Set data as mi.
mi set wide
* 2. MI using multivariate normal distribution (MVN)
* a. Imputation phase
* -------------------
* After the data is mi set, Stata requires 3 additional commands. The first is mi register imputed. This command identifies which variables in the imputation model have missing information.
mi register imputed maths_overall_0 maths_overall_1 maths_overall_2 /*mms_scale_0 mms_scale_1 mms_scale_2 Auxiliary var.*/
* The second command is mi impute mvn where the user specifies the imputation model to be used and the number of imputed datasets to be created.
mi xtset, clear // Clear any previous time-series settings that are old or no-longer-valid.
mi impute mvn maths_overall_0 maths_overall_1 maths_overall_2 /*mms_scale Auxiliary var.*/ = sex ethnicity, add(20) rseed (53421) /*savewlf(Worst_L_Fn_Maths)*/
* But to use mi estimate: mixed, we need to reshape our data back to long form. NB: I removed the _#_ prefixes from the imputed maths_overall variable names after I received the first error about the format of stub names.
mi reshape long ///
    studentmasterid_0 nsnid_0 maths_overall_0 asTTle_natl_maths_mean0_0 asTTle_cluster_maths_mean0_0 asTTle_natl_maths_mean_0 asTTle_cluster_maths_mean_0 ///
    reading_overall_0 asTTle_natl_reading_mean0_0 asTTle_cluster_reading_mean0_0 asTTle_natl_reading_mean_0 asTTle_cluster_reading_mean_0 ///
    mms_scale_0 mms_natl_mean0_0 mms_natl_mean_0 rm_scale_0 rm_natl_mean0_0 rm_natl_mean_0 ///
    studentmasterid_1 nsnid_1 time_1 year_level_1 maths_overall_1 asTTle_natl_maths_mean0_1 asTTle_cluster_maths_mean0_1 asTTle_natl_maths_mean_1 asTTle_cluster_maths_mean_1 ///
    reading_overall_1 asTTle_natl_reading_mean0_1 asTTle_cluster_reading_mean0_1 asTTle_natl_reading_mean_1 asTTle_cluster_reading_mean_1 ///
    mms_scale_1 mms_natl_mean0_1 mms_natl_mean_1 rm_scale_1 rm_natl_mean0_1 rm_natl_mean_1 ///
    studentmasterid_2 nsnid_2 time_2 year_level_2 maths_overall_2 asTTle_natl_maths_mean0_2 asTTle_cluster_maths_mean0_2 asTTle_natl_maths_mean_2 asTTle_cluster_maths_mean_2 ///
    reading_overall_2 asTTle_natl_reading_mean0_2 asTTle_cluster_reading_mean0_2 asTTle_natl_reading_mean_2 asTTle_cluster_reading_mean_2 ///
    mms_scale_2 mms_natl_mean0_2 mms_natl_mean_2 rm_scale_2 rm_natl_mean0_2 rm_natl_mean_2 ///
    school school_123 school_123_bird year_level_same sex ethnicity mi_miss ///
    maths_overall_0 maths_overall_0 maths_overall_0 maths_overall_0 maths_overall_0 maths_overall_0 maths_overall_0 maths_overall_0 maths_overall_0 maths_overall_0 ///
    maths_overall_0 maths_overall_0 maths_overall_0 maths_overall_0 maths_overall_0 maths_overall_0 maths_overall_0 maths_overall_0 maths_overall_0 maths_overall_0 ///
    maths_overall_1 maths_overall_1 maths_overall_1 maths_overall_1 maths_overall_1 maths_overall_1 maths_overall_1 maths_overall_1 maths_overall_1 maths_overall_1 ///
    maths_overall_1 maths_overall_1 maths_overall_1 maths_overall_1 maths_overall_1 maths_overall_1 maths_overall_1 maths_overall_1 maths_overall_1 maths_overall_1 ///
    maths_overall_2 maths_overall_2 maths_overall_2 maths_overall_2 maths_overall_2 maths_overall_2 maths_overall_2 maths_overall_2 maths_overall_2 maths_overall_2 ///
    maths_overall_2 maths_overall_2 maths_overall_2 maths_overall_2 maths_overall_2 maths_overall_2 maths_overall_2 maths_overall_2 maths_overall_2 maths_overall_2, ///
    i(token id_token) j(time0)
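For comparison, here is a minimal sketch of the stub-based form of -mi reshape long- on made-up data. It is not my dataset: the variables id, sex, and score_0 to score_2 and the new j() variable time are all hypothetical. It assumes that -mi reshape long-, like plain -reshape long-, expects stub names (the part of the name before the period digit) rather than the full wide names or the _#_-prefixed imputation variables, which is what both error messages seem to point to.

```stata
* Hypothetical toy panel in wide form (illustrative names only).
clear
set seed 12345
set obs 50
generate id  = _n
generate sex = runiform() < 0.5
generate u   = rnormal(0, 5)              // person-level effect
forvalues t = 0/2 {
    generate score_`t' = 50 + 2*`t' + 3*sex + u + rnormal(0, 5)
    replace  score_`t' = . if runiform() < 0.2   // punch holes to impute
}
drop u

* Declare mi data in wide style and impute under MVN.
mi set wide
mi register imputed score_0 score_1 score_2
mi register regular sex
mi impute mvn score_0 score_1 score_2 = sex, add(5) rseed(53421)

* Back to long: pass the stub only, with no period digit and no _#_ prefix.
* mi reshape carries the _1_score_*, _2_score_*, ... variables along itself.
mi reshape long score_, i(id) j(time)

* The long-form variable is named after the stub.
mi estimate: mixed score_ time sex || id:
```

If that reading is right, the equivalent call on my data would list stubs such as maths_overall_ once each, with i(token id_token) and j(time0), instead of every wave-specific name.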