Dear all,
I'm trying to translate the following procedure into Stata syntax in order to impute MNAR data:
"Use Multiple Imputation (MI) to replace missing values. MI can be conducted in SPSS as a dedicated function and has 3 steps. First, missing data are replaced using an EM algorithm augmented by a Bayesian procedure (conditional posterior distribution – you don’t need to know what this means, even I struggle with it!), which yields multiple imputed data sets. The second step involves analyzing each of the yielded data sets separately with standard statistics (e.g., linear regression). The third step involves aggregating results from each separate data set and calculating standard errors for significance testing on the basis of both within- and between-data set variance. Researchers argue that a small number of MI data sets (m = 10) will be adequate for most situations. The main advantage of MI is that by yielding multiple data sets, researchers can calculate the ‘true’ uncertainty (accounting for both within- and between-imputation variance) associated with analyses using missing data, and therefore it overcomes the problem of underestimated SEs using single data sets produced by EM. Another advantage of MI is that it performs well under MCAR, MAR, and MNAR, and is robust to large amounts of missing data (i.e., > 10%). An obvious drawback, however, is that MI makes for a cumbersome analysis with more than one data set to consider. It also provides different estimates with every execution, meaning the results are not deterministic. Nonetheless, this technique is well suited to analyses when there is a substantial proportion of missing data due to some systematic reason(s)."
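The third step in the quoted passage (pooling results using within- and between-imputation variance) is usually done with Rubin's combination rules. Below is a minimal Python sketch of that pooling for a single coefficient; it is an illustration only, not the SPSS or Stata implementation, and the function name `pool_rubin` and the example numbers are hypothetical. In Stata, this combination step is what `mi estimate` carries out after the data have been imputed with `mi impute`.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    estimates  -- the coefficient's point estimate from each imputed data set
    std_errors -- the coefficient's standard error from each imputed data set
    Returns (pooled_estimate, pooled_se, within_var, between_var).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations and one SE per estimate")

    q_bar = mean(estimates)                      # pooled point estimate (average over imputations)
    w = mean(se ** 2 for se in std_errors)       # within-imputation variance (average squared SE)
    b = variance(estimates)                      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b                      # total variance used for significance testing
    return q_bar, math.sqrt(t), w, b


# Hypothetical results for one regression coefficient from m = 3 imputed data sets
est, se, w, b = pool_rubin([0.50, 0.54, 0.46], [0.10, 0.10, 0.10])
print(f"pooled estimate = {est:.3f}, pooled SE = {se:.4f}")
```

Because the between-imputation term is added to the average within-imputation variance, the pooled SE comes out larger than the SE from any single imputed data set, which is the "true uncertainty" point made in the passage.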
Any suggestions about which commands to apply?
Thank you very much,
Diana