Hello everyone,
I'm working with a dataset of over 100 variables and around 25,000 observations, measured at three time points (baseline, FU1, FU2). It contains several types of missing values, each with its own code:
- "Don't know" is coded as 98.
- "Skipped pattern" is coded as -99999.
- "Refused" is coded as 99.
- "Missing" is coded as -88888.
As I prepare to run an MCAR (missing completely at random) test in Stata, I'm unsure whether to recode all of these values to system missing (".") or to retain some of the existing codes. In particular, how should I handle codes 98 ("Don't know") and 99 ("Refused")?
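For concreteness, here is a rough sketch of the kind of recoding I have in mind, using Stata's extended missing values so that the reason for missingness is not lost. The variable names are hypothetical placeholders, and I'm assuming 98 and 99 are never valid values for the variables listed (which would not hold for something like age):

```stata
* Hypothetical variable names; replace with the actual survey items.
* List only variables where 98/99 are missing codes, not valid values.
local items smoke_w1 smoke_w2 smoke_w3 income_w1

* Map each code to its own extended missing value so the reason is preserved.
mvdecode `items', mv(98 = .a \ 99 = .b \ -99999 = .c \ -88888 = .d)

* Summarize how many system (.) and extended (>.) missing values each variable has.
misstable summarize `items'
```

Since Stata commands generally treat ".a" through ".z" the same way as ".", my understanding is that this should be equivalent to standardizing everything to "." for the purposes of the test, while keeping the distinction available for later.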
My second question concerns the three time points. When I run the test, should I include my main exposure and outcome variables at all three time points but the covariates at baseline only, or should I include the covariates at all three time points as well?
I would appreciate any insights or recommendations on these matters.
Thank you.