Dear Statalist Users,
I am currently analyzing missing data in my dataset through several approaches (including an mcartest that did not converge). As a result, I am now comparing the values of each variable (columns) between the observed and missing groups of every other variable (rows, each recorded as a dichotomous indicator: 0 = observed, 1 = missing) using the asymptotic Mann-Whitney U test (i.e., the ranksum command), followed by the Benjamini-Hochberg correction, a process described in this paper for example (please see the relevant supplement here for what I really aim to do).
However, as I have around 40 variables with missing data, I was wondering whether there is a more efficient way to conduct these analyses.

Currently, I plan to run the code below for each pair of variables and then correct the p-values manually in Excel (i.e., ranking the p-values in each column and applying the correction formula). This seems very inefficient with so many variables and p-values to go through, and as I am still new to Stata, any pointers toward a more automated approach would be much appreciated.
Here is an example table of what I am ultimately hoping to create (shown below the code).
Thank you all!
Code:
ranksum cesd1, by(cesd1_miss)
ranksum cesd1, by(cesd2_miss)
ranksum cesd1, by(cesd3_miss)
ranksum cesd1, by(sleep1_miss)
// etc...
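For reference, here is the kind of loop I imagine might automate the ranksum step, storing every pairwise p-value in one matrix. This is only my own rough sketch (the variable list is illustrative, and I am not sure it is idiomatic):

```stata
* Illustrative variable list -- replace with the actual ~40 variables.
* Assumes each variable x has a companion indicator x_miss (0/1), as above.
local vars cesd1 cesd2 cesd3 sleep1 sleep2 dm1 dm2
local k : word count `vars'

* k x k matrix of two-sided asymptotic p-values:
* rows = missingness indicators, columns = tested variables
matrix P = J(`k', `k', .)
matrix rownames P = `vars'
matrix colnames P = `vars'

local i = 0
foreach m of local vars {
    local ++i
    local j = 0
    foreach v of local vars {
        local ++j
        * skip testing a variable against its own missingness indicator
        if "`v'" == "`m'" continue
        capture ranksum `v', by(`m'_miss)
        * ranksum leaves the z statistic in r(z); convert to a p-value
        if _rc == 0 matrix P[`i', `j'] = 2 * normal(-abs(r(z)))
    }
}
matrix list P, format(%6.4f)
```

The matrix could then be exported (e.g., with putexcel or svmat) rather than retyping the p-values by hand.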
            | cesd1 | cesd2 | cesd3 | sleep1 | sleep2 | dm1 | dm2 | etc.
cesd1_miss  | CP    | CP    | CP    | CP     | CP     | CP  | CP  | ...
cesd2_miss  | CP    | CP    | CP    | CP     | CP     | CP  | CP  | ...
cesd3_miss  | CP    | CP    | CP    | CP     | CP     | CP  | CP  | ...
sleep1_miss | CP    | CP    | CP    | CP     | CP     | CP  | CP  | ...
sleep2_miss | CP    | CP    | CP    | CP     | CP     | CP  | CP  | ...
dm1_miss    | CP    | CP    | CP    | CP     | CP     | CP  | CP  | ...
dm2_miss    | CP    | CP    | CP    | CP     | CP     | CP  | CP  | ...
etc.        | CP    | CP    | CP    | CP     | CP     | CP  | CP  | ...

(CP = corrected p-value)
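The Excel step (rank the p-values in a column, apply the correction formula) could presumably also stay inside Stata. A rough sketch of the Benjamini-Hochberg step-up adjustment, assuming one column of raw p-values has been loaded into a variable p in a separate working dataset (again my own attempt, not verified):

```stata
* Benjamini-Hochberg adjusted p-values for variable p (smallest first)
sort p
generate rank = _n
generate q = p * _N / rank
* enforce monotonicity: working from the largest p-value downward,
* each adjusted value may not exceed the one above it
gsort -rank
replace q = min(q, q[_n-1]) if _n > 1
replace q = min(q, 1)
```

Alternatively, I believe the community-contributed qqvalue command (ssc install qqvalue) implements the Benjamini-Hochberg method among others, though I have not used it myself.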