Dear Listers,
I am interested in exploring different approaches to handling missing data in a cluster RCT. I don't have a suitable dataset, so I was advised to simulate one.
In the example below, I assume that missingness depends on both the allocation arm (the control arm may be less appealing) and the outcome (those who develop the disease are less likely to respond). Is the code correctly combining the effects of outcome and treatment in determining missingness?
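Written out, the missingness step is a logistic model for an indicator R_ij (1 = outcome set to missing) for individual i in cluster j, where α is the constant, Y_ij the outcome and T_j the arm. This is a sketch of what the code computes, with the coefficients taken from the post. The two effects add on the log-odds scale, so they multiply on the odds scale. Because the probability depends on Y itself, the resulting mechanism is missing not at random (MNAR) rather than MAR.

```latex
\operatorname{logit}\Pr(R_{ij}=1 \mid Y_{ij}, T_j) = \alpha + \beta_Y Y_{ij} + \beta_T T_j,
\qquad \beta_Y = \log 1.1 \approx 0.095,\quad \beta_T = \log 0.9 \approx -0.105

% Worked example with \alpha = -1, probability of being set to missing:
% (Y,T) = (0,0): invlogit(-1.000) \approx 0.269
% (Y,T) = (1,0): invlogit(-0.905) \approx 0.288
% (Y,T) = (0,1): invlogit(-1.105) \approx 0.249
% (Y,T) = (1,1): invlogit(-1.010) \approx 0.267
```

So α on its own fixes the missing probability in the reference group (outcome = 0, control arm), and the sign with which the constant enters matters: invlogit(-1) ≈ 0.27, whereas invlogit(+1) ≈ 0.73.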
Building on earlier code, I was able to create a dataset, and I would like to check that my approach to generating it is correctly specified in Stata.
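The standard deviation used for the cluster effect `cid_u` follows the usual latent-variable definition of the ICC for a random-intercept logistic model, in which the individual-level residual has the standard logistic variance π²/3. Solving that definition for the between-cluster variance gives the expression inside `rnormal()`:

```latex
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}
\quad\Longrightarrow\quad
\sigma_u^2 = \frac{\rho\,\pi^2}{3(1-\rho)} = \frac{\rho\,\pi^2}{3 - 3\rho}

% With \rho = 0.025: \sigma_u^2 \approx 0.0844,\ \sigma_u \approx 0.290
```

Note that this ρ is defined on the latent log-odds scale; it is not the same as the ICC of the binary outcome itself.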
Also, following an earlier post, I am using invlogit() with a constant. However, I am not sure whether the constant has to be chosen according to the percentage of missing data I want to create and, if so, how best to find it short of trial and error: https://www.stata.com/statalist/arch.../msg00909.html
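One way to avoid trial and error is to pick the constant numerically so that the expected overall share missing, i.e. the sample mean of invlogit(α + xb), equals a target. That mean increases monotonically in α, so a simple bisection converges; logit(target) is a natural starting point and is exact only when the other terms are zero. This is a sketch under stated assumptions: the stand-in data (600 individuals, alternating arms, event rates 0.03/0.10, no cluster effect), the target of 0.30 and the names `pmiss` and `outcome_obs` are hypothetical, chosen only so the block runs on its own.

```stata
* Sketch: find the intercept giving a target expected share of missing outcomes.
* Stand-in data (assumption) so the block runs on its own.
clear
set seed 12345
set obs 600
generate byte trt = mod(_n, 2)
generate byte outcome = rbinomial(1, cond(trt, 0.10, 0.03))

local target = 0.30          // hypothetical target proportion missing
local b_out  = log(1.1)      // missingness OR for outcome (as in the post)
local b_trt  = log(0.9)      // missingness OR for treatment (as in the post)

* Bisection on the intercept: mean of invlogit(a + xb) increases with a
local lo = logit(`target') - 5
local hi = logit(`target') + 5
generate double pmiss = .
forvalues iter = 1/50 {
    local mid = (`lo' + `hi') / 2
    quietly replace pmiss = invlogit(`mid' + `b_out'*outcome + `b_trt'*trt)
    quietly summarize pmiss, meanonly
    if r(mean) < `target' local lo = `mid'
    else local hi = `mid'
}
local konst = (`lo' + `hi') / 2
quietly replace pmiss = invlogit(`konst' + `b_out'*outcome + `b_trt'*trt)
quietly summarize pmiss, meanonly
display "intercept = " %8.4f `konst' "   expected share missing = " %6.4f r(mean)

* Delete values with this intercept, as in the post
generate byte outcome_obs = outcome
replace outcome_obs = . if runiform() < pmiss
```

The intercept found this way depends on the outcome and arm distribution of the particular simulated dataset, so in a simulation study it would be recomputed (or fixed from a large pilot sample) rather than hard-coded.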
I would be grateful if anyone could cast an eye over the code and let me know whether I am generating the missing data correctly.
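A quick check that outcome and treatment combine as intended is to generate missingness on a large sample and refit the missingness model: the estimated odds ratios should come back close to 1.1 and 0.9. This sketch is a check, not part of the trial simulation, and its settings are assumptions: 200,000 unclustered individuals, an intercept of -1, and an outcome prevalence of 50%, chosen only so that the outcome odds ratio is estimated precisely.

```stata
* Sketch: recover the missingness odds ratios from a large simulated sample.
* Assumptions: N = 200,000, no clustering, intercept -1, outcome prevalence 0.5.
clear
set seed 12345
set obs 200000
generate byte trt = mod(_n, 2)
generate byte outcome = rbinomial(1, 0.5)
generate double misstemp = invlogit(-1 + log(1.1)*outcome + log(0.9)*trt)
generate byte miss = runiform() < misstemp

* Odds ratios should be close to 1.1 (outcome) and 0.9 (trt)
logit miss outcome trt, or
```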
Code:
clear
clear matrix
macro drop _all
set seed 12345

local icc = 0.025           // intra-cluster correlation (latent logistic scale)
local k = 50                // number of clusters
local n = 12                // individuals per cluster
local N = `n' * `k'
local control = 0.03        // event probability, control arm
local experimental = 0.10   // event probability, experimental arm

// Clusters
set obs `k'
generate int cid = _n
generate double cid_u = rnormal(0, sqrt(`icc' * _pi^2 / (3 - 3*`icc')))

// Treatment groups (alternate clusters)
generate byte trt = mod(_n, 2)

// Cluster size
expand `n'
generate int pid = _n

*** Covariates associated with the outcome - for possible MAR missingness
generate byte sex = runiform() < 0.5
local b_sex = log(1.02)
generate double age = rnormal(35, 5)
local b_age = log(0.97)
generate double CPD = rnormal(15, 3)
local b_cpd = log(0.99)

// Outcome partly driven by CPD score (sex and age are generated but do not
// enter xbu). CPD is centred at its mean (15) so that `control' and
// `experimental' are the event probabilities at CPD = 15 and cid_u = 0.
generate double xbu = logit(!trt*`control' + trt*`experimental') ///
    + cid_u + `b_cpd'*(CPD - 15)
generate byte outcome = rbinomial(1, invlogit(xbu))
tabulate outcome trt, column

***************************************************
***        START SPECIFYING MISSING DATA        ***
***************************************************
* Due to outcome (depends on the outcome value itself, so MNAR)
local b_out_miss = log(1.1)
* Due to treatment
local b_trt_miss = log(0.90)

* Intercept on the log-odds scale: invlogit(-1) = 0.27, i.e. roughly 25%
* missing in the reference group (outcome = 0, control arm).
* What if I wanted more/less?
local konst = -1
capture drop misstemp
generate double misstemp = invlogit(`konst' + ///
    `b_out_miss'*outcome + ///
    `b_trt_miss'*trt)
summarize misstemp, detail

capture drop b_outcome2
generate byte b_outcome2 = outcome
replace b_outcome2 = . if runiform() < misstemp
/* What if this did not capture the necessary % of missingness as done by misstemp? */
tabulate b_outcome2 trt, missing column   // -missing- shows the deleted values
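On the question in the final comment: the mean of `misstemp` is the *expected* share of deleted outcomes, and the realized share from `runiform() < misstemp` varies around it with a binomial standard error of about sqrt(p(1-p)/N), roughly 0.018 for p ≈ 0.27 and N = 600. Part (1) of the sketch below makes that comparison explicit. If an exact number of missing values is required instead, part (2) shows one alternative deletion rule (a swapped-in technique, not the one in the post): add standard logistic noise, logit(runiform()), to the linear predictor and delete the m largest values. This is approximately the same logistic model with the intercept chosen from the data, so only the coefficients on outcome and trt matter there. The stand-in data, the 25% target and the names `xbmiss`, `latent` and `b_outcome3` are assumptions for illustration.

```stata
* Sketch: (1) realized vs expected share of missing outcomes;
* (2) alternative rule that deletes an exact number of values.
* Stand-in data (assumption) so the block runs on its own.
clear
set seed 2024
set obs 600
generate byte trt = mod(_n, 2)
generate byte outcome = rbinomial(1, cond(trt, 0.10, 0.03))
generate double xbmiss = -1 + log(1.1)*outcome + log(0.9)*trt
generate double misstemp = invlogit(xbmiss)

* (1) Bernoulli deletion, as in the post
generate byte b_outcome2 = outcome
replace b_outcome2 = . if runiform() < misstemp
generate byte miss = missing(b_outcome2)
quietly summarize misstemp, meanonly
local expected = r(mean)
quietly summarize miss, meanonly
local realized = r(mean)
local se = sqrt(`expected' * (1 - `expected') / _N)
display "expected " %5.3f `expected' "  realized " %5.3f `realized' "  s.e. " %5.3f `se'
tabulate b_outcome2 trt, missing column    // -missing- shows deleted values

* (2) Exact count: logistic noise added to the linear predictor,
* then the m observations with the largest latent values are deleted
local m = round(0.25 * _N)                 // hypothetical target: 25%
generate double latent = xbmiss + logit(runiform())
gsort -latent
generate byte b_outcome3 = cond(_n <= `m', ., outcome)
count if missing(b_outcome3)
```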
