Simulations for missing data

Laura Myles

Join Date: Jun 2018
Posts: 164

02 Mar 2023, 12:06

Dear Listers,

I am interested in exploring different approaches for dealing with missing data in a cluster RCT. I don't have specific data to use, so I was advised to simulate a dataset.

In the example below I am assuming that missingness depends on the allocation arm (as the control arm may be less appealing) as well as on the outcome (those who develop the disease are less likely to respond). Is the code correctly combining the effects of outcome and treatment in determining missingness?

Building on previous code, I was able to create a dataset, and I want to make sure my approach to creating it is correctly specified in Stata.

Also, following an earlier post, I am using invlogit() and specifying a constant; however, I am not sure whether the constant should be chosen according to the % of missing data I want to create, or how best to identify it (short of trial and error). https://www.stata.com/statalist/arch.../msg00909.html
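
One possible way to avoid trial and error is to solve for the constant numerically. The sketch below is only an illustration under assumptions: it builds a small stand-in dataset (600 observations, no clustering), reuses the odds ratios from the code further down, and uses hypothetical local names (target, lo, hi, mid, b_out, b_trt). It bisects on the constant until the average missingness probability matches the target.

```stata
clear
set seed 12345

* Stand-in data (assumption): 600 observations, outcome risk 3% vs 10% by arm
set obs 600
generate byte trt = mod(_n, 2)
generate byte outcome = rbinomial(1, cond(trt, 0.10, 0.03))

* Log odds ratios for missingness, same values as in the post
local b_out = log(1.1)
local b_trt = log(0.90)

* Desired overall proportion of missing outcomes (hypothetical target)
local target = 0.25

* Bisection: the mean missingness probability increases with the constant
local lo = -10
local hi = 10
forvalues i = 1/50 {
    local mid = (`lo' + `hi') / 2
    quietly generate double p = invlogit(`mid' + `b_out'*outcome + `b_trt'*trt)
    quietly summarize p, meanonly
    if r(mean) < `target' {
        local lo = `mid'
    }
    else {
        local hi = `mid'
    }
    drop p
}
local konst = (`lo' + `hi') / 2
display "Constant for the target missingness: " %6.3f `konst'

* Check: the average missingness probability should now equal the target
generate double misstemp = invlogit(`konst' + `b_out'*outcome + `b_trt'*trt)
summarize misstemp, meanonly
display "Expected proportion missing: " %5.3f r(mean)
```

Changing the target local to another proportion and rerunning should give the corresponding constant. The constant is specific to the simulated dataset, so it would need to be recomputed if the outcome or treatment distributions change.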

I would be grateful to anyone who can cast an eye over the code and let me know whether I am generating the missing data correctly.

Code:

clear 
clear matrix
macro drop _all

set seed 12345

local icc=0.025
local k=50
local n=12
local N = `n'*`k'

local control=0.03 
local experimental=0.10

// Clusters
set obs `k'
generate int cid = _n
generate double cid_u = rnormal(0, sqrt(`icc' * _pi^2 / (3 - 3 *`icc')))

// Treatment groups
generate byte trt = mod(_n, 2)

// Cluster size 
expand `n'

g pid = _n

*** Covariates associated with the outcome - for possible MAR missingness
g sex = runiform()<0.5
g OR_sex = 1.02
g b_sex = log(OR_sex)
g double age = rnormal(35, 5)
g OR_age = 0.97
g b_age = log(OR_age) 
g CPD= rnormal(15, 3)
g OR_cpd = 0.99
g b_cpd = log(OR_cpd)

gen double xbu = logit(!trt * `control' + trt * `experimental') ///
+ cid_u + b_cpd*CPD 

//Create the outcome assuming it is partly driven by CPD score
g byte outcome = rbinomial(1, invlogit(xbu))
tab outcome trt, col

***************************************************
*** START SPECIFYING MISSING DATA ***
***************************************************
* Due to outcome
g OR_out_miss = 1.1
g b_out_miss = log(OR_out_miss)

* Due to treatment
g OR_trt_miss = 0.90
g b_trt_miss = log(OR_trt_miss)

* Intercept on the logit scale; invlogit(-1) is about 0.27
local konst = -1

* This gives roughly 25% missingness - what if I wanted more/less?
capture drop misstemp 
g misstemp = invlogit(`konst' + /// 
b_out_miss*outcome + ///
b_trt_miss*trt)
sum misstemp , d

capture drop b_outcome2
g b_outcome2 = outcome
* Set the outcome to missing with probability misstemp
* (what if the realized % of missingness differs from the mean of misstemp?)
replace b_outcome2 = . if runiform()<misstemp
tab b_outcome2 trt, missing
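
To check whether outcome and treatment are combined as intended, one option is to compare the model-implied missingness probability with the realized proportion in each treatment-by-outcome cell, and then to refit the missingness model. This is a minimal sketch under assumptions: it uses a large stand-in dataset (100,000 observations, no clustering) so that sampling noise is small, and the names p_miss, miss, and cell are hypothetical.

```stata
clear
set seed 12345

* Stand-in data (assumption): large sample, outcome risk 3% vs 10% by arm
set obs 100000
generate byte trt = mod(_n, 2)
generate byte outcome = rbinomial(1, cond(trt, 0.10, 0.03))

* Missingness model on the logit scale: constant + outcome + treatment
local konst = -1
generate double p_miss = invlogit(`konst' + log(1.1)*outcome + log(0.90)*trt)

* Realized missingness indicator drawn from those probabilities
generate byte miss = runiform() < p_miss

* Implied probability and realized proportion in each trt-by-outcome cell
egen cell = group(trt outcome), label
tabstat p_miss miss, by(cell) statistics(mean)

* The fitted odds ratios should be close to 1.1 (outcome) and 0.90 (trt)
logit miss outcome trt, or
```

With only 600 observations the realized proportions will vary more around the implied probabilities, so some discrepancy between the two in the original simulation would not by itself indicate a coding error.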
