Simulations for missing data

    Dear Listers,

    I am interested in exploring different approaches for dealing with missing data in a cluster RCT. I don't have a specific dataset to use, so I was advised to simulate one.

    In the example below, I assume that missingness depends on both the allocation arm (the control arm may be less appealing) and the outcome (those who develop the disease are less likely to respond). Does the code correctly combine the effects of outcome and treatment on missingness?
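
Written out, the mechanism the code is meant to implement is a logistic model for a missingness indicator. In this sketch, M_ij = 1 if the outcome of participant i in cluster j is set to missing, Y_ij is that outcome, T_j is the cluster's arm (1 = experimental), and β0 is the constant inside invlogit(); these symbols are introduced here only for illustration. The two effects add on the logit scale, i.e. they multiply on the odds scale, and there is no Y × T interaction term. Because missingness depends on the outcome itself, this mechanism is missing not at random (MNAR) rather than MAR.

```latex
\begin{align*}
\operatorname{logit}\Pr(M_{ij}=1 \mid Y_{ij}, T_j) &= \beta_0 + \beta_Y Y_{ij} + \beta_T T_j,
  \qquad \beta_Y = \log(1.1),\quad \beta_T = \log(0.90) \\
\frac{\Pr(M_{ij}=1 \mid Y_{ij}, T_j)}{\Pr(M_{ij}=0 \mid Y_{ij}, T_j)} &= e^{\beta_0}\,(1.1)^{Y_{ij}}\,(0.90)^{T_j}
\end{align*}
```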

    Building on previous code, I was able to create a dataset, and I would like to make sure that my approach to creating it is correctly specified in Stata.

    Also, following an earlier post, I am using invlogit() with a constant; however, I am not sure whether the constant needs to be chosen based on the percentage of missing data I want to create, and, if so, how best to identify it (short of trial and error). https://www.stata.com/statalist/arch.../msg00909.html
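
A minimal sketch of one way to avoid trial and error, assuming the goal is a target *expected* proportion missing: because the mean of invlogit(konst + ...) increases monotonically with konst, konst can be found by bisection on the simulated data. The 40% target, the search interval [-10, 10], the 50 iterations and the variable name p_miss are arbitrary choices for illustration, and the small stand-in dataset at the top only replaces the trial data from the code below.

```stata
* Sketch: choose konst so that the expected proportion missing hits a target.
* Stand-in data (assumed): 600 participants, arm-specific outcome risks 3%/10%.
clear
set seed 12345
set obs 600
generate byte trt = mod(_n, 2)
generate byte outcome = rbinomial(1, cond(trt, 0.10, 0.03))

* Missingness log-odds ratios, as in the code below
local b_out = log(1.1)
local b_trt = log(0.90)

local target = 0.40    // desired expected proportion missing (illustrative)
local lo = -10         // bracket assumed to contain the solution
local hi = 10

generate double p_miss = .
forvalues iter = 1/50 {
    local mid = (`lo' + `hi') / 2
    quietly replace p_miss = invlogit(`mid' + `b_out'*outcome + `b_trt'*trt)
    quietly summarize p_miss, meanonly
    * mean(p_miss) rises with the constant: if too low, move the lower bound up
    if r(mean) < `target' local lo = `mid'
    else local hi = `mid'
}
local konst = (`lo' + `hi') / 2

* Confirm the expected proportion missing at the chosen constant
quietly replace p_miss = invlogit(`konst' + `b_out'*outcome + `b_trt'*trt)
quietly summarize p_miss, meanonly
display "konst = " %9.4f `konst' "   expected proportion missing = " %6.4f r(mean)
```

With effects as small as these, the solution lands not far from logit(target), which makes a reasonable starting guess. Since konst is calibrated on one simulated draw, a simulation study with many replications might instead calibrate once on a large simulated dataset and reuse that value.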

    I would be grateful if anyone could cast an eye over the code and let me know whether I am generating the missing data correctly.

    Code:
    clear 
    clear matrix
    macro drop _all
    
    set seed 12345
    
    local icc=0.025
    local k=50
    local n=12
    local N = `n'*`k'
    
    local control=0.03 
    local experimental=0.10
    
    // Clusters
    set obs `k'
    generate int cid = _n
    generate double cid_u = rnormal(0, sqrt(`icc' * _pi^2 / (3 - 3 *`icc')))
    
    // Treatment groups
    generate byte trt = mod(_n, 2)
    
    // Cluster size 
    expand `n'
    
    g pid = _n
    
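    *** Covariates associated with the outcome - for possible MAR missingness
    *** (only CPD currently enters xbu; add e.g. b_sex*sex and b_age*(age - 35) there if sex and age should also predict the outcome)
    g sex = runiform() < 0.5
    g OR_sex = 1.02
    g b_sex = log(OR_sex)
    g age = rnormal(35, 5)
    g OR_age = 0.97
    g b_age = log(OR_age) 
    g CPD = rnormal(15, 3)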
    g OR_cpd = 0.99
    g b_cpd = log(OR_cpd)
    
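    * Linear predictor: arm-specific baseline risk + cluster effect + CPD effect.
    * CPD is centered at its mean (15) so that the arm risks stay close to 3% and 10%.
    gen double xbu = logit(!trt * `control' + trt * `experimental') ///
        + cid_u + b_cpd*(CPD - 15)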
    
    //Create the outcome assuming it is partly driven by CPD score
    g byte outcome = rbinomial(1, invlogit(xbu))
    tab outcome trt, col
    
    ***************************************************
    *** START SPECIFYING MISSING DATA ***
    ***************************************************
    * Due to outcome
    g OR_out_miss = 1.1
    g b_out_miss = log(OR_out_miss)
    
    * Due to treatment
    g OR_trt_miss = 0.90
    g b_trt_miss = log(OR_trt_miss)
    
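    local konst = -1
    
    * konst is the log-odds of missingness for the control arm with outcome = 0:
    * invlogit(-1) is about 0.27 (27% missing). What if I wanted more/less?
    * konst is added directly; a leading minus sign (-konst) with konst = -1
    * would give invlogit(1), about 0.73, instead.
    capture drop misstemp
    g misstemp = invlogit(`konst' + ///
        b_out_miss*outcome + ///
        b_trt_miss*trt)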
    sum misstemp , d
    
    capture drop b_outcome2
    g b_outcome2 = outcome
    replace b_outcome2 = . if runiform() < misstemp /* what if the realized % missing differs from the mean of misstemp? */
    tab b_outcome2 trt, missing
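
A minimal check of the mechanism, assuming a large stand-in sample (100,000 participants, clustering ignored) is acceptable for this purpose: the realized proportion missing should be close to the mean of misstemp, and a logistic regression of the missingness indicator on outcome and trt should recover odds ratios near 1.1 and 0.90. The variable name miss and the sample size are illustrative. In a trial-sized dataset of 600, the realized proportion varies from run to run (its binomial standard deviation is about 2 percentage points when the expected proportion is near 27%), so it will rarely equal the mean of misstemp exactly.

```stata
* Sketch: check that the simulated missingness behaves as specified.
* Large stand-in sample (assumed); clustering is ignored for this check.
clear
set seed 2024
set obs 100000
generate byte trt = mod(_n, 2)
generate byte outcome = rbinomial(1, cond(trt, 0.10, 0.03))

* Mechanism as described above: baseline log-odds -1, ORs 1.1 (outcome) and 0.90 (arm)
generate double misstemp = invlogit(-1 + log(1.1)*outcome + log(0.90)*trt)
generate byte miss = runiform() < misstemp

* Expected (mean of misstemp) vs realized (mean of miss) proportion missing
summarize misstemp miss

* Realized proportion missing in each outcome x arm cell
tabulate outcome trt, summarize(miss) means

* Recover the missingness odds ratios (should be near 1.1 and 0.90)
logit miss outcome trt, or
```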