
  • Minimum Detectable Effect Using Non-parametric Power Simulations on Existing Data and Assignment

    I am interested in calculating the minimum detectable effect (MDE) using a non-parametric power simulation on an existing dataset with a fixed, pre-existing assignment to a binary treatment that I cannot alter. Imagine a dataset in which some people are assigned to treatment and some to control, but treatment has not yet happened, so there should be no systematic difference between the two groups. My understanding is that I can see what effects I am powered to detect by looking at the 95% confidence interval around the null effect: any effect larger than that confidence interval should be detectable by the study.
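
    For reference, my reading here leans on the standard parametric MDE formula (standard notation, not anything Stata reports directly): with significance level $\alpha$ and power $1-\beta$,

    $$\mathrm{MDE} = \left(z_{1-\alpha/2} + z_{1-\beta}\right) \cdot \mathrm{SE}(\hat{\beta})$$

    At 50% power ($z_{1-\beta} = 0$) the MDE is exactly the half-width of the $(1-\alpha)$ confidence interval, which is what the interval around the null effect measures.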

    Below I have calculated that interval, first using a regression and then through a simulation of that regression. The 95% CIs for the treat coefficient are very similar: [-0.023, 0.072] and [-0.024, 0.073], respectively.

    0) Does this seem like a reasonable procedure?
    1) Is one of these intervals better to use? Why aren't they identical?
    2) Am I correct that this design should be able to detect any effect outside that CI?

    Stata code:
    Code:
    set seed 10011979 
    
    /* Regression Approach */
    sysuse nlsw88.dta, clear
    gen log_wage = ln(wage)
    gen treat      = runiform()>0.5
    tab treat
    reg log_wage i.treat, level(95) robust
    save "my_nlsw88.dta", replace
    
    /* Simulation Approach */
    capture program drop my_nlsw88_reg
    program my_nlsw88_reg, rclass
        version 16.0
        use "my_nlsw88.dta", clear
        bsample 2246 // obs in original dataset
        reg log_wage i.treat, robust
        return scalar lift = _b[1.treat]
    end
    
    simulate lift = r(lift), reps(10000) dots(10000) saving("mde_sim.dta", replace): my_nlsw88_reg
    sum lift, meanonly
    local mean = r(mean)
    _pctile lift, percentile(2.5 97.5)
    return list
    di "MDE is " %-9.3f r(r1) " to " %-9.3f r(r2)
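
    Since the outcome is ln(wage), the simulated bounds above are in log points; to read them as percent wage lifts, convert with exp(b) - 1. A small follow-on sketch (assumes the "mde_sim.dta" saved above is still on disk):

    Code:
    use "mde_sim.dta", clear
    _pctile lift, percentile(2.5 97.5)
    local lb = r(r1)
    local ub = r(r2)
    di "MDE in log points: " %9.3f `lb' " to " %9.3f `ub'
    di "MDE in percent:    " %9.1f 100*(exp(`lb')-1) " to " %9.1f 100*(exp(`ub')-1)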

  • #2
    Here's a more complicated experimental design with clustered treatment:

    Code:
    cd "~/Desktop/"
    
    set seed 10011979 
    
    /* (1) Fake Data */
    sysuse nlsw88.dta, clear
    gen log_wage = ln(wage)
    /* assign whole age clusters to treatment at random */
    bys age: egen z = mean(runiform())
    gen treat = z >= .5046362 // hand-picked cutoff for this seed's draw of z
    drop z
    tab age treat, row
    save "my_nlsw88.dta", replace
    
    /* (2) Regression Approach */
    reg log_wage i.treat, level(95) cluster(age)
    
    /* (3) Simulation Approach (keeping assignment fixed, but sampling clusters with replacement) */
    capture program drop my_nlsw88_reg
    program my_nlsw88_reg, rclass
        version 16.0
        use "my_nlsw88.dta", clear
        bsample 13, cluster(age)
        reg log_wage i.treat, cluster(age)
        return scalar lift = _b[1.treat]
    end
    
    simulate lift = r(lift), reps(10000) dots(10000) saving("mde_sim.dta", replace): my_nlsw88_reg
    
    sum lift, meanonly
    local mean = r(mean)
    _pctile lift, percentile(2.5 97.5)
    return list
    local lb=r(r1)
    local ub=r(r2)
     
    di "MDE is " %-9.3f `lb' " to " %-9.3f `ub'
    
    tw kdensity lift, xline(`lb' `mean' `ub') title("Simulated Lift Density")
    sum lift, det
    
    /* (4) check SS */
    
    use "my_nlsw88.dta", clear
    replace log_wage = ln((1+`ub'-.001)*wage) if treat==1
    reg log_wage i.treat, cluster(age) // should not be SS
    
    use "my_nlsw88.dta", clear
    replace log_wage = ln((1+`ub'+.001)*wage) if treat==1
    reg log_wage i.treat, cluster(age) // should be SS
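    
    /* (4b) Optional power-curve sketch (illustrative, not part of the original
       steps): instead of spot-checking one effect size, inject effects of
       varying size and record the simulated rejection rate. power_reg and the
       grid 0(.025).1 are arbitrary choices; injected effects are additive in
       log points, and ln(1+lift) ~ lift for small lifts. */
    capture program drop power_reg
    program power_reg, rclass
        version 16.0
        use "my_nlsw88.dta", clear
        bsample 13, cluster(age)
        replace log_wage = log_wage + `1' if treat==1 // add effect of `1' log points
        reg log_wage i.treat, cluster(age)
        test 1.treat = 0
        return scalar reject = (r(p) < .05)
    end
    
    foreach e of numlist 0(.025).1 {
        quietly simulate reject = r(reject), reps(500) nodots: power_reg `e'
        quietly sum reject, meanonly
        di "effect = " %5.3f `e' "   simulated power = " %5.3f r(mean)
    }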
    
    /* (5) Clean Up */
    rm "my_nlsw88.dta"
    rm "mde_sim.dta"
