Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • power calculation for two binary treatments and their interaction in a regression setting

    I will have data from an experiment with two binary treatments (X and Z) and their interaction. The unequal sample sizes are given in this table:
    Z=1 Z=0
    X=1 N=2,500 N=717,500
    X=0 N=2,500 N=77,500
    The baseline rate is Pr(Y=1 | X=0, Z=0) = 4/1000. I am interested in doing inference on the average marginal effects (possibly with some covariates), so I will estimate something like one of these models:

    Code:
    reg y i.x##i.z, robust
    probit y i.x##i.z
    margins, at(x = (0 1) z=(0 1)) contrast(atcontrast(r._at) wald)
    Is there a way to calculate the effect sizes that I can detect with either the OLS or probit/margins, given some alpha and power?

    I tried using -powersim- (version 1.0.1), but I could not get the the block22() option with my uneven split.
    Last edited by Dimitriy V. Masterov; 16 Aug 2017, 22:45. Reason: power calculation

  • #2
    Would something like the following help at least as a starting point?
    Code:
    version 15.0
    
    clear *
    
    // Create dataset with fixed cell counts
    quietly set obs 2
    generate x = _n - 1
    
    quietly expand 2
    bysort x: generate byte z = _n - 1
    
    generate long count = 2500
    quietly {
        replace count = 77500 in 1
        replace count = 717500 in 3
    }
    list, noobs sepby(x z)
    
    tempfile tmpfil0
    quietly save `tmpfil0'
    
    // Simulations using the fixed dataset
    program define simem
        version 15.0
        syntax , File(string) [Effect(real 0)]
    
        drop _all
        use `file'
    
        generate long count1 = rbinomial(count, 0.004)
        replace count1 = rbinomial(count, abs(`effect' - 0.004)) in l
        generate long count0 = count - count1
        drop count
        quietly reshape long count, i(x z) j(y)
    
        probit y i.x##i.z [fweight=count]
        testparm x#z
    end
    
    // Test size
    local reps 1000
    simulate p = r(p), reps(`reps') nodots nolegend: simem , file(`tmpfil0')
    generate byte pos = p < 0.05
    summarize pos, meanonly
    display in smcl as text "Effect = " %5.3f `effect', "Power = " %4.2f r(mean), "Estimable = " %4.2f r(N) / `reps'
    
    // Power
    local reps 100
    forvalues effect = 0.010(0.001)0.02 {
        simulate p = r(p), reps(100) nodots nolegend: simem , file(`tmpfil0') e(`effect')
        generate byte pos = p < 0.05
        summarize pos, meanonly
        display in smcl as text "Effect = " %5.3f `effect', "Power = " %4.2f r(mean), "Estimable = " %4.2f r(N) / `reps'
    }
    
    exit
    Here, "effect" is defined such that the proportion in the cell diagonal to the baseline cell is "effect" minus the baseline rate. Substitute whatever code you need to make "effect" or "effect size" defined as desired. Also, the simulation program tests only the interaction term; substitute -margins- or whatever as desired. Be sure to look for coding errors.
    Last edited by Joseph Coveney; 17 Aug 2017, 00:18.

    Comment


    • #3
      Originally posted by Joseph Coveney View Post
      Be sure to look for coding errors.
      Well, how about we start with a seed?
      Code:
      clear *
      
      set seed 1406723
      And continue by making "Estimable" etc. meaningful
      Code:
      generate byte pos = p < 0.05 if !missing(p)
      and make "effect" a little more intuitive.
      Code:
      replace count1 = rbinomial(count, `effect' + 0.004) in l
      With that last one
      Code:
      forvalues effect = 0.002(0.002)0.012 {
      hones in on the range where things are happening.

      Comment


      • #4
        Thank you for this great code. I think I've digested and adapted it to my -margins, pwcompare- case successfully, but I am uncertain what the estimable part is doing. I understand that there will be some simulations with no p-values, but I am not sure if I understand why that is happening here.

        Comment


        • #5
          The rate (4/1000) is fairly low. If there is perchance a zero-count cell in the table, then you will get separation and the probit regression will fail. In such cases, I expected it to return a missing p-value; the estimable stuff was just to monitor whether it was happening. As it turns out, even with only 2500 total in the Z = 1 / X = 0 condition, it seem to be rare enough that it never happened with the small number of repetitions that I used.

          You could always get around such a case with a combination of probit's asis option and the use of the likelihood-ratio test.

          Comment


          • #6
            Thank you again for a great explanation. Makes a lot of sense! I will post my version of the code when it gets cleaned up.

            Comment

            Working...
            X