Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Power calculation for regression using simulation

    Hi all,

    I have been writing code for a power calculation based on the advice here & I have reached a bit of a standstill. I'm not sure I'm running it correctly & would really appreciate a sense check/new eyes on the code I have written.

    The problem is as follows. An RCT is being planned to compare standard-of-care (DrugA) versus a new drug (DrugB) for the treatment of a disease to assess whether the DrugB causes unwanted weight gain. Weight will be measured at baseline and months 6 and 12. The primary endpoint is the treatment effect of the DrugB at month 12. This will be analysed using the following regression model:

    Yt - Yt0 = b0 + b1X + b2 Yt0

    Where Yt is the weight change from baseline, X = treatment group, b1 = treatment effect and Yt0 is the baseline weight. In a previous exploratory study, mean baseline weight was 80.6 (sd 11). The newer drug gained on average 3.5kg over 12 months, and the SOC gained 0.6kg (meaning a difference of 1.3kg). The SD for change in weight was 1.7 for both groups.

    Using the example code from the other thread as a basis, I have generated the following code to simulate & test various populations. I'm not convinced I have done this correctly - particularly the generation of the post variable & the error variables.

    Code:
    version 14.2
    clear *
    set seed 123
    
    program define simem // , rclass
        version 14.2
        syntax , [n(integer 80) ///
            pre(real 80.6) presd(real 11) ///
            DElta(real 0) deltasd(real 1.7) ///
            Residual(real 1)]
            
        // Add input validation to test
        
        drop _all
        set obs `n'                                                    /* set number of observations */
        generate int id = _n                                        /* create an id variable */
        egen trtgrp = cut(id), group(2)                                /* generate two treatment groups */
        
        generate u = rnormal()                                        /* random intercept for each individual */
        generate e = rnormal(0,`residual')                            /* residual errors */
        
        generate pre = rnormal(`pre', `presd')                        /* generate baseline weight variable */
        summarize pre, meanonly
        generate c_pre = pre - r(mean)                                /* center baseline weight */
        generate post = u + trtgrp*rnormal(`delta',`deltasd') + e    /* generate post weight variable, accounting for variation in delta */
        
        // Test to run
        
        reg post i.trtgrp c_pre
        test 1.trtgrp
        
    end
    
    postfile sim delta power lb ub using regression_powersim, replace
    foreach delta of numlist 1.3 1.7 2.1 2.5 2.9 {
        simulate p = r(p), reps(5) nolegend nodots: simem , de(`delta')
        generate byte pos = p < 0.05
        qui ci means pos
        post sim (`delta') (r(mean)) (r(lb)) (r(ub))
        display in smcl as text "Delta = " %4.2f `delta' " Power = " %4.2f r(mean) " Lower 95%CI = " %4.2f r(lb) " Upper 95%CI = " %4.2f r(ub)
    }
    
    postclose sim
    
    exit

    Secondly, if the primary outcome was the change over time, I would need to include the other timepoints. I would need to generate a longitudinal panel & estimate the following regression model:

    Yt - Yt0 = b0 + b1X + b2Yt0 + b3time + b4X*time

    Where time is binary for month 6 and 12 (0/1). The treatment effect at month 12 is interpretable as b1 + b4. I will run this using either xtreg or the mixed command.

    When constructing my inputs, I can expand the data such that each individual has two data points as follows:

    Code:
    expand 2
    bysort id: generate byte time = _n-1
    But I am unsure how to calculate the outcome variable that is correlated for each individual.

    I really appreciate any help anyone can provide on this topic.

    Best wishes,
    Bryony

  • #2
    I haven't looked at your code, but I have a couple of questions.

    1. If Yt is "the weight change from baseline" and Yt0 is the baseline weight, then what is Yt - Yt0, which is what you're ostensibly modeling according to your equation? Is that a typo in your equation, or is Yt the weight at time t and not the weight change from baseline? Moreover, if it's the weight and not weight change, then it would seem to me that you would be better off with a straightforward ANCOVA and not some kind of hybrid of an ANCOVA with a change-score model. Google change score harrell if you need more on this.

    2. If you're having trouble believing your simulation results, then can't you get power and sample size for a conventional ANCOVA design from official Stata commands? You can use the latter as a sanity check for results that you're getting from the former. (I think that Stata's power & sample size command used to be called -sampsi- or something similar. It might still work under that name, or if not, then Stata's pop-up message will steer you to the command's current incarnation.)

    Comment


    • #3
      I think that there is some literature about when it is advantageous to model the data as ANCOVA and when to use repeated-measures ANOVA. It has to do with the intraclass correlation's being greater or less than 0.5, or something as I recall. The Google search that I mentioned might turn something up about that.

      Toward that end, you might want to explore the two methods of modeling the study's results for which might tend toward greater efficiency in testing your hypothesis. Something like this might be a good starting point.
      Code:
      version 15.1
      
      clear *
      set seed `=strreverse("1458042")'
      
      program define simem, rclass
          version 15.1
          syntax , [n(integer 80) delta1(real 3.5) delta0(real 0.6)]
      
          drop _all
          quietly set obs `n'
      
          generate byte trt = mod(_n, 2)
          generate double y0 = rnormal(80.6, 11)
      
          generate double y12 = y0 + ///
              cond(trt, rnormal(`delta1', 1.7), rnormal(`delta0', 1.7))
      
          // ANCOVA
          anova y12 trt c.y0
          testparm i.trt
          tempname ancova
          scalar define `ancova' = r(p)
      
          // Repeated-measures ANOVA
          generate int pid = _n
          quietly reshape long y, i(pid) j(tim)
          anova y trt / pid|trt tim trt#tim
          testparm i.trt#i.tim
          return scalar rmanova = r(p)
          return scalar ancova = `ancova'
      end
      
      forvalues n = 20(-2)12 {
          quietly simulate ancova = r(ancova) rmanova = r(rmanova), ///
              reps(1000) nodots: simem , n(`n')
      
          display in smcl as text _newline(1) "n = `n'"
      
          foreach var of varlist *a {
              generate byte p_`var' = `var' < 0.05
              summarize p_`var', meanonly
              display in smcl as text strupper("`var'"), "Power = " %04.2f r(mean)
          }
      
      }
      
      exit
      You could readily extend it to include intermediate follow-up observations for secondary analysis, although at that point, you'd be leaving ANCOVA behind.

      Comment

      Working...
      X