simulation based power analysis for linear mixed model/multilevel model

CEdward

Join Date: Nov 2014
Posts: 131

#61

14 Nov 2019, 18:17

Thank you Joseph.

After much headache (for the time being) and patient support from Clyde, Joseph, and others, here is the final code that executes:

Code:

capture program drop swcrt
program define swcrt, rclass
        version 15.1
        preserve
        clear
        args num_clus clussize intercept timecoeff1 timecoeff2 timecoeff3 timecoeff4 timecoeff5 intrvcoeff sigma_u3 sigma_u2 sigma_error alpha
        assert `num_clus' > 0 & `clussize' > 0 & `intercept' > 0 & `timecoeff1' < 0 & `timecoeff2' < 0 & `timecoeff3' < 0 & `timecoeff4' < 0 & `timecoeff5' < 0 & `intrvcoeff' > 0 & `sigma_u3' > 0 & `sigma_u2' > 0 & `sigma_error' > 0
    /*Generate simulated multi—level data*/
        qui
                set seed 12345
                clear
                set obs `num_clus'  
                qui gen cluster = _n
                //Generate cluster-level errors //
                qui gen u_3 = rnormal(0,`sigma_u3')  
                expand `clussize'
                bysort cluster: gen individual = _n
                //Generate patient-level errors //
                qui gen u_2 = rnormal(0,`sigma_u2')
                expand 6
                //Set up time//
                bysort cluster individual: gen time = _n-1
                //Set up intervention variable//
                gen intrv = (time>=cluster)
                //Generate residual errors
                qui gen error = rnormal(0,`sigma_error')
                //Generate outcome y
                qui gen y = `intercept' + `timecoeff1'*1.time + `timecoeff2'*2.time + `timecoeff3'*3.time + `timecoeff4'*4.time + `timecoeff5'*5.time + `intrvcoeff'*intrv + u_3 + u_2 + error
    
    /*Fit multi-level model to simulated dataset*/
    mixed y intrv i.time ||cluster: ||individual:, covariance(exchangeable) reml dfmethod(kroger)
    /*Return estimated effect size, bias, p-value, and significance dichotomy*/
    tempname M
    matrix `M' = r(table)
    return scalar bias = _b[intrv] - `intrvcoeff'
    return scalar p = `M'[1,4]
    return scalar p_= (`M'[1,4] < `alpha')
    exit
end swcrt
 
*Postfile to store results
tempname step
tempfile powerresults
capture postutil clear //*Closes open postfiles*//
postfile `step' num_clus intrvcoeff p p_ bias using `powerresults', replace //*Declare variable names and filename of dataset where results will be saved*//
*Loop over number of clusters, ICC
foreach c of local num_clus{
    display as text "Number of clusters" as result "`c'"
    /*foreach icc of local ICC{
        display as text "Intra-class correlation" as result `i'*/
        forvalue i = 1/`nrep'{
            display as text "Iterations" as result `nrep'    
            quietly swcrt `c' `clussize' `intercept' `timecoeff1' `timecoeff2' `timecoeff3' `timecoeff4' `timecoeff5' `intrvcoeff' `sigma_u3' `sigma_u2' `sigma_error' `alpha'
            post `step' (`c') (`intrvcoeff') /*(`icc')*/ (`r(p)') (`r(p_)') (`r(bias)') //*Adds new observation to declared dataset; # of variables here must match the postfile command*//
        }
    /*}*/
}
postclose `step' //*Declares end to posting of observations*//

*Open results, calculate power
use `powerresults', clear //*Loads new dataset with posted results*//

levelsof num_clus, local(num_clus)
/*levelsof ICC, local(ICC)*/ /*Maybe remove or add ICC*/
matrix drop _all
*Loop over combinations of clusters and ICC
*Add power results to matrix
foreach c of local num_clus {
    /*foreach i of local ICC { */    
        quietly ci proportions p_ if num_clus == `c' /*& float(ICC) == float(`i')*/
        local power `r(proportion)'
        local power_lb `r(lb)'
        local power_ub `r(ub)'
        quietly ci mean bias if num_clus == `c' /*& float(ICC) == float(`i')*/
        local bias `r(mean)'
        local bias_lb `r(lb)'
        local bias_ub `r(ub)'
        matrix M = nullmat(M) \ (`c', `intrvcoeff', `power', `power_lb', `power_ub', `bias', `bias_lb', `bias_ub') //* \ is a column join operator to be used with nullmat()*//
    /*}*/
}

*Display the matrix
matrix colnames M = c intrvcoeff power power_lb power_ub bias bias_lb bias_ub //*Arguments should match those in the previous line to avoid conformability errors*//
matrix list M, noheader format(%3.2f)

I still have yet to determine how to actually draw values for the parameters used in the simulation from a distribution, but this is a welcome introduction to advanced coding for a newbie like myself.

For the issue below, I also thought I ran into the same problem initially on a mac. In my case it was a convergence issue due to small sample size. The code eventually did run.

Originally posted by Filipe Rodrigues View Post

Dear all

I've run the code provided by @Clyde on his second intervention on this thread (post #6) and even reducing the number of repetitions to 100 or less after around 500 simulations Stata seems to freeze.
The same has happened to me in the past when using bootstrapping in loops or with other simulations even using the simulate command (happy to share code). I've waited hours and nothing seems to happen although the Stata spinning wheel continues to spin. No error is provided and the only thing I get is a frozen screen showing something like this for hours:

Code:

...
Arm size 320
Delta 0.45
Delta 0.50
Delta 0.55
Delta 0.60
Arm size 325
Delta 0.45
Delta 0.50

I am running StataIC 15 in a MacBook Pro 2.7 GHz Intel Core i7 16 GB 2133 MHz LPDDR3 and I set segmentsize to 3Gb.

Does anyone has any idea why this happens?

Thanks!

PS: also in cases like this as loops progress then tend to become slower and slower

Last edited by CEdward; 14 Nov 2019, 18:20.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#62

15 Nov 2019, 13:23

Thank you for showing your solution.
Comment

CEdward

Join Date: Nov 2014
Posts: 131

#63

07 May 2020, 17:41

After spending some time away from the code, I came back to re-run it with the addition of looping over cluster sizes and I run into the same issue that I had previously. Although I did add some elements to the code (bolded below) - those same issues appeared both before and after.

Code:

local num_clus 3 6 9 18 36
local clussize 5 10 15 20 25 
*Model specifications
local intercept 17.87
local timecoeff1 -5.42
local timecoeff2 -5.72
local timecoeff3 -7.03
local timecoeff4 -6.13
local timecoeff5 -9.13
local intrvcoeff 5.00
local sigma_u3 25.77
local sigma_u2 120.62
local sigma_error 38.35
local nrep 1000
local alpha 0.05

*Generate multi-level data
capture program drop swcrt
program define swcrt, rclass
        version 15.1
        preserve
        clear
        args num_clus clussize intercept intrvcoeff timecoeff1 timecoeff2 timecoeff3 timecoeff4 timecoeff5 sigma_u3 sigma_error alpha
        assert `num_clus' > 0 & `clussize' > 0 & `intercept' > 0 & `intrvcoeff' > 0 &  `timecoeff1' < 0 & `timecoeff2' < 0 & `timecoeff3' < 0 & `timecoeff4' < 0 & `timecoeff5' < 0 & `sigma_u3' > 0 & `sigma_error' > 0 & `alpha' > 0
    /*Generate simulated multi—level data*/
        qui
                clear
                set obs `num_clus'  
                qui gen cluster = _n
                qui gen group = 1+mod(_n-1,4)
                /*Generate cluster-level errors*/
                qui gen u_3 = rnormal(0,`sigma_u3')  
                expand `clussize'
                bysort cluster: gen individual = _n
                /*Set up time*/
                expand 6
                bysort cluster individual: gen time = _n-1
                /*Set up intervention variable*/
                gen intrv = (time>=group)
                /*Generate residual errors*/
                qui gen error = rnormal(0,`sigma_error')
                /*Generate outcome y*/
                qui gen y = `intercept' + `intrvcoeff'*intrv + `timecoeff1'*1.time + `timecoeff2'*2.time + `timecoeff3'*3.time + `timecoeff4'*4.time + `timecoeff5'*5.time  + u_3 + error
    
    /*Fit multi-level model to simulated dataset*/
    mixed y intrv i.time ||cluster:, covariance(unstructured) reml dfmethod(kroger)
        
    /*Return estimated effect size, bias, p-value, and significance dichotomy*/
    tempname M
    matrix `M' = r(table)
    return scalar bias = _b[intrv] - `intrvcoeff'
    return scalar p = `M'[1,4]
    return scalar p_= (`M'[1,4] < `alpha')
    exit
end swcrt
 
*Postfile to store results
tempname step
tempfile powerresults
capture postutil clear
postfile `step' num_clus clussize intrvcoeff p p_ bias using `powerresults', replace
*Loop over number of clusters
foreach c of local num_clus{
    display as text "Number of clusters" as result "`c'"
      foreach s of local clussize{
            display as text "Cluster size" as result "`s'"
                forvalue i = 1/`nrep'{
                    display as text "Iterations" as result `nrep'    
                    quietly swcrt `num_clus' `clussize' `intercept' `intrvcoeff' `timecoeff1' `timecoeff2' `timecoeff3' `timecoeff4' `timecoeff5' `sigma_u3' `sigma_error' `alpha'
                    post `step' (`c') (`s') (`intrvcoeff') (`r(p)') (`r(p_)') (`r(bias)')
                }

        }

}        
postclose `step'

*Open results, calculate power
use `powerresults', clear

levelsof num_clus, local(num_clus)
levelsof clussize, local(clussize)
matrix drop _all
*Loop over combinations of clusters
*Add power results to matrix
foreach c of local num_clus{
    foreach s of local clussize{
        quietly ci proportions p_ if num_clus == `c'  & clussize = `s'
        local power `r(proportion)'
        local power_lb `r(lb)'
        local power_ub `r(ub)'
        quietly ci mean bias if num_clus == `c' & clussize = `s'
        local bias `r(mean)'
        matrix M = nullmat(M) \ (`c', `s', `intrvcoeff', `power', `power_lb', `power_ub', `bias')
    }
}
    
*Display the matrix
matrix colnames M = c s intrvcoeff power power_lb power_ub bias /*Arguments should match those in the previous line to avoid conformability errors*/
matrix list M, noheader format(%3.2f)

Here is the error that Stata throws me:

Code:

.
. *Postfile to store results

. tempname step

. tempfile powerresults

. capture postutil clear

. postfile `step' num_clus clussize intrvcoeff p p_ bias using `powerresults', replace
(note: file /var/folders/v4/j5kzzhc52q9fvh6w9pcx9fgm0000gn/T//S_00310.00000c not found)

. *Loop over number of clusters

. foreach c of local num_clus{
  2.     display as text "Number of clusters" as result "`c'"
  3. foreach s of local clussize{
  4. display as text "Cluster size" as result "`s'"
  5. forvalue i = 1/`nrep'{
  6. display as text "Iterations" as result `nrep'
  7. quietly swcrt `num_clus' `clussize' `intercept' `intrvcoeff' `timecoeff1' `timecoeff2' `timecoeff3' `timecoeff4' `timecoeff5' `sigma_
> u3' `sigma_error' `alpha'
  8. post `step' (`c') (`s') (`intrvcoeff') (`r(p)') (`r(p_)') (`r(bias)')
  9. }
 10.
. }
 11.
. }
Number of clusters3
Cluster size5
Iterations1000
r(9);

. postclose `step'

.
.
.

matrix colnames M = c s intrvcoeff power power_lb power_ub bias /*Arguments should match those in the previous line to avoid conformabil
> ity errors*/
matrix M not found
r(111);

. matrix list M, noheader format(%3.2f)
matrix M not found
r(111);

Given previous posts, I have checked the following parts of the code to try and resolve the issue:

Checked to see that the local macros are specified outside of the program and passed into it via the args statement of the program
Checked for a match between the arguments in the call of the program swcrt to determine if they match
Checked to see if the assert statement in the program matches the args command and whether the alligator clips are specified appropriately
Checked for a match b/w the number of variables in the post and postfile commands

I am not quite sure why I get these errors considering that the code did work previously and the program iterated (even when I take away the changes there is still the error). Does anyone know why this happens? If I had to guess, the matrix can't be found because of the error with the file not being found when I use postfile.

Last edited by CEdward; 07 May 2020, 17:50.

Comment

CEdward

Join Date: Nov 2014

Posts: 131
#64

12 Aug 2020, 14:57

I have a question unrelated to the posts above (problem now solved).

I am running the simulations and Stata is iterating through 1000 repetitions and what I notice is that the program has taken several days (and it is still running). I haven't gotten any code breaks or error messages so it is hard for me to tell what's happening since it hasn't reached the conclusion of the simulation. Is this normal?

Does Stata ever keep running without acknowledging any issues?

Version 15.1.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#65

12 Aug 2020, 15:41

Yes, this can happen. If you are doing a thousand replications, and if each replication involves a lot of computation, things can go on a long time. Usually when I write simulations that I expect to take a long time, I usually include command to display a message after every 10 replications or something like that, just so I can reassure myself that progress is in fact being made.

FWIW the longest I ever had a Stata program run (successfully) without any output other than my programmed progress report was 26 days. I don't really recommend planning for things like that to happen because even today, most desktop computer operating systems and computers aren't stable enough to run that long without having to reboot. I was very lucky that time.
Comment
CEdward

Join Date: Nov 2014

Posts: 131
#66

13 Aug 2020, 09:18

Originally posted by Clyde Schechter View Post

Yes, this can happen. If you are doing a thousand replications, and if each replication involves a lot of computation, things can go on a long time. Usually when I write simulations that I expect to take a long time, I usually include command to display a message after every 10 replications or something like that, just so I can reassure myself that progress is in fact being made.

FWIW the longest I ever had a Stata program run (successfully) without any output other than my programmed progress report was 26 days. I don't really recommend planning for things like that to happen because even today, most desktop computer operating systems and computers aren't stable enough to run that long without having to reboot. I was very lucky that time.

Thank you Clyde. That is informative and good to know. Hopefully this doesn't run nearly as long as a month - that just becomes troublesome for timelines.

Does it mean anything if I have set trace on and the results screen was initially showing the computations and now it is white/all blank but the program seems to be iterating still (as evidenced by the spinning wheel in the bottom right of the Stata interface)?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#67

13 Aug 2020, 10:03

If you have trace on and you aren't getting output, that suggests that no calculations are taking place. But you can't be sure. For example, if you are doing a power analysis simulation, and if the analyis under study is one that has a "Refining starting values" stage, then that can run for a long time (days!) with no output even under trace. I don't think the spinning wheel is that useful an indicator. I have seen it continue to spin in circumstances where I'm pretty sure Stata had simply hung (though I can't be certain about that).

If you are running on a Windows box, you can open the Task Manager and see what that shows. If Stata is actively calculating something you should see fluctuations in the amount of CPU it uses, and perhaps fluctuations in memory as well.
Comment

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment