
  • Unexpected finding with simulation

    Hi everyone,
    So I have the code below and it runs fine, but I noticed an unexpected finding. Of course, sometimes unexpected findings are warranted and constitute a new research insight, but I am inclined to think this isn't one of those times. The simulation looks at how sample size relates to the statistical power of models analyzed with generalized estimating equations. The expectation is that power should increase with sample size, and while that holds for the first couple of sample sizes (3 and 6 clusters), the power drops to zero at 9, which is completely bizarre. I don't see anything wrong with the code, but I wonder what else might give rise to this break in the trend. Or is it a real finding?

    For clarity, here is where I am manipulating the sample size in the code below:

    Code:
    local num_clus 3 6 9
    Code:
    capture program drop swcrt
    program define swcrt, rclass
            args num_clus clussize intercept intrvcoeff timecoeff1 timecoeff2 timecoeff3 timecoeff4 timecoeff5 sigma_u3 sigma_error alpha
            assert `num_clus' > 0 & `clussize' > 0 & `intercept' > 0 & `intrvcoeff' > 0 & `timecoeff1' < 0 & `timecoeff2' < 0 & `timecoeff3' < 0 & `timecoeff4' < 0 & `timecoeff5' < 0 & `sigma_u3' > 0 & `sigma_error' > 0 & `alpha' > 0
        *Generate simulated multi-level data
        qui {
                clear
                set obs `num_clus'
                gen cluster = _n
                gen group = 1 + mod(_n-1, 4)
                *Generate cluster-level errors
                gen u_3 = rnormal(0, `sigma_u3')
                expand `clussize'
                bysort cluster: gen individual = _n
                *Set up time
                expand 6
                bysort cluster individual: gen time = _n - 1
                *Set up intervention variable
                gen intrv = (time >= group)
                *Generate residual errors
                gen error = rnormal(0, `sigma_error')
                *Generate outcome y
                gen y = `intercept' + `intrvcoeff'*intrv + `timecoeff1'*1.time + `timecoeff2'*2.time + `timecoeff3'*3.time + `timecoeff4'*4.time + `timecoeff5'*5.time + u_3 + error
        }
        
        *Fit multi-level model to simulated dataset
        xtgeebcv y intrv i.time, cluster(cluster) family(gaussian) link(identity) corr(exchangeable) stderr(kc)        
        
        *Return estimated effect size, absolute and relative bias, p-value, and significance dichotomy
        tempname M
        matrix `M' = r(table)
        return scalar b = _b[intrv]
        return scalar abias = _b[intrv] - `intrvcoeff'
        return scalar rbias = [(_b[intrv] - `intrvcoeff')/`intrvcoeff']*100
        return scalar p = `M'[4,1]
        return scalar p_ = (`M'[4,1] < `alpha')
    end
     
    *Stepped-wedge specifications
    local num_clus 3 6 9
    local clussize 5 10 15 20 25
    *Model specifications
    local intercept 17.87
    local timecoeff1 -5.42
    local timecoeff2 -5.72
    local timecoeff3 -7.03
    local timecoeff4 -6.13
    local timecoeff5 -9.13
    local intrvcoeff 5.00
    local sigma_u3 25.77
    local sigma_u2 120.62
    local sigma_error 38.35
    local alpha 0.05
    local nrep 500
    
    *Postfile to store results
    tempname step
    tempfile powerresults
    capture postutil clear
    postfile `step' num_clus clussize intrvcoeff b p p_ abias rbias using `powerresults', replace
    *Loop over number of clusters
    foreach c of local num_clus{
        display as text "Number of clusters" as result "`c'"
            foreach s of local clussize {
                display as text "Cluster size" as result "`s'"
                    forvalue i = 1/`nrep' {
                        display as text "Iteration " as result `i' as text " of " as result `nrep'
                        quietly swcrt `c' `s' `intercept' `intrvcoeff' `timecoeff1' `timecoeff2' `timecoeff3' `timecoeff4' `timecoeff5' `sigma_u3' `sigma_error' `alpha'
                        post `step' (`c') (`s') (`intrvcoeff') (`r(b)') (`r(p)') (`r(p_)') (`r(abias)') (`r(rbias)')
                    }
    
            }
    
    }        
    postclose `step'
    
    *Open results, calculate power
    use `powerresults', clear
    
    levelsof num_clus, local(num_clus)
    levelsof clussize, local(clussize)
    matrix drop _all
    *Loop over combinations of clusters
    *Add power results to matrix
    foreach c of local num_clus{
        foreach s of local clussize{
            quietly ci proportions p_ if num_clus == `c' & clussize == `s'
            local power `r(proportion)'
            quietly ci mean abias if num_clus == `c' & clussize == `s'
            local abias `r(mean)'
            quietly ci mean rbias if num_clus == `c' & clussize == `s'
            local rbias `r(mean)'
            matrix M = nullmat(M) \ (`c', `s', `intrvcoeff', `power', `abias',`rbias')
        }
    }
        
    *Display the matrix
    matrix colnames M = c s intrvcoeff power abias rbias
    matrix list M, noheader format(%3.2f)
    Last edited by CEdward; 11 May 2021, 10:30.

  • #2
    Are you certain the model specified by -xtgeebcv- is not having convergence issues (or some other error)? What happens if you simply use -xtgee, vce(rob)-?
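
    For example, a minimal version of that check (a sketch; assuming the panel variable has been declared with -xtset-):

    Code:
    xtset cluster
    xtgee y intrv i.time, family(gaussian) link(identity) corr(exchangeable) vce(robust)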



    • #3
      Originally posted by Leonardo Guizzetti View Post
      Are you certain the model specified by -xtgeebcv- is not having convergence issues (or some other error)? What happens if you simply use -xtgee, vce(rob)-?
      Fascinating hypothesis. I didn't consider this. I have always assumed that any convergence issues at larger sample sizes would also affect all smaller sample sizes, but here things seem fine at the smaller sizes. Perhaps this is rhetorical, but I suppose there are times when convergence problems can happen at larger n? Secondly, how would I go about determining whether such errors occurred?

      The reason I am using xtgeebcv is to get the bias-corrected SEs, something xtgee does not provide.



      • #4
        Presumably the larger sample size should be more stable as a general rule of thumb, but I am not familiar with the particular corrections requested by this model.

        A usual approach when building these sorts of simulations is to do a dry run of one iteration for the estimation method and inspect the output to see if it's reasonable. In your case, you could simply generate one data set and then try to run your GEE model. The fact that the structure of the simulation seemingly works under most conditions has me thinking that it also works -per se- with 9 clusters, but if there is an undetected estimation or convergence problem, a missing value could be taken as zero power.

        The suggestion to use -xtgee- was not to change the estimation method, but to demonstrate to yourself that for the simpler model, the simulation does run and yields sensible values in accordance with your intuition. If it does work using -xtgee-, then it might be a clue that -xtgeebcv- is having a problem.
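
        For instance, since only the data generation is quieted inside your program, a single direct call with the parameter values from your own script would display the -xtgeebcv- output and returned results for inspection:

        Code:
        swcrt 9 5 17.87 5.00 -5.42 -5.72 -7.03 -6.13 -9.13 25.77 38.35 0.05
        return list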

        Originally posted by Jack Chau View Post
        Fascinating hypothesis. I didn't consider this. I have always assumed that any convergence issues at larger sample sizes would also affect all smaller sample sizes, but here things seem fine at the smaller sizes. Perhaps this is rhetorical, but I suppose there are times when convergence problems can happen at larger n? Secondly, how would I go about determining whether such errors occurred?

        The reason I am using xtgeebcv is to get the bias-corrected SEs, something xtgee does not provide.



        • #5
          So I generated a single dataset and ran both the -xtgee- and -xtgeebcv- models, and both return coefficients:

          Code:
          . xtgee y intrv i.time, family(gaussian) link(identity) corr(exchangeable)
          
          Iteration 1: tolerance = 7.3892455
          Iteration 2: tolerance = .05559544
          Iteration 3: tolerance = .00036835
          Iteration 4: tolerance = 2.572e-06
          Iteration 5: tolerance = 1.796e-08
          
          GEE population-averaged model                   Number of obs     =        270
          Group variable:                    cluster      Number of groups  =          9
          Link:                             identity      Obs per group:
          Family:                           Gaussian                    min =         30
          Correlation:                  exchangeable                    avg =       30.0
                                                                        max =         30
                                                          Wald chi2(6)      =      11.73
          Scale parameter:                  2572.658      Prob > chi2       =     0.0683
          
          ------------------------------------------------------------------------------
                     y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 intrv |   2.251537   8.872424     0.25   0.800    -15.13809    19.64117
                       |
                  time |
                    1  |  -6.835083    8.74341    -0.78   0.434    -23.97185    10.30169
                    2  |  -3.343955   9.591498    -0.35   0.727    -22.14295    15.45504
                    3  |  -18.73226   10.73877    -1.74   0.081    -39.77987    2.315351
                    4  |  -19.19576   12.10043    -1.59   0.113    -42.91218     4.52065
                    5  |  -23.28394   12.10043    -1.92   0.054    -47.00035    .4324741
                       |
                 _cons |   27.04052    12.2658     2.20   0.027     3.000003    51.08104
          ------------------------------------------------------------------------------
          Code:
          . xtgeebcv y intrv i.time, cluster(cluster) family(gaussian) link(identity) corr(exchangeable) stderr(kc)
           
          Note: Family is gaussian and link is identity
          Using exchangeable working correlation
          with scale parameter divided by K - p
                 panel variable:  cluster (balanced)
          
          Iteration 1: tolerance = 7.3667033
          Iteration 2: tolerance = .05769979
          Iteration 3: tolerance = .0003891
          Iteration 4: tolerance = 2.771e-06
          Iteration 5: tolerance = 1.974e-08
          
          GEE population-averaged model                   Number of obs     =        270
          Group variable:                    cluster      Number of groups  =          9
          Link:                             identity      Obs per group:
          Family:                           Gaussian                    min =         30
          Correlation:                  exchangeable                    avg =       30.0
                                                                        max =         30
                                                          Wald chi2(6)      =      11.24
          Scale parameter:                  2640.841      Prob > chi2       =     0.0814
          
          ------------------------------------------------------------------------------
                     y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 intrv |   2.189534   9.059979     0.24   0.809     -15.5677    19.94677
                       |
                  time |
                    1  |  -6.814415   8.932021    -0.76   0.446    -24.32085    10.69202
                    2  |  -3.309509   9.797702    -0.34   0.736    -22.51265    15.89363
                    3  |  -18.68403   10.96886    -1.70   0.088    -40.18261     2.81454
                    4  |  -19.13376   12.35896    -1.55   0.122    -43.35687    5.089354
                    5  |  -23.22194   12.35896    -1.88   0.060    -47.44505    1.001178
                       |
                 _cons |   27.04052    12.3331     2.19   0.028     2.868082    51.21296
          ------------------------------------------------------------------------------
           
          Kauermann-Carroll bias-corrected standard errors
          t-statistic with K - p degrees of freedom
          ------------------------------------------------------------------------------
                       |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 intrv |   2.189534    7.66152     0.29   0.802    -30.77533    35.15439
                       |
                  time |
                    1  |  -6.814415   5.641628    -1.21   0.351    -31.08838    17.45955
                    2  |  -3.309509   7.677818    -0.43   0.708    -36.34449    29.72548
                    3  |  -18.68403   7.016299    -2.66   0.117    -48.87273    11.50467
                    4  |  -19.13376   7.248882    -2.64   0.119    -50.32318    12.05566
                    5  |  -23.22194   10.39472    -2.23   0.155    -67.94681    21.50294
                       |
                 _cons |   27.04052   12.70361     2.13   0.167    -27.61872    81.69977
          ------------------------------------------------------------------------------
          It's hard to say, though, which of the other hundreds of iterations are failing or not converging. Is there a way to check this? I suppose I could do "set trace on", but the output is too large for me to see everything in the interface.
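
          One idea, sketched below and untested: wrap the estimation inside the program in -capture-, which swallows any error, and return -_rc- (the return code, nonzero on failure), then post it with the other results and tabulate how often it is nonzero:

          Code:
          *Sketch: guard the estimation inside swcrt and record the return code
          capture xtgeebcv y intrv i.time, cluster(cluster) family(gaussian) link(identity) corr(exchangeable) stderr(kc)
          return scalar rc = _rc    // 0 if the fit succeeded, nonzero otherwise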



          • #6
            Try inspecting what values you save immediately after the line below. Maybe set iterations to something small, like 25, and examine just one combination of cluster size and number of clusters at a time; a sketch follows the code.

            Code:
            *Open results, calculate power
            use `powerresults', clear
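
            For instance (a sketch, using the variable names from your -postfile- statement):

            Code:
            use `powerresults', clear
            count if missing(p)                   // how many replications returned no p-value
            tab num_clus clussize if missing(p)   // where the failures concentrate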



            • #7
              Also, your name was Jack Chau in your earlier post, but now is CEdward. It is customary on this board to use our real names, and you can request a name change via the "Contact Us" link at the bottom of the page.



              • #8
                So I did some additional digging and learned some things, but the original 'issue' remains, if in fact it is an 'issue' and not a novel finding.

                I took a look at the generated coefficients and associated p-values for the smallest sample sizes and found that more than 99% of the iterations did not return a p-value. That leads me to believe there were problems with model convergence, and indeed that was the error that showed up when I ran 100 or so iterations. The only reason I get power estimates at all for 1000+ iterations is that every now and then a p-value is returned. In other words, my guess is that with enough iterations you will occasionally get convergence purely by chance, even at extremely small sample sizes.

                On the other hand, for all iterations at larger sample sizes I do get a p-value, and I checked the log and found no warnings/errors about convergence. That suggests to me that those models ran fine.

                I am still baffled that power is essentially 0 at a larger sample size compared to a smaller one, all else being equal. Are there other estimation problems that might go undetected?



                • #9
                  What are the number of clusters and observations in your larger samples?
                  ---------------------------------
                  Maarten L. Buis
                  University of Konstanz
                  Department of history and sociology
                  box 40
                  78457 Konstanz
                  Germany
                  http://www.maartenbuis.nl
                  ---------------------------------



                  • #10
                    It appears that -xtgeebcv- uses a t-statistic when a positive number of degrees of freedom can be calculated (df = K - p, where K is the number of clusters and p the number of regression parameters), and a z-statistic otherwise. With your model, p = 7 (intercept, intervention, and five time dummies), so you get t-statistics only for K > 7. This probably explains the apparent dropoff in power between 6 and 9 clusters: at 9 clusters you switch from z-statistics to t-statistics with only K - p = 2 degrees of freedom.

                    You could run everything with z-statistics via the -statistic(z)- option, or restrict the simulation to cluster numbers well above p, and I think in both cases you should see power increase with cluster number and size.
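
                    For example (the -statistic(z)- option as named above; see -help xtgeebcv- for the exact syntax):

                    Code:
                    xtgeebcv y intrv i.time, cluster(cluster) family(gaussian) link(identity) corr(exchangeable) stderr(kc) statistic(z)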



                    • #11
                      Originally posted by Maarten Buis View Post
                      What are the number of clusters and observations in your larger samples?
                      Hi Maarten. The number of people per cluster is 5-25 in increments of 5. The number of clusters is 3, 6, 9, 18, 36.

                      As it stands, 3 clusters gives a power estimate close to 0 (albeit based on very few p-values, as mentioned), at 6 clusters the power goes to ~20%, at 9 clusters it goes back down to 0, and then at 18 and 36 clusters the power rises above 20% again.



                      • #12
                        Originally posted by Leonardo Guizzetti View Post
                        It appears that -xtgeebcv- uses a t-statistic when a positive number of degrees of freedom can be calculated (df = K - p, where K is the number of clusters and p the number of regression parameters), and a z-statistic otherwise. With your model, p = 7 (intercept, intervention, and five time dummies), so you get t-statistics only for K > 7. This probably explains the apparent dropoff in power between 6 and 9 clusters: at 9 clusters you switch from z-statistics to t-statistics with only K - p = 2 degrees of freedom.

                        You could run everything with z-statistics via the -statistic(z)- option, or restrict the simulation to cluster numbers well above p, and I think in both cases you should see power increase with cluster number and size.
                        Bingo. This generates power estimates in line with the positive trend I was expecting, although the theoretical underpinnings aren't entirely clear to me, i.e. the distinction between using z and t statistics.
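
                        Comparing critical values at least makes the mechanics concrete (illustrative; df = K - p = 2 corresponds to K = 9 clusters with p = 7 parameters):

                        Code:
                        display invnormal(0.975)     // z critical value: 1.96
                        display invttail(2, .025)    // t critical value with 2 df: about 4.30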



                        • #13
                          Originally posted by CEdward View Post
                          The number of people per cluster is 5-25 in increments of 5. The number of clusters is 3, 6, 9, 18, 36.
                          That seems rather small to me. Maybe your simulation shows that you need a much larger number of clusters and more observations per cluster to get reasonable power.

                          Before drawing that conclusion I would increase the number of simulations. For power estimates I would use 20,000 replications as an absolute minimum, but if the results seem unstable I would easily increase that by a factor of 10.
                          ---------------------------------
                          Maarten L. Buis
                          University of Konstanz
                          Department of history and sociology
                          box 40
                          78457 Konstanz
                          Germany
                          http://www.maartenbuis.nl
                          ---------------------------------



                          • #14
                            Originally posted by Maarten Buis View Post

                            That seems rather small to me. Maybe your simulation shows that you need a much larger number of clusters and more observations per cluster to get reasonable power.

                            Before drawing that conclusion I would increase the number of simulations. For power estimates I would use 20,000 replications as an absolute minimum, but if the results seem unstable I would easily increase that by a factor of 10.
                            Interesting. Your thoughts led to a number of questions in my mind.

                            First, I find it interesting that you suggest 20,000 iterations. In many papers that I have read, 1,000 seems to be quite standard, perhaps upwards of 2,000, but certainly nowhere near 20,000. I am not suggesting your idea is off-base, but perhaps you can enlighten me on the standard you suggest: is it discipline-specific? Second, at what point can you safely conclude that the estimates are stable? I suppose this is analogous to asking for the minimum number of repetitions needed without wasting time on doing more ... perhaps a restatement of question 1 in different terms.



                            • #15
                              Simulations necessarily include an element of randomness, so the results are uncertain. How uncertain can be quantified in a way analogous to frequentist inference. The most common measure is the Monte Carlo standard error: if you were to run the simulation many, many times with different seeds, you would get many, many estimates of the power. This collection of estimates forms a distribution analogous to the sampling distribution in normal inference. Just as the standard error is the standard deviation of the sampling distribution, the Monte Carlo standard error is the standard deviation of that Monte Carlo distribution. You can use the standard formulas from normal inference to compute it without actually doing the many, many runs. This is the (very) short version. For a more detailed explanation see this article by Ian White in the Stata Journal: https://www.stata-journal.com/articl...article=st0200 . It also introduces simsum (see -search simsum-), which is a convenient tool to compute these MCSEs. So, long story short: you decide on the number of replications by checking whether the resulting uncertainty in the estimate is small enough.
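
                              For a power estimate (a proportion), the Monte Carlo standard error reduces to the familiar binomial formula sqrt(power*(1 - power)/replications); for example, at a true power of 20%:

                              Code:
                              display sqrt(0.20 * 0.80 / 500)      // about .018 with 500 replications
                              display sqrt(0.20 * 0.80 / 20000)    // about .0028 with 20,000 replications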

                              However, that is not the end of the story. Before looking at power, you first need to determine whether the bias in your estimates is small enough and whether the p-values actually mean what they should mean. Power is meaningless if the estimate does not measure what it should measure or the p-value does not behave as it should. Again simsum helps (you are looking at bias and coverage). Especially for coverage you need a large number of replications: say we are looking at the coverage of the 95% confidence interval, then that estimate is based on the (hopefully) 5% of replications that fall outside the interval. For 1,000 replications that is only 50 replications; for 20,000 replications it is 1,000. A more detailed look at the p-values can be obtained with simpplot (see -ssc describe simpplot-). Its use is described in this presentation at the 2013 German Stata Users' Group meeting: http://maartenbuis.nl/presentations/gsug13.pdf .

                              So if I were to do a simulation like this I would take the following steps after getting the basic simulation working:
                              1. Search for the sample size at which the estimates have an acceptably low bias
                              2. From that sample size, search for the sample size at which the p-values show acceptable behavior
                              3. From that sample size, search for the sample size at which you get acceptable power
                              At each step you also take into account the Monte Carlo standard errors to see whether your estimates are reliable enough, and increase the number of replications when necessary. In your case you have two sample sizes, the number of clusters and the (average) number of observations within each cluster, so you have three two-dimensional searches. You can use a smaller number of replications to get a rough idea of what the interesting values might be, and once you know what area you want to search, increase the number of replications to get more reliable estimates.

                              For such searching I find that running my simulation with simulate is more convenient. You can see a tutorial on how to create a simulation with simulate here: http://maartenbuis.nl/presentations/chicago08.html
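
                              With the -swcrt- program from this thread, one cell of the design could be run as (a sketch; replication count and seed arbitrary):

                              Code:
                              simulate b=r(b) p=r(p) p_=r(p_) abias=r(abias) rbias=r(rbias), reps(1000) seed(12345): ///
                                  swcrt 9 5 17.87 5.00 -5.42 -5.72 -7.03 -6.13 -9.13 25.77 38.35 0.05
                              summarize p_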
                              ---------------------------------
                              Maarten L. Buis
                              University of Konstanz
                              Department of history and sociology
                              box 40
                              78457 Konstanz
                              Germany
                              http://www.maartenbuis.nl
                              ---------------------------------

