Stepped wedge cluster randomised trial analysis

Elena Mvk

Join Date: May 2018

Posts: 15
#1


08 Jan 2020, 21:41

I was wondering if anyone could help with, or point me to guidance on, how to approach the data from a stepped-wedge cluster RCT.

I have data from a basic cohort stepped-wedge trial, with 3 groups of participants (clusters) and 4 data collection points (baseline, 3 months, 6 months, 9 months):

Group/Time   Baseline   3 months   6 months   9 months
1            0          1          1          1
2            0          0          1          1
3            0          0          0          1

Each cluster enters the intervention phase at a sequential data point, starting with Group 1 receiving the intervention at 3 months and continuing to receive it until the end of the data collection.

1. I would like to estimate the effects on outcomes (e.g. ED presentations, use of health services, quality of life results) by comparing the Control and Intervention groups.

2. The final aim is to conduct a cost-effectiveness analysis.

I think my first hurdle is that I do not understand the Control and Intervention groups. How are these groups identified, given that eventually all participants receive the Intervention?

What is the first step to take?

Thank you.
Elena
Clyde Schechter

Join Date: Apr 2014

Posts: 30115
#2

08 Jan 2020, 22:04

In a stepped-wedge design there are no such things as the control and intervention groups. There are control and intervention conditions and each group experiences both conditions, though at different times. The key to analyzing a stepped wedge design is that each unit of analysis has an observation at each time period, and in that observation there are variables designating the group, the time, whether the group is in treatment condition or control condition at that time, and the outcome at that time. (There may be other variables as well, but these are the crucial ones.) In the analysis you regress the outcome(s) against all of those variables. The estimator of the intervention effect is given by the coefficient of the variable that indicates whether the group is in treatment condition or control condition. The group indicator variables account for persistent differences among the groups, and the time variables account for any secular trends.

The final aim of conducting a cost effectiveness analysis does not alter the way you would go about estimating the intervention effects. In fact the cost-effectiveness part is basically a post-processing of that analysis.
Elena Mvk

Join Date: May 2018

Posts: 15
#3

09 Jan 2020, 17:03

Thank you for your prompt reply Mr Schechter. If I may, I would like to ask a few more questions:

1. If I understand correctly, the estimation of intervention effects has to be done for each pair of time periods, in my example:
baseline vs 3 months,
baseline vs 6 months,
3 months vs 6 months,
baseline vs 9 months,
3 months vs 9 months,
6 months vs 9 months? - OR am I overcomplicating this?

I saw some published results where the analysis was reported for each time period, as well as results presented as overall Total (Intervention phase vs Control phase) as in van Leeuwen, K.M., Bosmans, J.E., Jansen, A.P., Hoogendijk, E.O., Muntinga, M.E., van Hout, H.P., Nijpels, G., van der Horst, H.E. and van Tulder, M.W., 2015. Cost‐effectiveness of a chronic care model for frail older adults in primary care: economic evaluation alongside a stepped‐wedge cluster‐randomized trial. Journal of the American Geriatrics Society, 63(12), pp.2494-2504.[ https://www.researchgate.net/profile...ized-Trial.pdf ]

2. You wrote:

Originally posted by Clyde Schechter View Post

The key to analyzing a stepped wedge design is that each unit of analysis has an observation at each time period, and in that observation there are variables designating the group, the time, whether the group is in treatment condition or control condition at that time, and the outcome at that time. (There may be other variables as well, but these are the crucial ones.).

In my data set I have:
Individual respondent [ID]
Group variable [1,2,3]
Outcome_ServiceUse_Baseline [count variable, number of services used]
Outcome_ServiceUse_3months [count variable]
Outcome_ServiceUse_6months [count variable]
Outcome_ServiceUse_9months [count variable]
Treatment_baseline [0 for all]
Treatment_3months [1 = Group 1; 0 = Groups 2 and 3]
Treatment_6months [1 = Groups 1 and 2; 0 = Group 3]
Treatment_9months [1 for all]

The data are in wide form; should they be in long form?
I saw that for analysis a multilevel mixed-effects linear regression is used (Stata command: mixed), and I saw the command examples you gave in this discussion https://www.statalist.org/forums/for...trial-analysis

mixed math cond##time || school: || student:

Would my example be:

mixed Outcome_ServiceUse treatment##time || Group: || ID:

Variables:
treatment would take a range of values: 0 = control; 1 = treatment at 3 months; 2 = treatment at 6 months; 4 = treatment at 9 months
time: 0 = baseline; 1 = 3m; 2 = 6m; 4 = 9m
Group = the cluster group variable
ID = the individual; I will have several observations per individual.
Thank you.
Elena

Last edited by Elena Mvk; 09 Jan 2020, 17:05. Reason: some lines run into each other, so I formatted them.
Clyde Schechter

Join Date: Apr 2014

Posts: 30115
#4

10 Jan 2020, 11:51

I think you are over-complicating this. Your approach makes sense if you believe that there is no homogeneous average treatment effect applicable to all units at all times. In that case, you need to estimate a separate treatment effect for those introduced to treatment at 3 months, another for those introduced to treatment at 6 months, and so on. But usually we work from the assumption that there is a single homogeneous average treatment effect that applies to all units at all times. In that case you should not have 4 separate treatment variables. You should have only one variable, let's call it has_begun_treatment:

Code:

gen has_begun_treatment = 0 if time == 0
replace has_begun_treatment = 1 if time == 3 & group == 1 ///
    | time == 6 & inlist(group, 1, 2) | time == 9

Your treatment effect estimator will be the coefficient of has_begun_treatment when it appears in a model that looks something like this:

Code:

mixed outcome i.time i.group i.has_begun_treatment || ID:

In the code above I assume that time = 0 at baseline, 3 at 3 months, 6 at 6 months and 9 at 9 months.

Note that I do not make group a random-effects level in the model. You can do that, but with only three groups you are not really sampling any sort of group space adequately and the random effect estimates are likely to be too imprecise to be of any use. I have modeled time as discrete, although depending on what you really expect the trajectory of your outcome(s) to be over time, modeling it as a continuous variable may be reasonable.
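The two ways of handling time mentioned above can be compared on simulated data. The following is only a sketch: the sample size (20 participants per group), the variance components and the true treatment effect of 0.5 are all invented for illustration, and the variable names follow the ones assumed in this post.

```stata
* Simulated stepped-wedge data; every number here is an assumption
clear
set seed 12345
set obs 60
gen ID = _n
gen group = ceil(_n/20)              // 3 groups of 20 participants
gen u = rnormal(0, 0.5)              // participant-level random intercept
expand 4                             // 4 measurement occasions each
bysort ID: gen time = (_n - 1)*3     // 0, 3, 6, 9 months
gen byte has_begun_treatment = (time >= 3*group)
gen outcome = 1 + 0.5*has_begun_treatment + u + rnormal()

* Time as discrete periods, group as fixed effects
mixed outcome i.time i.group i.has_begun_treatment || ID:

* Time as a continuous (linear) trend instead
mixed outcome c.time i.group i.has_begun_treatment || ID:
```

In both fits the coefficient on 1.has_begun_treatment is the treatment effect estimate; the second model spends fewer degrees of freedom on time but relies on the trend being roughly linear.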

And yes, your data definitely needs to be in long form for this analysis. Actually, in Stata, long form works better for almost everything and your workflow should generally begin by creating a long layout data set, and only convert it to wide in the unusual circumstance where that is actually necessary. Wide data sets only work well with a limited number of Stata commands. For this particular analysis, long layout is absolutely necessary; it cannot be done in wide layout in Stata.
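As an illustration of the wide-to-long conversion, here is a minimal sketch on toy data. The stub name ServiceUse and the numeric suffixes 0, 3, 6 and 9 are assumptions; in the real data the outcome variables would first need to be renamed so that they share a stub and end in the month number.

```stata
* Toy wide data set: one row per participant (hypothetical names)
clear
set seed 2020
set obs 6
gen ID = _n
gen Group = ceil(_n/2)               // 2 participants in each of 3 groups
foreach t in 0 3 6 9 {
    gen ServiceUse`t' = rpoisson(2)  // made-up counts at month `t'
}

* One row per participant per time point
reshape long ServiceUse, i(ID) j(time)

* Group g starts treatment at month 3*g
gen byte has_begun_treatment = (time >= 3*Group)

* Check the treatment schedule by group and time
tabulate Group time, summarize(has_begun_treatment) means
```

The final table should reproduce the 0/1 schedule of the design, with group 1 switching at 3 months, group 2 at 6 months and group 3 at 9 months.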
Elena Mvk

Join Date: May 2018

Posts: 15
#5

10 Jan 2020, 14:19

Thank you for your help Mr Schechter, I have adjusted the dataset and was able to run the command.

However, the has_begun_treatment variable was omitted due to multicollinearity.

Code:

mixed ED i.TimePoint i.Group i.has_begun_treatment || PtCode:
note: 1.has_begun_treatment omitted because of collinearity

Could you please give a suggestion on how to fix this or what a potential issue is?

Thank you.

Clyde Schechter

Join Date: Apr 2014
Posts: 30115

10 Jan 2020, 15:09

Oops, sorry, yes. I forgot about that problem. Take out i.Group and add a || Group: level to the model (before || PtCode:).

Added: Wait, no, that shouldn't happen. Something is wrong with the variables here. See the following demonstration that the variables should not have a collinearity problem:

Code:

. clear*

.
. set obs 3
number of observations (_N) was 0, now 3

. gen group = _n

. expand 4
(9 observations created)

. by group, sort: gen time = (_n-1)*3

.
. gen byte has_been_treated = 0

. replace has_been_treated = 1 if time == 3 & group == 1 ///
>     | time == 6 & inlist(group, 1, 2) | time == 9
(6 real changes made)

.    
. table group time, c(mean has_been_treated)

----------------------------------
          |          time        
    group |    0     3     6     9
----------+-----------------------
        1 |    0     1     1     1
        2 |    0     0     1     1
        3 |    0     0     0     1
----------------------------------

.
. set seed 1234

. gen outcome = rnormal()

.
. regress outcome i.group i.time i.has_been_treated

      Source |       SS           df       MS      Number of obs   =        12
-------------+----------------------------------   F(6, 5)         =      2.21
       Model |   6.4569498         6   1.0761583   Prob > F        =    0.2013
    Residual |  2.43700823         5  .487401646   R-squared       =    0.7260
-------------+----------------------------------   Adj R-squared   =    0.3972
       Total |  8.89395803        11  .808541639   Root MSE        =    .69814

------------------------------------------------------------------------------------
           outcome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
             group |
                2  |   .9116368   .5293921     1.72   0.146    -.4492088    2.272482
                3  |  -.0530952   .6244368    -0.09   0.936    -1.658261    1.552071
                   |
              time |
                3  |  -1.464372   .6244368    -2.35   0.066    -3.069538    .1407936
                6  |  -.2677543   .7647758    -0.35   0.741    -2.233673    1.698164
                9  |  -.8572409    .953843    -0.90   0.410    -3.309172     1.59469
                   |
1.has_been_treated |   .5830199   .7647758     0.76   0.480    -1.382899    2.548939
             _cons |  -.2868045   .5293921    -0.54   0.611     -1.64765    1.074041
-----------------------------------------------------------------------------------

Run

Code:

table Group Time_Point, c(min has_been_treated max has_been_treated)

to see what is going on. Your table should look like mine, but the collinearity says it won't.

Last edited by Clyde Schechter; 10 Jan 2020, 15:21.


Elena Mvk

Join Date: May 2018
Posts: 15

10 Jan 2020, 15:22

Thank you. I tried that, but it still says :

Code:

 mixed ED i.TimePoint i.has_begun_treatment ||Group: || PtCode:
note: 1.has_begun_treatment omitted because of collinearity

Could it be that my variables are incorrect?
The full result:

Code:

 mixed ED i.TimePoint i.has_begun_treatment ||Group: || PtCode:
note: 1.has_begun_treatment omitted because of collinearity

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -316.86386  
Iteration 1:   log likelihood = -316.68069  
Iteration 2:   log likelihood = -316.67403  
Iteration 3:   log likelihood = -316.67403  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =        295

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
          Group |          2         51      147.5        244
         PtCode |        131          1        2.3          4
-------------------------------------------------------------

                                                Wald chi2(3)      =       0.83
Log likelihood = -316.67403                     Prob > chi2       =     0.8413

---------------------------------------------------------------------------------------
                   ED |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
            TimePoint |
                   2  |  -.0277516   .1193259    -0.23   0.816     -.261626    .2061228
                   3  |  -.0881946   .1086838    -0.81   0.417     -.301211    .1248217
                   4  |  -.0756946   .1086838    -0.70   0.486     -.288711    .1373217
                      |
1.has_begun_treatment |          0  (omitted)
                _cons |   .2798479   .0796294     3.51   0.000     .1237771    .4359188
---------------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
Group: Identity              |
                  var(_cons) |   5.52e-16   2.64e-12             0           .
-----------------------------+------------------------------------------------
PtCode: Identity             |
                  var(_cons) |   .0993391   .0439452      .0417416    .2364129
-----------------------------+------------------------------------------------
               var(Residual) |   .4172207   .0461515       .335899    .5182303
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 6.99                  Prob > chi2 = 0.0303

Note: LR test is conservative and provided only for reference.


Clyde Schechter

Join Date: Apr 2014

Posts: 30115
#8

10 Jan 2020, 16:40

Yes it seems your variables are incorrect. Please re-read fully my response in #6. I retracted my suggestion to switch from i.Group to || Group:. There should not be collinearity among TimePoint, Group and has_begun_treatment, as what I show in the code box demonstrates. Do run

Code:

table Group TimePoint, c(min has_begun_treatment max has_begun_treatment)

to see what is going wrong. If I have understood the design, where everybody is untreated at time 0, group 1 starts treatment at 3 months, group 2 starts treatment at 6 months, and at 9 months everybody is under treatment, your results should look like this:

Code:

. table group time, c(min has_begun_treatment max has_begun_treatment)

----------------------------------
          |          time        
    group |    0     3     6     9
----------+-----------------------
        1 |    0     1     1     1
          |    0     1     1     1
          | 
        2 |    0     0     1     1
          |    0     0     1     1
          | 
        3 |    0     0     0     1
          |    0     0     0     1
----------------------------------

But given that you have has_begun_treatment as collinear with TimePoint, this will not be the case in your data.
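One possible source of this kind of collinearity, which the thread does not confirm as the actual cause, is missing values in the treatment indicator. If the indicator is created with "= 0 if time == 0", as in #4, then cells that are still untreated after baseline are never set to 0 and stay missing. The sketch below reuses the toy design from #6 to show this; the variable names are the ones assumed there.

```stata
* Toy design: 3 groups observed at months 0, 3, 6, 9
clear
set obs 3
gen group = _n
expand 4
bysort group: gen time = (_n - 1)*3

* Indicator initialised to 0 at baseline only
gen has_begun_treatment = 0 if time == 0
replace has_begun_treatment = 1 if time == 3 & group == 1 ///
    | time == 6 & inlist(group, 1, 2) | time == 9

* Untreated cells after baseline are missing, not 0
list group time if missing(has_begun_treatment)
tabulate time has_begun_treatment, missing

* A possible repair: set the remaining cells to 0
replace has_begun_treatment = 0 if missing(has_begun_treatment)
```

Estimation commands drop observations with missing values, and among the rows that remain the indicator is 0 at baseline and 1 at every later time point, so it is an exact linear combination of the time indicators.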

Elena Mvk

Join Date: May 2018
Posts: 15

12 Jan 2020, 17:50

Thank you Mr Schechter, it worked. I just started over and followed and compared everything to your example.

The result:

Code:

. table Step TimePoint, c(min has_been_treated max has_been_treated )

----------------------------------
Step of   |
transitio |
n into    |
intervent |
ion       |       TimePoint       
(1,2,3)   |    0     3     6     9
----------+-----------------------
        1 |    0     1     1     1
          |    0     1     1     1
          | 
        2 |    0     0     1     1
          |    0     0     1     1
          | 
        3 |    0     0     0     1
          |    0     0     0     1
----------------------------------

. mixed ED i.TimePoint i.has_been_treated ||Step: || PtCode :

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -336.54891  
Iteration 1:   log likelihood = -336.49484  
Iteration 2:   log likelihood = -336.49476  
Iteration 3:   log likelihood = -336.49476  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =        320

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
           Step |          3        100      106.7        116
         PtCode |         80          4        4.0          4
-------------------------------------------------------------

                                                Wald chi2(4)      =       2.17
Log likelihood = -336.49476                     Prob > chi2       =     0.7039

------------------------------------------------------------------------------------
                ED |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
         TimePoint |
                3  |  -.0204776   .1082977    -0.19   0.850    -.2327373    .1917821
                6  |  -.1500437   .1283512    -1.17   0.242    -.4016074    .1015199
                9  |  -.1659727    .154297    -1.08   0.282    -.4683893    .1364439
                   |
1.has_been_treated |   .0909727   .1179262     0.77   0.440    -.1401583    .3221037
             _cons |       .275   .0798418     3.44   0.001     .1185129    .4314871
------------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
Step: Identity               |
                  var(_cons) |   2.71e-22          .             .           .
-----------------------------+------------------------------------------------
PtCode: Identity             |
                  var(_cons) |   .1139376   .0348911      .0625178    .2076494
-----------------------------+------------------------------------------------
               var(Residual) |   .3960395   .0361614      .3311444    .4736523
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 19.60                 Prob > chi2 = 0.0001

Note: LR test is conservative and provided only for reference.

Could you please recommend a paper or an online post that can help with reporting (format, table) these results and interpreting them?

Thank you.


Clyde Schechter

Join Date: Apr 2014

Posts: 30115
#10

12 Jan 2020, 17:55

https://www.bmj.com/content/350/bmj.h391
Elena Mvk

Join Date: May 2018

Posts: 15
#11

12 Jan 2020, 20:03

Thank you,

I used Hemming et al, trying to understand the data and how to start the analysis.
I am searching for more practical guidance: the interpretation of the Stata results and useful/necessary post estimation analysis.

Thank you again for all your help with understanding the model and my data.
Clyde Schechter

Join Date: Apr 2014

Posts: 30115
#12

13 Jan 2020, 11:15

Hmm, I don't know of any references dealing specifically with those aspects of it. All my references on stepped-wedge are similar to the Hemming et al. paper in terms of what they cover. I don't know if this represents a gap in the literature, or just a gap in my reading and search skills. Sorry I can't be of more help here.

Elena Mvk

Join Date: May 2018
Posts: 15

#13

13 Jan 2020, 15:29

Your help was very timely and elaborate.

I am now trying to understand how to interpret the results of the regression, and I have not seen examples of Stata results being transferred into an appropriate report format.

For example, the results I obtained are below, but I don't know what to report from them or how.

Code:

------------------------------------------------------------------------------------
                ED |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
         TimePoint |
                3  |  -.0204776   .1082977    -0.19   0.850    -.2327373    .1917821
                6  |  -.1500437   .1283512    -1.17   0.242    -.4016074    .1015199
                9  |  -.1659727    .154297    -1.08   0.282    -.4683893    .1364439
                   |
1.has_been_treated |   .0909727   .1179262     0.77   0.440    -.1401583    .3221037
             _cons |       .275   .0798418     3.44   0.001     .1185129    .4314871
------------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
Step: Identity               |
                  var(_cons) |   2.71e-22          .             .           .
-----------------------------+------------------------------------------------
RID: Identity                |
                  var(_cons) |   .1139376   .0348911      .0625178    .2076494
-----------------------------+------------------------------------------------
               var(Residual) |   .3960395   .0361614      .3311444    .4736523
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 19.60                 Prob > chi2 = 0.0001


Clyde Schechter

Join Date: Apr 2014

Posts: 30115
#14

13 Jan 2020, 15:48

In terms of specifically reporting the end results, the "pay dirt" here is that treatment is associated with an increase of 0.09 in the value of ED, 95% CI (-0.14 to +0.32).
As for interpreting this, I would say that while the best estimate of the treatment effect is appreciable, being about 1/3 of the baseline mean value of ED (.275), the data do not provide a very precise estimate of the effect, not even precise enough to determine whether the effect is substantially positive or substantially negative. This is probably attributable to the very large variation both at the RID level and within individual RIDs over time, as manifested in the large corresponding variances (0.114 and 0.396, respectively). Future studies should use a less noisy outcome measure, include repeated outcome measurements within each step of the wedge, or include more RIDs (or some combination of these) to obtain a sharper estimate of the intervention effect.
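The figures quoted in this interpretation can be checked by hand from the output in #13. The short sketch below just redoes the arithmetic with display; the second line, the share of variance lying between participants, is an extra summary that is not part of the original output and ignores the essentially zero Step-level variance.

```stata
* Treatment effect relative to the baseline mean (_cons): about one third
display %5.3f .0909727 / .275

* Share of outcome variance between participants (RID) vs. total
display %5.3f .1139376 / (.1139376 + .3960395)
```

The first ratio is the "about 1/3" mentioned above; the second suggests that most of the variation in ED sits within participants over time.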

In addition to reporting that "bottom line" result, I would also show descriptive statistics for ED in each group at each time point, something like the output of this:

Code:

table Step TimePoint, c(N ED mean ED sd ED) format(%3.2f)
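The c() option used above belongs to the table command as it existed up to Stata 16. The command was redesigned in Stata 17, so on a newer installation a roughly equivalent call, sketched here on toy data with invented values, might look like the following. The variable names mimic the thread's, and the exact layout of the output will differ from the older command.

```stata
* Toy data only; Step, TimePoint and ED mimic the thread's names
clear
set seed 42
set obs 12
gen Step = ceil(_n/4)                          // 3 steps
expand 4
bysort Step: gen TimePoint = mod(_n - 1, 4)*3  // months 0, 3, 6, 9
gen ED = rpoisson(0.3)                         // made-up ED counts

* Stata 17+ syntax: N, mean and SD of ED by step and time point
table (Step) (TimePoint), statistic(count ED) statistic(mean ED) ///
    statistic(sd ED) nformat(%3.2f mean sd)
```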

Elena Mvk

Join Date: May 2018
Posts: 15

#15

14 Jan 2020, 20:38

Thank you again. This was very helpful. I think I finally understand what I need to report and how. I thought I would present the results in a table like the one below (with more rows for other outcomes).

Would such a table make sense? You suggested that I also report the mean and variance of _cons, but I am not sure how to incorporate these.

At this point, I could say that "there were no statistically significant differences in ED Presentations and Hospital Discharge outcomes between control and intervention phases." I would then proceed to analyse the mean difference in costs (control vs intervention), with the aim of then doing a cost-effectiveness analysis.

Outcome               Control phase (n=156),   Treatment phase (n=164),   Adjusted mean difference
                      mean (SE)                mean (SE)                  (95% CI)
ED presentations      0.25 (0.05)              0.23 (0.06)                0.122 (-0.124, 0.368)
Hospital discharges   0.30 (0.08)              0.23 (0.06)                -0.069 (-0.322, 0.184)

The Stata commands for the ED presentation results.

Code:

. ttest ED_Presentation, by ( has_been_treated )

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |     156         .25    .0542831    .6779951      .14277      .35723
       1 |     164    .2256098    .0588725    .7539363    .1093586    .3418609
---------+--------------------------------------------------------------------
combined |     320       .2375    .0400761     .716903    .1586532    .3163468
---------+--------------------------------------------------------------------
    diff |            .0243902    .0802916               -.1335795      .18236
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   0.3038
Ho: diff = 0                                     degrees of freedom =      318

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.6192         Pr(|T| > |t|) = 0.7615          Pr(T > t) = 0.3808

The multilevel analysis was adjusted for the time of entry into the treatment phase:

Code:

. mixed ED_Presentation i.has_been_treated i.TimePoint || GP_cluster : || RID:

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -334.86625  
Iteration 1:   log likelihood = -334.86072  
Iteration 2:   log likelihood = -334.86072  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =        320

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
     GP_cluster |         14          4       22.9         36
            RID |         80          4        4.0          4
-------------------------------------------------------------

                                                Wald chi2(4)      =       2.52
Log likelihood = -334.86072                     Prob > chi2       =     0.6409

------------------------------------------------------------------------------------
   ED_Presentation |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
1.has_been_treated |   .1217916   .1254613     0.97   0.332    -.1241081    .3676913
                   |
         TimePoint |
                3  |  -.0316495   .1093919    -0.29   0.772    -.2460537    .1827548
                6  |  -.1712317   .1316741    -1.30   0.193    -.4293081    .0868447
                9  |  -.1967916    .160121    -1.23   0.219     -.510623    .1170398
                   |
             _cons |   .3034539     .09579     3.17   0.002     .1157089    .4911988
------------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
GP_cluster: Identity         |
                  var(_cons) |   .0413467   .0329722      .0086625    .1973501
-----------------------------+------------------------------------------------
RID: Identity                |
                  var(_cons) |   .0784447   .0328363      .0345349    .1781842
-----------------------------+------------------------------------------------
               var(Residual) |   .3959276   .0361432      .3310639    .4734998
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 22.87                 Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
