Statistical Test Selection Advice

Barry Horgan

Join Date: May 2020

Posts: 12
#1


08 Nov 2020, 12:36

Hi, if possible, I'd like advice on the correct / best statistical test to use for a particular study design. A brief summary of the study design is as follows:

"This study used a randomised controlled cross-over pre-post study design in which eighteen (n=18) strength-trained participants undertook a 12-week (3 interventions x 4 weeks) whole-body strength training programme (refer to the attached .png files for a visual of the study design). Participants were randomly allocated to groups that performed all 3 treatment conditions in order to complete the study. Participants remained in the same post-exercise intervention strategy, i.e. cold water immersion (CWI), hot water immersion (HWI), or a control (CON) condition, for the duration of each 4-week block, and crossed over thereafter between interventions in order to complete the study."

Previously, I have been advised to apply Linear Mixed Modelling, or Factorial / Repeated Measures ANOVA for this study design. As there is some missing data, I understand that LMM handles missing data better than ANOVA, and as such I will proceed using LMM.

Because I used a pre-post study design, the post-test of one block was simultaneously the pre-test for the next block, and so forth. In terms of my data layout and analysis, should I treat time as a repeated measure using absolute raw values, or should I calculate the difference (post minus pre) and assign that raw difference to a particular treatment intervention?

In addition to independent variables (treatment), I would like to incorporate 2 covariates into the model to help explain the outcomes.

Sample code that I will use is as follows:

Code:

xtmixed yvar xvar c.covar1##i.covar2 || _all: R.id || trt: , reml nolog

where yvar is the dependent variable; xvar is the independent variable; covar1 is a continuous covariate; and covar2 is a categorical covariate.

As this was a randomised cross-over study design, should I factor in the order in which the participants moved through the interventions, and if so, how? Is this what I need to understand as being either crossed or nested?

Appreciate any advice you can provide.
Thanks, Barry
Clyde Schechter

Join Date: Apr 2014

Posts: 29990
#2

08 Nov 2020, 13:15

I think there is something wrong with your diagram at the bottom. While the middle row shows Group 2 progressing through three treatment assignments, CON, CWI, and HWI, the top and bottom rows show a change of group either in the middle (Group 3 suddenly becomes Group 2) or at the end (Group 1 suddenly becomes Group 2). I assume your intent is that each row represents the progression of a single group over time. Also, in your textual description you state that the post-test for block 1 is the pre-test for block 2, etc. But your diagram implies that they are different.

18 subjects is really very small for this complex a design. You have three treatments administered in three different orders, for a total of 9 treatment#order interactions. So you only have two observations for each of those--and those are crucial for the interpretation of your results. If this is preliminary data to be used to plan a future, larger study, then I suppose this is OK. But I think you will have trouble convincing people that your results are truly persuasive.

That said, I would build this model out of the following ingredients:

A three level treatment-condition variable, call it condition: CON, CWI, HWI
A three level order variable, call it group: 1, 2, 3 (group is identical to order in your design).
I would use the measured post-test outcome variables, not the changes, as the dependent variables in the model. Call this variable post_test
I would include the Block 1 pre-test result as a covariate in the analysis. Call this variable block_1_pre_test.
An individual participant identifier, id.
The inclusion of other covariates is optional--as this is a randomized treatment trial, it is not strictly speaking necessary, and your Type I error rate is correct even without covariates. Nevertheless, in a trial this small, the probability that the randomized groups are severely imbalanced is not negligible, and your results will have less noise if you include the covariates.

The model I would use is:

Code:

mixed post_test i.condition##i.group c.block_1_pre_test /*optionally other covariates*/ || id:
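For readers following along, here is a minimal sketch of the long data layout this model assumes: one row per participant per block. The data are simulated. The 6-per-group allocation, the particular order in which each group rotates through CON, CWI and HWI, the helper variables u and Block, and all effect sizes are illustrative assumptions, not details taken from the study.

```stata
* Simulated data only: every number and the group rotation are assumptions
clear
set seed 12345
set obs 18
generate id = _n
* Assume 6 participants in each of the 3 order groups
generate group = ceil(_n / 6)
generate block_1_pre_test = rnormal(100, 10)
* Participant-level random intercept used to simulate the outcome
generate u = rnormal(0, 5)
* One row per participant per 4-week block (long layout)
expand 3
bysort id: generate Block = _n
* Hypothetical Latin-square rotation: 1 = CON, 2 = CWI, 3 = HWI
generate condition = mod(group + Block - 2, 3) + 1
label define cond 1 "CON" 2 "CWI" 3 "HWI"
label values condition cond
generate post_test = block_1_pre_test + u + 2*(condition == 2) + 3*(condition == 3) + rnormal(0, 4)
* The model from this post, with a random intercept for each participant
mixed post_test i.condition##i.group c.block_1_pre_test || id:
```

With real data, only the layout matters: id, group, condition, post_test and block_1_pre_test as columns, and three rows per participant.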

I would definitely not use treatment assignment as a separate level in the model. First of all, you only have three treatments, and doing an effective N of three analysis is a waste of time--you are not even remotely getting an adequate sampling of "treatment space." Second of all, there is nothing in your description of these treatments to suggest that they are some kind of random sample from a larger "treatment space." So we have no reason to make it a random effects level and a good reason not to.

Note that from version 13 on, -xtmixed- was renamed to -mixed-. So unless you are still using version 12, you should use the modern name. The old name still works, but someday perhaps it won't. Better to form current habits.

I have not specified the -reml- option. I'll leave it to you whether you prefer that or -ml-. Similarly, I usually like to see the iteration log so that I can tell that the estimation is progressing during execution and, if it goes badly, see where it first ran into trouble. But if you are confident that your model will smoothly converge, I suppose -nolog- is OK. (Then again, it's very hard to be confident that a model with this many parameters and so little data will converge smoothly.)
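As a sketch of where these options go: estimation options follow a comma after the random-effects specification. The example uses simulated data (the group rotation and all effect sizes are assumptions) and requests REML with the iteration log left on. The -dfmethod(kroger)- option is an addition not discussed above. To the best of my knowledge it is available from Stata 14 onward, only with REML, and requests Kenward-Roger small-sample degrees of freedom, which may be worth considering with 18 participants.

```stata
* Simulated data only: every number and the group rotation are assumptions
clear
set seed 12345
set obs 18
generate id = _n
generate group = ceil(_n / 6)
generate block_1_pre_test = rnormal(100, 10)
generate u = rnormal(0, 5)
expand 3
bysort id: generate Block = _n
* Hypothetical rotation: 1 = CON, 2 = CWI, 3 = HWI
generate condition = mod(group + Block - 2, 3) + 1
generate post_test = block_1_pre_test + u + 2*(condition == 2) + 3*(condition == 3) + rnormal(0, 4)
* REML estimation; iteration log shown (no -nolog-)
* dfmethod(kroger) is an optional small-sample adjustment, assumed Stata 14+
mixed post_test i.condition##i.group c.block_1_pre_test || id:, reml dfmethod(kroger)
```

Dropping -dfmethod(kroger)- and replacing -reml- with -mle- gives the maximum likelihood fit, which is the default for -mixed-.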
Tom Scott

Join Date: Apr 2019

Posts: 266
#3

08 Nov 2020, 13:35

Clyde Schechter, is it a bad idea to just do a separate ANOVA for each block, comparing the mean pre-test to post-test change score between treatment groups? In the latter two ANOVAs you would control for the treatment blocks each participant had previously experienced.
Barry Horgan

Join Date: May 2020

Posts: 12
#4

08 Nov 2020, 13:40

Clyde, thank you for the very detailed answer above. As a brief reply, I've fixed the typos in the bottom diagram; the updated version is attached here. I will study your answers and reply in due course. Thanks again for your advice and the quick reply, Barry
Clyde Schechter

Join Date: Apr 2014

Posts: 29990
#5

08 Nov 2020, 14:15

@Tom Scott Re #3. There are two reasons I wouldn't use the approach you suggest there. The first is that I don't like change scores as an outcome variable. Sometimes they work just fine--and maybe this will be one of them. But there are a lot of things that can go wrong. See https://www.fharrell.com/post/errmed/#change. Using the pre-test as a covariate and making the post-test the outcome variable avoids almost all of those difficulties. Also, the block-by-block approach doesn't allow for any lingering effect of the baseline pre-test score once you reach blocks B and C.

The other reason is that it doesn't capture order effects. The effect of CWI vs CON may depend on where in the time sequence HWI fell.

Having said the latter, I realize that the model I proposed in #2 doesn't encode that in a friendly manner. So I would revise what I said there to have a time period variable (call it Block) with three levels, A, B, and C, and then do the analysis as:

Code:

mixed post_test i.condition##i.Block c.block_1_pre_test /*optionally other covariates*/ || id:

Note: The approach in #2 would still work--because of the 1-1 correspondence between group assignments and treatment orders, these two models are just different parameterizations of the same underlying model. However, with the parameterization proposed here it is straightforward to read order effects from the output, whereas in the other parameterization a bunch of -lincom- commands would be necessary to get at them.
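One possible way to pull the order information out of this parameterization, sketched on simulated data (the rotation of conditions across groups and all effect sizes are assumptions): -margins- gives the predicted mean for each condition-by-block cell, and -testparm- gives a joint Wald test of the condition#Block interaction terms.

```stata
* Simulated data only: every number and the group rotation are assumptions
clear
set seed 12345
set obs 18
generate id = _n
generate group = ceil(_n / 6)
generate block_1_pre_test = rnormal(100, 10)
generate u = rnormal(0, 5)
expand 3
* Block 1, 2, 3 stands for time periods A, B, C
bysort id: generate Block = _n
* Hypothetical rotation: 1 = CON, 2 = CWI, 3 = HWI
generate condition = mod(group + Block - 2, 3) + 1
generate post_test = block_1_pre_test + u + 2*(condition == 2) + 3*(condition == 3) + rnormal(0, 4)
mixed post_test i.condition##i.Block c.block_1_pre_test || id:
* Predicted mean outcome in each condition-by-block cell
margins condition#Block
* Joint test: does the condition effect differ across blocks?
testparm i.condition#i.Block
```

Specific comparisons, such as CWI vs CON within a given block, could then be obtained with -contrast- or -lincom- as needed.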
Barry Horgan

Join Date: May 2020

Posts: 12
#6

09 Nov 2020, 12:31

Clyde Schechter Re #2. First of all, thank-you very much for your consideration and advice above.
In your post, you state: "So you only have two observations for each of those--and those are crucial for the interpretation of your results." Can you explain or clarify which two observations you are referring to, please? Are you referring to 'condition' and 'group' here?

For "/*optionally other covariates*/", if using these, would you list them as standalone without any interactions, e.g. i.optional_covar1 c.optional_covar2 etc.?

Thanks again, Barry
Clyde Schechter

Join Date: Apr 2014

Posts: 29990
#7

09 Nov 2020, 12:49

1. Yes, there are 9 combinations of condition and group, and with a total of 18 participants, you have only two participants in each such combination.

2. In theory, and with ample data, this would depend on whether you believe there are clinically meaningful interaction effects (or suspect there are strongly enough to go hunting for them). If you have no reason based on science, previous literature, or your mechanistic understanding of the data generating process to think there would be interactions among these covariates or between the covariates and other effects in the model, then there is no reason to do them as anything other than "standalone." (I like that way of saying it!) However, in the real world, we have to consider that interaction effects are always very underpowered compared to the corresponding standalone effects--you need between 4 and 16 times as many observations to detect an interaction as you do to just use its components in an adequately powered way. Also, interactions chew up degrees of freedom very rapidly, so throwing them in haphazardly quickly brings your model into an overfitting situation (or exhausts the available degrees of freedom altogether). Finally, I will reiterate my concern that it is questionable whether your data set is large enough to support even the bare bones model with no covariates at all. If it turns out to be, and if you can get away with including the covariates as standalone effects on top of that, count yourself lucky. I would be flabbergasted if you could do a meaningful analysis throwing in interactions on top of all that when you have so little data.
Barry Horgan

Join Date: May 2020

Posts: 12
#8

19 Aug 2021, 20:10

Originally posted by Clyde Schechter:

I would use the measured post-test outcome variables, not the changes, as the dependent variables in the model. Call this variable post_test
I would include the Block 1 pre-test result as a covariate in the analysis. Call this variable block_1_pre_test.

Hi Clyde Schechter, apologies for coming back to this query.

Firstly, to help with background info, as this study used elite athlete participants, it is not uncommon in the sport science discipline to conduct studies with a small n. In saying this I understand your concerns regarding the sample size.

However, my reason for returning to this query is to understand whether this model (or a different model) could handle the analysis without specifying the 'Block 1 pre-test' as a covariate.

Removing that term from the code in #5 above leaves the revised code below, which I propose could be used to compare the post-test outcome measures only, without the 'Block 1 pre-test' baseline:

Code:

mixed post_test i.condition##i.Block /*optionally other covariates*/ || id:

Given our study design is a randomised, controlled, cross-over design (see #4), would this statistical analysis approach stand up to a peer-reviewed publication process in your opinion?

If not, is there a different approach that would be more statistically robust in this situation?

Thanks, Barry
Clyde Schechter

Join Date: Apr 2014

Posts: 29990
#9

19 Aug 2021, 21:09

Well, you can forget about the optionally other covariates part, in light of the sample size. The absence of the pre-intervention score weakens the analysis considerably, but is not necessarily fatal. Also, notwithstanding my original admonition not to use change scores, under certain strict assumptions (see https://www.fharrell.com/post/errmed/#change) they can be used--and if your outcome score meets those assumptions, that would then be a better alternative than just the post-test score.
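If the change-score route is taken, a minimal sketch of the computation might look like the following. It assumes a pre-test value is available for every block (here, the Block 1 baseline and then the previous block's post-test, as described in #1). The data are simulated, and the variable names pre_test, change, gain and change_true, the group rotation, and all effect sizes are assumptions for illustration. Whether change scores are appropriate still depends on the assumptions discussed at the link above.

```stata
* Simulated data only: every number and the group rotation are assumptions
clear
set seed 2021
set obs 18
generate id = _n
generate group = ceil(_n / 6)
generate block_1_pre_test = rnormal(100, 10)
* Participant-specific response to training, used only for simulation
generate gain = rnormal(2, 2)
expand 3
bysort id: generate Block = _n
* Hypothetical rotation: 1 = CON, 2 = CWI, 3 = HWI
generate condition = mod(group + Block - 2, 3) + 1
generate change_true = gain + 1*(condition == 2) + 2*(condition == 3) + rnormal(0, 1.5)
* Each post-test is the baseline plus the accumulated changes so far
bysort id (Block): generate post_test = block_1_pre_test + sum(change_true)
* Pre-test of a block = baseline in Block 1, else previous block's post-test
bysort id (Block): generate pre_test = cond(Block == 1, block_1_pre_test, post_test[_n-1])
generate change = post_test - pre_test
* Change score as the outcome, random intercept for each participant
mixed change i.condition##i.Block || id:
```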

In the literature I generally publish in (the general medical literature) I think you would stand little chance of getting this kind of study published. But if, as you say, studies of elite athletes are inherently limited in size, then that would presumably be known in your field, and reviewers there would take that into account. So if there are niche journals that deal with this kind of research, then I think you could give it a try.