Analysis of Longitudinal Data with a Small Sample

Avi Jutaru

#1

06 Feb 2017, 04:47

Hello all, I would like to ask for your advice regarding some data I have. The data come from a clinical trial with only 7 subjects (a first-in-human trial). For each subject, a continuous measure was taken prior to treatment (let's call this outcome Y for convenience). After the treatment was applied, this measure was taken 5 more times (1 day after treatment, 1 week, etc.). If the treatment worked, I expect this measure to be reduced. I need to analyze this small dataset, and I am not sure there is enough power for a longitudinal model. If there were enough power, what would be best practice? Is it possible to take the change from baseline as the dependent variable and put it in a model where the independent variables are time (not continuous, since the gaps between visits are not equal) and the measure at baseline? My fear is that the difference could be affected by the magnitude of the measure at baseline. If this is valid, what is the Stata command to perform it? My variables are: Y, Y_CFB (change from baseline), Baseline, Visit (the time), ID.
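A minimal Stata sketch of the model described above, assuming long-format data (one row per subject and visit) with Visit coded 0 for baseline and 1 to 5 for the post-treatment visits; that coding and the new variable cBaseline are assumptions, not part of the original data. It fits a random-intercept model with -mixed-, treats Visit as categorical, and centers Baseline so that each Visit coefficient reads as the mean change at the average baseline.

```stata
* Assumes long data: one row per ID x Visit, Visit 0 = baseline, 1-5 = follow-up.
* Center baseline at the subjects' mean (one Visit 0 row per subject).
summarize Baseline if Visit == 0, meanonly
generate double cBaseline = Baseline - r(mean)

* Random intercept per subject; ibn.Visit with noconstant gives one mean
* change per post-treatment visit. REML with Kenward-Roger degrees of
* freedom is the small-sample option offered by -mixed- (Stata 14+).
mixed Y_CFB ibn.Visit c.cBaseline if Visit > 0, noconstant || ID:, reml dfmethod(kroger)
```

Each Visit coefficient is then tested against 0 with degrees of freedom adjusted for the tiny sample, rather than with large-sample z statistics.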

Alternatively, is it possible to perform, for each time point separately, a paired t-test or a Wilcoxon signed-rank test? This would not take the value at baseline into account, but since a model might be underpowered, there is a chance that this will work.
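A sketch of the per-visit tests, assuming Visit is coded 1 to 5 after baseline and that Y_CFB was built as Y - Baseline. A paired t-test of Y against Baseline is the same as a one-sample t-test of the change against 0, so the loop works directly on Y_CFB. The Bonferroni line (5 comparisons) is an added illustration, not something asked for in the post.

```stata
* Assumes Visit = 1..5 for the post-treatment visits and Y_CFB = Y - Baseline.
* Paired t-test of Y vs Baseline == one-sample t-test of Y_CFB against 0.
forvalues v = 1/5 {
    display as text _newline "Visit `v'"
    ttest Y_CFB == 0 if Visit == `v'
    * Bonferroni adjustment for 5 comparisons (illustrative)
    display as text "Bonferroni-adjusted p = " as result min(1, 5 * r(p))
    * Wilcoxon signed-rank test of the same changes
    signrank Y_CFB = 0 if Visit == `v'
}
```

One arithmetic point for N = 7: under the exact permutation distribution of the signed-rank test there are 2^7 = 128 equally likely sign patterns, so the smallest possible two-sided p-value is 2/128 ≈ 0.0156, which is already above a Bonferroni threshold of 0.05/5 = 0.01. The normal approximation that -signrank- reports can dip below that, but it is not reliable at this sample size.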

I have to say, the differences, especially at the first post-treatment time, are large, therefore I do expect to see significant results.

One more question, if I may. I tried plotting the data in Stata as a spaghetti chart, using the panel-data menu. I want to add an averaged line; is that possible?
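One way to get the averaged line, sketched with -twoway- instead of the panel-data menu, assuming long data with Y, Visit and ID; meanY is a hypothetical new variable. Run it from a do-file, since the /// continuations do not work in the Command window.

```stata
* Mean of Y at each visit, repeated on every row of that visit
bysort Visit: egen meanY = mean(Y)

* connect(L) draws a line only while Visit increases, so sorting by
* ID and Visit gives one separate line per subject.
sort ID Visit
twoway (line Y Visit, connect(L) lcolor(gs12))              ///
       (line meanY Visit, sort lcolor(black) lwidth(thick)), ///
       legend(order(1 "Individual subjects" 2 "Mean"))        ///
       xtitle("Visit") ytitle("Y")
```

Plotting against Visit codes spaces the visits equally on the x axis, which matches treating time as categorical.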

I am working with Stata 14.

Thank you in advance!
Tags: panel data
Marcos Almeida

#2

06 Feb 2017, 08:45

Hello Avi,

Welcome to the Stata Forum.

I fear some of your questions demand more information, and data.

That said, with 7 individuals and 6 repeated measures, the results will surely depend on the magnitude of the difference and its trend, among other aspects.

Should you have difficulties with a -mixed- model, for example, I suggest you also consider fitting a hierarchical Bayesian model and see what happens.
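For concreteness, a hierarchical Bayesian counterpart of a random-intercept model might look like this. It is a sketch under several assumptions: the bayes: prefix needs Stata 15 or later (in Stata 14 the model would have to be written out with -bayesmh-), the variable names and Visit coding (0 = baseline) follow the first post, and the default priors are used only for illustration; with 7 subjects the choice of priors is likely to matter.

```stata
* Requires Stata 15+. Same fixed and random parts as a -mixed- fit,
* estimated by MCMC with the prefix's default priors.
bayes, rseed(17): mixed Y_CFB i.Visit c.Baseline if Visit > 0 || ID:
```

With informative priors (for example, from preclinical data) the posterior can be less dominated by the 7 observations than a frequentist fit is by its small-sample uncertainty; with vague defaults, the results will be close to those of -mixed-.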

Best regards,

Marcos
Avi Jutaru

#3

06 Feb 2017, 23:29

I will try to be a bit more specific, with an example. Let's say that the continuous measure is blood pressure. For each subject, I have the blood pressure prior to treatment and at several time points after treatment. Within the limitations of my small sample, I wish to show that there is a reduction in blood pressure. One day after treatment, the (absolute) difference was 17, with a standard deviation of 7. After 1 week it was 8, with a standard deviation of 8, etc.
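Because a paired t statistic needs only the number of pairs and the mean and SD of the differences, the figures above can be checked with Stata's immediate command -ttesti- (arguments: obs, mean, SD, hypothesized value). This sketch assumes all 7 subjects were measured at both visits.

```stata
* Paired differences summarized as n, mean, SD; test mean difference = 0.
* Day 1: n = 7, mean = 17, SD = 7 (figures from the post above)
ttesti 7 17 7 0
* Week 1: n = 7, mean = 8, SD = 8
ttesti 7 8 8 0
```

By hand, t = 17/(7/sqrt(7)) ≈ 6.43 at day 1 and t = 8/(8/sqrt(7)) = sqrt(7) ≈ 2.65 at week 1, each on 6 degrees of freedom, both above the two-sided 5% critical value of about 2.45.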

My problems are:

1. If I run a paired test (parametric or non-parametric) by time point, I get significant results for all time points but one. However, this analysis ignores the possibility that the pressure at baseline may affect the difference. A model could account for the pressure level at baseline; a test doesn't. How bad is it to ignore it? I may not have a choice. On the other hand, when is a paired test suitable, then? It will always ignore the baseline. Another issue with multiple tests is that I cannot control the family-wise error rate, not with N = 7.
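To see what adjusting for baseline does at a single visit, a sketch with -regress- (Visit == 1 and the new variable cBase1 are illustrative assumptions). Once the baseline is centered at its mean in the same sample, the constant equals the mean change exactly, i.e. the estimate a paired t-test uses; the adjustment changes only the standard error, and it costs one degree of freedom (5 instead of 6).

```stata
* Center baseline among the subjects with a change recorded at Visit 1
summarize Baseline if Visit == 1 & !missing(Y_CFB), meanonly
generate double cBase1 = Baseline - r(mean)

* _cons = mean change at the average baseline = plain mean of Y_CFB;
* the cBase1 slope shows whether larger baselines go with larger drops.
regress Y_CFB c.cBase1 if Visit == 1
```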

2. If I go for a model to solve problem 1, then, firstly, I am not sure I ran the correct model in Stata. I tried using the menus but am not sure I did it correctly. I modeled the change from baseline (DV) against the pressure at baseline and the time point (each subject has 6 measurements, including baseline). I will need help with the correct command. If what I did is correct, the p-values I got were around 0.06. I have no doubt this is because of a lack of power. Beyond the question of the correct syntax and model, is it correct at all to use the change from baseline as the DV and include the baseline as an IV?
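On the last question, a sketch showing why the choice of DV does not matter once Baseline is a covariate, assuming Y_CFB = Y - Baseline exactly and Visit coded 0 for baseline. Since Y = Y_CFB + Baseline, the two fits are reparameterizations of each other: the Visit coefficients and their standard errors are identical, and only the Baseline coefficient differs, by exactly 1. The stored-estimate names cfb and raw are illustrative.

```stata
* Change from baseline as the outcome
mixed Y_CFB i.Visit c.Baseline if Visit > 0 || ID:, reml
estimates store cfb
* Raw follow-up value as the outcome (ANCOVA form)
mixed Y i.Visit c.Baseline if Visit > 0 || ID:, reml
estimates store raw
* Compare coefficients and standard errors side by side
estimates table cfb raw, b se
```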

3. Marcos, you suggested a Bayesian model; can you please be more specific? How does that help with the problem of having only 7 subjects?

Any advice will be most appreciated.
Carlo Lazzaro

#4

06 Feb 2017, 23:44

Avi:
with 7 subjects only, any inference sounds very weak.
If you could increase your sample size, a possible option would be -xtreg-.
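A minimal -xtreg- sketch, assuming numeric ID and Visit variables (Visit 0 = baseline) as in the first post. A fixed-effects fit would not work for the question asked, because Baseline is constant within each subject and would be dropped, so the random-effects estimator is shown.

```stata
* Declare the panel (subject) and time (visit) variables.
* If ID is a string, create a numeric version with -encode- first.
xtset ID Visit
* Random-effects GLS on the post-treatment rows. With -fe- instead of
* -re-, Baseline would be omitted as collinear with the subject effects.
xtreg Y_CFB i.Visit c.Baseline if Visit > 0, re
```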

Kind regards,
Carlo
(Stata 19.0)
Avi Jutaru

#5

07 Feb 2017, 00:16

Hello Carlo, thank you for your response. I am aware that the sample size is very small; it's not in my hands to increase it. My question has a double purpose for me: first, to try to solve this problem, and second, to learn for future cases with more than 7 subjects. If I had more, what would be the best approach? To model the response as it is, accounting for the baseline? Or perhaps the differences, while accounting for the baseline? Should I enter time into the model as an IV, or is it enough to tell Stata that the data are panel data?

Just how bad is it in this case to perform "simple" tests (e.g., a Wilcoxon signed-rank test)?
Carlo Lazzaro

#6

07 Feb 2017, 01:29

Avi:
if, as I can see from your last post, you cannot increase your sample, I would rule out inference from the analysis.
If, in the future, you plan to analyse a longitudinal study, panel data analysis is a possible option: just -xtset- your data before any regression and Stata will be aware that you're dealing with a panel dataset.
If you have panel data, both parametric and non-parametric tests aimed at exploring possible differences in location parameters fail to consider that you have multiple (i.e., non-independent) observations (the data waves the panel dataset is composed of) for the same panel units.
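A short sketch of what declaring the panel looks like, assuming numeric ID and Visit variables. -xtset- records the panel and time variables for the xt-prefixed commands (and for time-series operators); it does not change how -ttest-, -signrank- or -regress- treat the rows, which is why the model itself has to be an xt or multilevel command.

```stata
* Record subject (panel) and visit (time) identifiers
xtset ID Visit
* Pattern of visits per subject, which flags any missing follow-ups
xtdescribe
* Overall, between-subject and within-subject variation of Y
xtsum Y
```

The between/within split from -xtsum- shows how much of the variation in Y is due to differences between subjects, which is exactly the dependence that per-visit tests ignore.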

Kind regards,
Carlo
(Stata 19.0)
Avi Jutaru

#7

07 Feb 2017, 01:42

I see, so once I declare to Stata that the data is panel, I will get the right model.

One theoretical question, if I may; I got slightly confused. If I have the time variable, which in this case is the IV of interest, and its categories run from baseline onward, is it valid to include this variable in the model as an IV (including the baseline category) AND to adjust for the baseline as another covariate? I mean, the most interesting question is the change from baseline, and less the change from one time to another (although that is interesting as well). If I include the baseline observations as a level of time, how can I also add the baseline as a standalone covariate? Doesn't that conflict?
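A sketch of the two ways the baseline can enter, assuming Visit is coded 0 for baseline. In (a), baseline is a level of time and each later visit is contrasted with Visit 0; no separate Baseline covariate is added, since at Visit 0 Y equals Baseline by definition. In (b), baseline is a covariate and the Visit 0 rows are excluded from the outcome. Using both at once is what creates the conflict described above.

```stata
* (a) Baseline as a time level: all rows, contrasts of each visit vs Visit 0
mixed Y i.Visit || ID:, reml
contrast r.Visit

* (b) Baseline as a covariate: post-treatment rows only
mixed Y i.Visit c.Baseline if Visit > 0 || ID:, reml
```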
Carlo Lazzaro

#8

07 Feb 2017, 01:46

Avi:
thanks for providing further details.
Another approach would be -differences in differences- (please, see http://www.princeton.edu/~otorres/DID101.pdf).

Kind regards,
Carlo
(Stata 19.0)
Joseph Coveney

#9

07 Feb 2017, 02:04

You could do a bunch of paired t-tests. You might want to do that, anyway, for individual comparisons to baseline, even if you fitted an omnibus model first.

For the average line plot, you could do something like the following.

Code:

anova mbp pid time, repeated(time)
scalar define df = min(1, e(hf1)) * e(df_r)
contrast r.time, df(`=df') noeffects mcompare(bonferroni)
quietly margins time
marginsplot , plotopts(lcolor(black)) ylabel( , angle(horizontal) nogrid) level(50)

where pid is the (numeric) patient ID, mbp is the blood pressure value and time is the time variable (say, 0 for pretreatment, 1 for the following day, 2 for one week, . . ., 5 for whatever).
Avi Jutaru

#10

07 Feb 2017, 02:15

Thank you, Joseph. Is there a reason why the CIs are 50%?
Joseph Coveney

#11

07 Feb 2017, 02:43

It helps thwart the misinterpretations that 95% CIs entice, but mostly for esthetics. Other reasons have been given.