  • Pre Post Data

    Hello,

    I have a dataset of pre- and post-intervention data; an example is below. The subject ID is the "id" variable, timepoint 1 is the pre-intervention data, and timepoint 2 is the post-intervention data. We also have the group assignment (treatment or control) and many other variables of interest. I know I will want to run some paired t-tests comparing the pre- and post-intervention data within each group, and some two-sample t-tests comparing the pre- and post-intervention data between the groups. To do this, I am wondering whether I need to change the format of the data so that each ID has only one row, instead of one row for timepoint 1 and one row for timepoint 2. Is this correct thinking? If so, how do I do it? I know that to put the data for one person into one row I will need to make a T1 and a T2 version of most of my variables, for example var1_T1 and var1_T2. However, I have no idea how to do this. I have about 350 variables, so I would like to find a shorthand way to do it. Thanks so much!
    id  timepoint  group      (all other variables)
    1   1          treatment
    1   2          treatment
    2   1          control
    2   2          control

  • #2
    If you are going to do paired t-tests, then you will indeed need to -reshape- your data to the wide layout. It would look like this:
    Code:
    rename (var1 var2 /*etc*/) =_T // APPEND _T SO THE WIDE NAMES BECOME var1_T1, var1_T2, ETC.
    reshape wide *_T, i(id) j(timepoint)
    The -reshape- command is one of Stata's best and most important data management commands. You should definitely read the manual section on it: it has plenty of worked examples and detailed explanations. It takes some practice to get used to picking the right -i()- and -j()- options for what you want, but once you get the hang of it, it's actually quite a simple command to use.
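
    As a minimal sketch of the wide layout and the t-tests described in the question, here is a toy version with made-up values. The names var1, var2, Group, and change_var1 are hypothetical, and comparing change scores is only one possible reading of the between-group test:

    ```stata
    * Toy data in the long layout (all values are made up for illustration)
    clear
    input id timepoint str9 group var1 var2
    1 1 "treatment" 10 5
    1 2 "treatment" 12 6
    2 1 "control"   11 4
    2 2 "control"   11 5
    3 1 "treatment"  9 7
    3 2 "treatment" 13 8
    4 1 "control"   10 6
    4 2 "control"    9 6
    end

    * Numeric copy of group, used as the grouping variable further down
    encode group, generate(Group)

    * Add the _T suffix so the wide names come out as var1_T1, var1_T2, ...
    rename (var1 var2) (var1_T var2_T)
    reshape wide var1_T var2_T, i(id) j(timepoint)

    * Paired t-test of post versus pre within the treatment group
    ttest var1_T2 == var1_T1 if group == "treatment"

    * Two-sample t-test of the pre-to-post change between the groups
    generate change_var1 = var1_T2 - var1_T1
    ttest change_var1, by(Group)
    ```

    Variables that are constant within id, such as group, are carried along unchanged, so only the time-varying variables need the suffix.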

    That said, I don't think what you propose is the best way to analyze this kind of data. You can combine the within and between comparisons into a single analysis:

    Code:
    encode group, gen(Group) // MAKE A NUMERIC VERSION OF GROUP
    foreach v of varlist var1 var2 /*etc*/ {
        mixed `v' i.Group##i.timepoint || id:
        margins Group#timepoint
        margins Group#timepoint, pwcompare
    }
    The -mixed- command fits a random-intercepts model with a Group-by-timepoint interaction, so it incorporates both the within-id comparisons over time and the between-group comparisons at each time. The coefficient of the Group#timepoint interaction term in the -mixed- output is the estimate of the treatment effect: how much more (or less) the outcome changed from timepoint 1 to timepoint 2 in one group than in the other. The first -margins- command will show you the model's predicted values in each group at each time, and the second one will do all pairwise comparisons among them.
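
    To see which row of the output to read, here is a rough self-contained sketch on simulated data. Every number in it is invented, and the sample size, seed, and the built-in effect of 2 units are assumptions chosen only for illustration. It codes Group the way -encode- would, since -encode- assigns codes alphabetically (control = 1, treatment = 2):

    ```stata
    * Simulated long-format data; every number here is made up for illustration
    clear
    set seed 12345
    set obs 40
    generate id = _n
    generate Group = 1 + mod(id, 2)          // 1 = control, 2 = treatment
    label define grp 1 "control" 2 "treatment"
    label values Group grp
    generate u = rnormal()                   // subject-level random intercept
    expand 2                                 // two rows per subject
    bysort id: generate timepoint = _n       // 1 = pre, 2 = post

    * Built-in treatment effect of 2 units: treatment group at timepoint 2 only
    generate var1 = 10 + u + 2*(Group == 2 & timepoint == 2) + rnormal()

    mixed var1 i.Group##i.timepoint || id:
    * The 2.Group#2.timepoint row of the output estimates the treatment effect
    margins Group#timepoint
    ```

    With this coding, control and timepoint 1 are the base levels, so the interaction row is the extra pre-to-post change in the treatment group relative to control.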

    All of that said, the wholesale analysis of 350 different outcome variables sounds like a noise-mining exercise to me. Don't you have some theoretical basis for focusing on a small number of outcomes that are scientifically expected to be affected by the treatment? If you're just hunting for variables that will give you "statistically significant" results, bear in mind that if you test 350 such variables, each at the 0.05 level, on a set of purely random numbers with no structure, you will, on average, turn up around 17 or 18 such "results" (350 × 0.05 = 17.5). Worse yet, if you select which outcomes to report on the basis of this kind of approach, when there is no scientific basis for expecting a real effect and the actual effect, if any, is likely small, it can be shown that your "statistically significant" findings have a very high probability of being grossly exaggerated, and even a substantial probability of having the wrong sign.
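
    The point about pure noise can be checked with a small simulation. This is only a sketch under simplifying assumptions: 350 independent normal outcomes, a hypothetical two-group indicator grp, 100 observations, and ordinary two-sample t-tests rather than the pre/post design above:

    ```stata
    * 350 outcomes of pure noise, tested between two arbitrary groups
    clear
    set seed 2024
    set obs 100
    generate byte grp = mod(_n, 2)           // arbitrary two-group split
    local hits = 0
    forvalues j = 1/350 {
        generate y`j' = rnormal()            // no real group difference exists
        quietly ttest y`j', by(grp)
        if r(p) < 0.05 {
            local ++hits                     // count nominally significant tests
        }
    }
    display "Nominally significant at 0.05: `hits' of 350"
    display "Expected by chance alone: " 350 * 0.05
    ```

    The exact count depends on the seed, but it varies around the 17.5 expected by chance even though none of the outcomes is related to the grouping.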
