Hi everyone,
I have the following issue: let's assume that I have data that consists of:
- 3 treatments (T1, T2, T3)
- 20 subjects per treatment (unique identifier SubjectID)
- 20 periods in which each subject makes a decision X (and other decisions Y, Z, etc., which are not relevant here)
I use this collapse command because I am interested in this particular behavior across periods (i.e. to generate two-way graphs). It is important to note (because this is likely the source of my problem) that the observations are not evenly distributed across individuals: some individuals have 20 non-missing observations for X (each either 0 or 1), others 10, 15, etc., with X missing in the remaining periods. The number varies at the individual level.
I also create summary statistics (average behavior of X per treatment) and see something like this: T1: X = 0.85, T2: X = 0.82, T3: X = 0.90.
Next: for the purpose of statistical analysis, I need averages per individual across all periods. Hence, the next command I use is collapse X, by(Treatment SubjectID), which collapses the data further into individual-level averages across all periods (again, note that the number of observations for X varies at the individual level).
Now the problem: if I do the same summary statistics now, I get something like this: T1 = 0.87, T2 = 0.80, T3 = 0.91.
This can't be due to rounding. I assume it comes from the fact that the number of observations per individual varies across the treatments. But what the heck is going on? Why do the averages change once I collapse the same data further? I am not using any weights - do I need to use them? Which values are the 'correct' ones?
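To make the question concrete, here is a toy example with made-up numbers (not my actual data; two hypothetical subjects in a single treatment) that seems to reproduce the pattern. Subject 1 has four non-missing values of X and subject 2 has only one, so the mean over all observations and the mean of the two subject means come out differently. The variable n and the use of aweight are just my guess at what "using weights" would look like here.

```stata
* Hypothetical toy data: subject 1 has 4 non-missing X, subject 2 has 1
clear
input SubjectID Period X
1 1 1
1 2 1
1 3 1
1 4 0
2 1 0
2 2 .
2 3 .
2 4 .
end
generate Treatment = 1

* Pooled mean: each non-missing observation counts once
* (1 + 1 + 1 + 0 + 0) / 5 = 0.6
summarize X

* Collapse to one row per subject, keeping the number of non-missing X in n
collapse (mean) X (count) n = X, by(Treatment SubjectID)

* Unweighted mean of the subject means: each subject counts once
* (0.75 + 0) / 2 = 0.375
summarize X

* Weighting each subject mean by n gives the pooled mean again
* (0.75 * 4 + 0 * 1) / 5 = 0.6
summarize X [aweight = n]
```

In this toy case the unweighted version treats every subject equally, while the weighted version treats every observation equally. Which of the two is appropriate for my analysis is exactly what I am unsure about.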
Thanks!