Issue with the -collapse- command

Peter Jorensen

Join Date: Nov 2017

Posts: 10
#1

29 Nov 2017, 10:02

Hi everyone,

I have the following issue: let's assume that I have data that consists of:
3 treatments (T1, T2, T3)

20 subjects per treatment (unique identifier SubjectID)

20 periods in which each of the subjects makes a decision X (and other decisions Y Z etc., that are not of relevance here)

Now let's assume that I want to look only at behavior X (= 0 or 1) within this data set and collapse it using collapse X, by(Treatment SubjectID Period)
I use this collapse command because I am interested in this particular behavior across periods (i.e. to generate two-way graphs). Here it is important to note (because this is likely the reason for the problem that I have) that the observations are not evenly distributed across individuals: some individuals might have 20 observations for X (either 0 or 1), others 10, 15 etc (and a missing otherwise). It varies on the individual level.
I also create summary statistics (average behavior of X per treatment) and see something like this: T1: X = 0.85, T2: X = 0.82, T3: X = 0.90.

Next: for the purpose of statistical analysis, I need to create averages per individual across all periods. Hence, the next command that I'm using is collapse X, by(Treatment SubjectID) to collapse the data further and create averages on the individual level across all periods (again, note that the number of observations for X varies on the individual level).

Now the problem: if I do the same summary statistics now, I get something like this: T1 = 0.87, T2 = 0.80, T3 = 0.91.
This can't be due to rounding. I assume this comes from the fact that the number of observations per individual varies across the treatments. But what the heck is going on? Why do the averages change once I collapse the same data further? I am not using any weights - do I need to use them? Which values are the 'correct' ones?

Thanks!

Last edited by Peter Jorensen; 29 Nov 2017, 10:06.
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#2

29 Nov 2017, 11:18

I'm a bit confused by your description of the data set. Initially you appear to say that you have 3 treatments crossed with 20 subjects crossed with 20 periods (or perhaps the 20 periods are nested within subjects--it doesn't really matter for present purposes). Then you -collapse X, by(Treatment SubjectID Period)-. Based on the description up to that point, the collapse is vacuous, because Treatment SubjectID and Period should uniquely identify the observations in the original data: there is no other source of variation described. You might as well have just written -keep Treatment SubjectID Period X-, and you'd have the same data set. There is nothing to average over. So I think you have omitted something crucial from your description.

You go on to say that there are different numbers of observations per subject. So what does this mean? Does it mean that some subjects were not observed in all 20 periods? Assuming it's that, you then go on to collapse the data -by(Treatment SubjectID)-, averaging over the (differing numbers of) Period distinctions. When you -summarize- X you find that the mean of these averages no longer matches the overall grand mean of X for the original data. This is precisely due to the different numbers of observations per subject.

If you were to instead do a weighted average, weighting each subject's observations by the number of periods for which he/she contributes data, then that weighted average would match the overall grand mean.

The following toy example illustrates the point:

Code:

clear
set obs 5
gen id = (_n >= 3) + 1
gen x = _n
list, noobs sepby(id)
summ x
collapse (count) N = x (mean) x, by(id)
summ x
summ x [fweight = N]
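As a hedged aside, the arithmetic behind the three summaries in this toy example can be checked by hand. The sketch below assumes only the five observations generated above (x = 1,...,5, with id 1 holding the first two and id 2 the last three); it is an illustration, not part of the original code.

```stata
* grand mean of the five raw observations
display (1 + 2 + 3 + 4 + 5) / 5

* per-id means: id 1 -> (1 + 2)/2 = 1.5, id 2 -> (3 + 4 + 5)/3 = 4
* unweighted mean of the two id means (each id counts once)
display (1.5 + 4) / 2

* mean of the id means, weighted by N (2 and 3 observations)
display (2*1.5 + 3*4) / 5
```

The first and third lines both display 3, while the second displays 2.75: weighting each id mean by its N undoes the collapse, whereas the unweighted version gives id 1 as much influence as id 2 despite its fewer observations.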

If I have completely misunderstood your data, feel free to disregard this.
Peter Jorensen

Join Date: Nov 2017

Posts: 10
#3

29 Nov 2017, 11:33

Sorry for not having been more precise in my description. Please let me respond in detail.

Originally posted by Clyde Schechter

I'm a bit confused by your description of the data set. Initially you appear to say that you have 3 treatments crossed with 20 subjects crossed with 20 periods (or perhaps the 20 periods are nested within subjects--it doesn't really matter for present purposes). Then you -collapse X, by(Treatment SubjectID Period)-. Based on the description up to that point, the collapse is vacuous, because Treatment SubjectID and Period should uniquely identify the observations in the original data: there is no other source of variation described. You might as well have just written -keep Treatment SubjectID Period X-, and you'd have the same data set. There is nothing to average over. So I think you have omitted something crucial from your description.

I assume you are correct: keep or collapse would yield the same result since at that point nothing has been averaged yet.

Originally posted by Clyde Schechter

You go on to say that there are different numbers of observations per subject. So what does this mean? Does it mean that some subjects were not observed in all 20 periods? Assuming it's that, you then go on to collapse the data -by(Treatment SubjectID)-, averaging over the (differing numbers of) Period distinctions. When you -summarize- X you find that the mean of these averages no longer matches the overall grand mean of X for the original data. This is precisely due to the different numbers of observations per subject.

The data comprise 3x20x20 data points. However, note that not all 20 observations exist for all 20 subjects. Before collapsing I drop all the missing observations so that the data set now only contains actual values for X. The number of non-missing values varies across subjects (some subjects had no missings, some have 50% missings etc.). Hence, once collapsed using "collapse X, by(Treatment SubjectID Period)", some subjects don't have all periods 1-20. Collapsing the data even further using "collapse X, by(Treatment SubjectID)" now yields a different mean than before. The difference arises between "collapse X, by(Treatment SubjectID Period)" and "collapse X, by(Treatment SubjectID)", so it must have something to do with the missing weighting that you mention below. I need to figure out how this applies to my data set.

Originally posted by Clyde Schechter

If you were to instead do a weighted average, weighting each subject's observations by the number of periods for which he/she contributes data, then that weighted average would match the overall grand mean.

The following toy example illustrates the point:

Code:

clear
set obs 5
gen id = (_n >= 3) + 1
gen x = _n
list, noobs sepby(id)
summ x
collapse (count) N = x (mean) x, by(id)
summ x
summ x [fweight = N]

If I have completely misunderstood your data, feel free to disregard this.

Using the example that you posted I get three different averages, so I am not sure what exactly the example is supposed to show:

Code:

. summ x

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |          5           3    1.581139          1          5

. collapse (count) N = x (mean) x, by(id)

. summ x

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |          2        2.75    1.767767        1.5          4

. summ x [fweight = N]

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |          5           3    1.369306        1.5          4

Code:

Following your advice, I now did the following:
1. collapse X, by(Treatment SubjectID Period) <-- at this point, all the missings have already been removed and the observations for X vary by individual. I sum by treatment and observe: T1: X = 0.85, T2: X = 0.82, T3: X = 0.90
2. created the twoway figure that I need
3. bysort SubjectID: egen n = count(X) <--- this now tells me exactly how many observations I have per individual
4. collapse X [fweight=n], by(Treatment SubjectID)

However: the problem still persists. The averages now do not correspond anymore to the previous averages. That is, they are not "T1: X = 0.85, T2: X = 0.82, T3: X = 0.90" but slightly different.

Why?

Last edited by Peter Jorensen; 29 Nov 2017, 12:07.
Peter Jorensen

Join Date: Nov 2017

Posts: 10
#4

30 Nov 2017, 05:39

No further ideas?
Sorry if my initial explanation wasn't clear, but the actual problem is very straightforward: between the two "collapse" transformations I only drop the 'Period' variable and suddenly the averages change - I don't know how to fix it and frankly I don't even know which of the two is the 'true' average value.

Andrew Musau

Join Date: Oct 2014
Posts: 10225

#5

30 Nov 2017, 07:24

Code:

input float(id period treatment x)
1 1 1 1
1 2 1 2
2 1 1 4
2 2 1 5
1 3 2 3
2 3 2 6
2 4 2 7
2 5 3 8
end

So your data has the above structure. Look at how you define your groups

Code:

egen tag1 = group(treatment id period), label
egen tag2 = group(treatment id), label

In the first case, each observation is grouped independently because you are using all identifiers. However, excluding period implies that some observations are grouped together

Code:

. list, noobs sepby(tag2)

  +-------------------------------------------+
  | id   period   treatm~t   x    tag1   tag2 |
  |-------------------------------------------|
  |  1        1          1   1   1 1 1    1 1 |
  |  1        2          1   2   1 1 2    1 1 |
  |-------------------------------------------|
  |  2        1          1   4   1 2 1    1 2 |
  |  2        2          1   5   1 2 2    1 2 |
  |-------------------------------------------|
  |  1        3          2   3   2 1 3    2 1 |
  |-------------------------------------------|
  |  2        3          2   6   2 2 3    2 2 |
  |  2        4          2   7   2 2 4    2 2 |
  |-------------------------------------------|
  |  2        5          3   8   3 2 5    3 2 |
  +-------------------------------------------+

For treatment 2, for example, there is 1 observation for id=1 and 2 observations for id=2. What is the average?

Well if we assign equal weights to groups, it is

Code:

. di (0.5*3) + (0.5*6.5)
4.75

If we assign equal weight to observations, it is

Code:

. di ((1/3)*3) + ((1/3)*6) + ((1/3)*7)
5.3333333

Which one do you want? Your call

So to adjust the mean when collapsing using treatment and id, taking the above example, you can specify

Code:

sum x if treatment==2 [fweight = id]
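
A caveat on the line above: `id` can stand in as the weight only because, in this toy data, the ids in treatment 2 (1 and 2) happen to equal the number of observations each contributes. A minimal sketch of a more general version, assuming a count variable (here named `n`, a hypothetical name) is created during the collapse:

```stata
* Sketch: rebuild the toy data from this post, then weight by group size.
clear
input float(id period treatment x)
1 1 1 1
1 2 1 2
2 1 1 4
2 2 1 5
1 3 2 3
2 3 2 6
2 4 2 7
2 5 3 8
end

* n = number of non-missing x in each (treatment, id) group
collapse (mean) x (count) n = x, by(treatment id)

* equal weight to groups
summarize x if treatment == 2

* equal weight to observations: weight each group mean by its size
summarize x if treatment == 2 [fweight = n]
```

On the toy data the first summary gives a mean of 4.75 and the second 5.333333, matching the two hand calculations above.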

Last edited by Andrew Musau; 30 Nov 2017, 07:29.


Peter Jorensen

Join Date: Nov 2017

Posts: 10
#6

30 Nov 2017, 07:51

Thanks for this, very helpful! I definitely understand these things, but what I don't understand is what's going on with the collapse command in my specific case because it seems to mess up my data in unpredictable ways.

Let me clarify: my starting point is the same as the one in your example: the dataset consists only of unique subject ID, treatment, period dummies, and X (whose frequency varies across subjects). When I use your step-by-step explanation (and also do "egen tag1 = group(SubjectID)", but that shouldn't matter given that the frequency of SubjectID already takes care of this), I derive
one set of averages using unweighted averages of X (let's call it X1) --> command: bysort Treatment: sum X

another set of averages using weighted averages of X (= X2) based on how many observations they contributed per period --> command: bysort Treatment: sum X [fweight=tag1]

but a completely different set of averages when I now first use the collapse command, regardless of whether or not I use the fweight command in the collapse command (X3 = X4):

Code:

collapse X [fweight=tag1], by(Treatment SubjectID)
bysort Treatment: sum X

Yields the same averages as

Code:

collapse X, by(Treatment SubjectID)
bysort Treatment: sum X

However, none of these averages correspond to either X1 or X2.

This is essentially my problem: I can't get the collapse command to work properly - what am I missing? Is there maybe a bug in the newest STATA?

Last edited by Peter Jorensen; 30 Nov 2017, 07:55.

Andrew Musau

Join Date: Oct 2014
Posts: 10225

#7

30 Nov 2017, 08:04

Hopefully, this is more clear

Code:



. use `data'

. egen tag1 = group(treatment id period), label

. egen tag2 = group(treatment id), label

. bysort treatment: sum x

--------------------------------------------------------------------------------------------------------------------------
-> treatment = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |          4           3    1.825742          1          5

--------------------------------------------------------------------------------------------------------------------------
-> treatment = 2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |          3    5.333333    2.081666          3          7

--------------------------------------------------------------------------------------------------------------------------
-> treatment = 3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |          1           8           .          8          8


. collapse x, by(treatment id period)

. bysort treatment: sum x

--------------------------------------------------------------------------------------------------------------------------
-> treatment = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |          4           3    1.825742          1          5

--------------------------------------------------------------------------------------------------------------------------
-> treatment = 2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |          3    5.333333    2.081666          3          7

--------------------------------------------------------------------------------------------------------------------------
-> treatment = 3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |          1           8           .          8          8


. clear

. use `data'

. egen tag2 = group(treatment id), label

. bys tag2: egen n= total(1)

. collapse x, by(treatment id n)

. bysort treatment: sum x

--------------------------------------------------------------------------------------------------------------------------
-> treatment = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |          2           3     2.12132        1.5        4.5

--------------------------------------------------------------------------------------------------------------------------
-> treatment = 2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |          2        4.75    2.474874          3        6.5

--------------------------------------------------------------------------------------------------------------------------
-> treatment = 3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |          1           8           .          8          8


. bysort treatment: sum x [fweight=n]

--------------------------------------------------------------------------------------------------------------------------
-> treatment = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |          4           3    1.732051        1.5        4.5

--------------------------------------------------------------------------------------------------------------------------
-> treatment = 2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |          3    5.333333    2.020726          3        6.5

--------------------------------------------------------------------------------------------------------------------------
-> treatment = 3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |          1           8           .          8          8



Last edited by Andrew Musau; 30 Nov 2017, 08:40.


Peter Jorensen

Join Date: Nov 2017
Posts: 10

#8

30 Nov 2017, 08:52

Here is a test sample of the data that I'm using in the following example.

1. sample averages by treatment without weight:

Code:

 bysort Treatment: sum X

-----------------------------------------------------------------------------------------------
-> Treatment = 0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |        155    .4709677    .5007744          0          1

-----------------------------------------------------------------------------------------------
-> Treatment = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |        172    .7790698    .4160849          0          1

-----------------------------------------------------------------------------------------------
-> Treatment = 2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |        162          .5    .5015504          0          1

-----------------------------------------------------------------------------------------------
-> Treatment = 3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |        449    .6035635    .4897027          0          1

2. sample averages by treatment with corresponding weight:

Code:

 egen tag1 = group(SubjectID)
bysort Treatment: sum X [fweight=tag1]

-----------------------------------------------------------------------------------------------
-> Treatment = 0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |      6,789    .4822507    .4997217          0          1

-----------------------------------------------------------------------------------------------
-> Treatment = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |        799    .8110138    .3917429          0          1

-----------------------------------------------------------------------------------------------
-> Treatment = 2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |      2,024    .5355731    .4988562          0          1

-----------------------------------------------------------------------------------------------
-> Treatment = 3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |     12,230    .6068684    .4884656          0          1

3. sample averages after using collapse without weight:

Code:

 collapse X, by(Treatment SubjectID)
bysort Treatment: sum X

-----------------------------------------------------------------------------------------------
-> Treatment = 0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |         10    .4322755    .4434984          0          1

-----------------------------------------------------------------------------------------------
-> Treatment = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |          8    .7558842    .2829162         .2          1

-----------------------------------------------------------------------------------------------
-> Treatment = 2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |          8    .5131579    .5095734          0          1

-----------------------------------------------------------------------------------------------
-> Treatment = 3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |         22    .5740555    .3560557          0          1

4. sample averages after using collapse with corresponding weight:

Code:

 egen tag1 = group(SubjectID)
collapse X [fweight=tag1], by(Treatment SubjectID)
bysort Treatment: sum X

-> Treatment = 0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |         10    .4322755    .4434984          0          1

-----------------------------------------------------------------------------------------------
-> Treatment = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |          8    .7558842    .2829162         .2          1

-----------------------------------------------------------------------------------------------
-> Treatment = 2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |          8    .5131579    .5095734          0          1

-----------------------------------------------------------------------------------------------
-> Treatment = 3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |         22    .5740555    .3560557          0          1

This simply does not make sense to me. What's wrong with the collapse command, or with my application of it?


Andrew Musau

Join Date: Oct 2014

Posts: 10225
#9

30 Nov 2017, 12:45

What are you trying to do here?

2. sample averages by treatment with corresponding weight:

egen tag1 = group(SubjectID)
bysort Treatment: sum X [fweight=tag1]

The frequency weight gives you the number of duplicated observations. This is not how you count the number of duplicates. See

Code:

help duplicates

Also, the frequency weight is relevant in the collapsed dataset to know the frequency before collapse.
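
To illustrate the last point, here is a small sketch on made-up data (the variables `id`, `x` and `n` are hypothetical and not taken from the test file). It suggests why putting a per-subject count inside -collapse- cannot change the subject means: the weight is constant within each by-group, so it cancels out of that group's mean.

```stata
* Made-up data: id 1 has three observations, id 2 has one.
clear
input id x
1 0
1 1
1 1
2 1
end

* n = number of rows per id; constant within each id
bysort id: gen n = _N

* unweighted collapse
preserve
collapse (mean) x, by(id)
list
restore

* collapse weighted by n: every row of an id carries the same weight,
* so the per-id means are the same as in the unweighted collapse
collapse (mean) x [fweight = n], by(id)
list
```

Both listings should show the same mean for each id (2/3 for id 1 and 1 for id 2). The count only matters afterwards, when the collapsed rows are averaged again.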
Peter Jorensen

Join Date: Nov 2017

Posts: 10
#10

30 Nov 2017, 13:46

I tried it the way you explained it above. Still, when I apply the second collapse command (where 'Period' is removed), the average numbers change and I don't know what to do.
Any chance someone would check the linked test data file and tell me what exactly the problem is? I simply don't understand why the average numbers change although all I want to do is to collapse all the period observations properly on the individual level. With or without weights the averages change... I see it because when I do a t-test after the first but before the second collapse command the averages are different from the averages that I get after the second collapse command. The averages after the second collapse command are wrong and I don't know how to fix it...

Last edited by Peter Jorensen; 30 Nov 2017, 13:54.

Andrew Musau

Join Date: Oct 2014
Posts: 10225

#11

30 Nov 2017, 14:31

I tried it the way you explained it above.

I did not do what you did in #8:

2. sample averages by treatment with corresponding weight:

egen tag1 = group(SubjectID)
bysort Treatment: sum X [fweight=tag1]

This is what I did in #7 and I will repeat it only once. Do read carefully my explanation in #5 because I won't repeat it again.

Code:

. use "\\filgrms1\u09$\andrewmm\Downloads\Test.dta"
*\\ GENERAL SUMMARY

. bysort Treatment: sum X

---------------------------------------------------------------------------------------------------------------------------------------------------
-> Treatment = 0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |        155    .4709677    .5007744          0          1

---------------------------------------------------------------------------------------------------------------------------------------------------
-> Treatment = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |        172    .7790698    .4160849          0          1

---------------------------------------------------------------------------------------------------------------------------------------------------
-> Treatment = 2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |        162          .5    .5015504          0          1

---------------------------------------------------------------------------------------------------------------------------------------------------
-> Treatment = 3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |        449    .6035635    .4897027          0          1


*\\ IF YOU COLLAPSE BY Treatment & SubjectID, I CREATE A VARIABLE "n" THAT COUNTS HOW MANY OBSERVATIONS ARE IN A GROUP

. egen tag1= group( Treatment SubjectID ), label

. bys tag1: egen n= total(1)

. collapse X, by(Treatment SubjectID n)

. *\\ THIS GIVES YOU AVERAGES ASSIGNING EQUAL WEIGHTS TO MEANS ACROSS GROUPS

. bysort Treatment: sum X

---------------------------------------------------------------------------------------------------------------------------------------------------
-> Treatment = 0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |         10    .4322755    .4434984          0          1

---------------------------------------------------------------------------------------------------------------------------------------------------
-> Treatment = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |          8    .7558842    .2829162         .2          1

---------------------------------------------------------------------------------------------------------------------------------------------------
-> Treatment = 2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |          8    .5131579    .5095734          0          1

---------------------------------------------------------------------------------------------------------------------------------------------------
-> Treatment = 3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |         22    .5740555    .3560557          0          1



. *\\ TO RECOVER ORIGINAL OBS AND MEAN, USE THE VARIABLE "n" AS THE FREQUENCY WEIGHT.

. bysort Treatment: sum X [fweight= n]

---------------------------------------------------------------------------------------------------------------------------------------------------
-> Treatment = 0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |        155    .4709677      .43317          0          1

---------------------------------------------------------------------------------------------------------------------------------------------------
-> Treatment = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |        172    .7790698    .2447724         .2          1

---------------------------------------------------------------------------------------------------------------------------------------------------
-> Treatment = 2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |        162          .5    .4781008          0          1

---------------------------------------------------------------------------------------------------------------------------------------------------
-> Treatment = 3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           X |        449    .6035635    .3357815          0          1

At the end of the day, as stated in #5, you decide which average you want: the average assigning equal weight to groups, or the average assigning equal weight to observations.
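
For comparing the two averages side by side without collapsing at all, one possible sketch is below. It uses a small made-up data set with the thread's variable names; `subj_mean` and `first_row` are hypothetical names, and this is only one way to do it, not the approach used above.

```stata
* Made-up data: 2 treatments, 4 subjects, unequal numbers of periods.
clear
input Treatment SubjectID Period X
0 1 1 1
0 1 2 0
0 1 3 1
0 2 1 0
1 3 1 1
1 3 2 1
1 4 1 0
end

* each subject's own mean, repeated on all of that subject's rows
bysort Treatment SubjectID: egen subj_mean = mean(X)

* flag exactly one row per subject
egen first_row = tag(Treatment SubjectID)

* equal weight to observations (what the raw-data summary reports)
tabstat X, by(Treatment) statistics(mean n)

* equal weight to subjects (what the summary after collapse reports)
tabstat subj_mean if first_row, by(Treatment) statistics(mean n)
```

On this made-up data the first table gives means of 0.5 and about 0.667, the second about 0.333 and 0.5. The gap between the two tables is entirely due to subjects contributing different numbers of periods.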


Peter Jorensen

Join Date: Nov 2017

Posts: 10
#12

30 Nov 2017, 15:08

I have done this already successfully, thanks, but this does not solve my issue: I need to be able to perform test statistics with the means that you get in your example using the weights (.4709677 etc.). In order to perform ranksum, ttest etc, I need this data set to already have those weighted means because I cannot use weights when carrying out those tests. This is exactly what I have been trying to do but cannot achieve. I was able to recreate the sums using your previous descriptions earlier today. I am not just trying to see weighted averages but have the test statistics use these values correctly.
Andrew Musau

Join Date: Oct 2014

Posts: 10225
#13

30 Nov 2017, 15:29

Carry out the tests on the raw data. If you have a specific problem with running the tests, I suggest that you start a new post explaining your problem and providing a data example which illustrates the problem. As it stands, I do not see why you need to collapse your data in the first place and I have no clue what your specific difficulty is. Go through the FAQs and see how your question can be effectively framed.
Peter Jorensen

Join Date: Nov 2017

Posts: 10
#14

30 Nov 2017, 15:38

Originally posted by Andrew Musau

Carry out the tests on the raw data. If you have a specific problem with running the tests, I suggest that you start a new post explaining your problem and providing a data example which illustrates the problem. As it stands, I do not see why you need to collapse your data in the first place and I have no clue what your specific difficulty is. Go through the FAQs and see how your question can be effectively framed.

No, this is not an econometric issue but an issue with the collapse command:
I have to take care of the fact that the observations are not independent across periods. Hence, all I am trying to do is to collapse the data such that I get the appropriate averages on the individual level where one subject corresponds to exactly one observation.

So you are saying that I cannot do what I want to do using the collapse command? What is an alternative command that would achieve this?
Andrew Musau

Join Date: Oct 2014

Posts: 10225
#15

01 Dec 2017, 02:25

If you need any further input from me, follow the advice in #13 and start a new thread appropriately titled, e.g., how to run a rank sum test with multiple observations per individual. This is a classic example of the XY problem, where you do not ask about your real problem - see FAQ 9. Otherwise you can hope that someone else will be generous enough to guide you with your problem.

9. Where may I look for other advice on posting technical questions?

Asking about your real problem, not something else, may seem too obvious to mention, but do check http://xyproblem.info/.