Dear all,
I have a dataset organised in four levels (L=4):
• Geographical areas
• Households
• Individuals
• Activities
Each observation (row in the dataset) corresponds to an activity.
Each individual may have undertaken a different number of activities (or none at all, in which case the activity-related attributes are missing values).
Each household contains at least one person.
Areas contain households.
Note that there are unique correspondences between these entities (i.e. an individual cannot belong to more than one household, and a household belongs to no more than one area).
Each of these entities has its own attributes (for example, household income refers to households and does not vary within the same household; activity duration is an attribute of an activity, etc.).
A simplified and hypothetical representation of the dataset would be:
Area  household  household_income  person  activity_duration
1     1          2000              1       15
1     1          2000              1       20
1     2          2500              1       5
1     2          2500              2       10
1     2          2500              2       15
1     3          1500              1       35
1     3          1500              1       40
1     3          1500              1       10
1     4          6000              1       5
…     …          …                 …       …
If I am interested in basic descriptive statistics on activity duration, I can run, for instance, tabstat and the result is fine.
But if I am interested in analysing household income, running the same command is misleading, because Stata treats each activity (row) as the unit of analysis. In this simplified example with four households, the mean household income would be reported as 2444 rather than the expected 3000. In short: how can I calculate statistics on income with households as the unit of analysis (UoA), i.e. counting each household only once?
I would like to avoid transforming the dataset with the command reshape, because I would then have to do it L-1 times for a dataset containing L levels of analysis.
Apologies if this question has already been asked. The closest forum entry I found was this one (http://www.statalist.org/forums/foru...y-nested-group), but I am not sure it provides a straightforward answer to my question. I was hoping for something like tabstat household_income, uoa(household) stats(mean), if such an option were available.
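To make the target concrete, here is a minimal sketch of one possible way to get household-level statistics without reshaping. It assumes that household identifiers are unique only within an area, so the tag is built on both Area and household. The variable name hh_tag is hypothetical, and the data are just the nine example rows shown above.

```stata
* Recreate the nine example rows from the table above
clear
input Area household household_income person activity_duration
1 1 2000 1 15
1 1 2000 1 20
1 2 2500 1 5
1 2 2500 2 10
1 2 2500 2 15
1 3 1500 1 35
1 3 1500 1 40
1 3 1500 1 10
1 4 6000 1 5
end

* Activity-level view: every row counts, so households with more
* activities are over-represented in the income mean
tabstat household_income, statistics(mean n)

* Flag exactly one observation per Area-household combination
* (hh_tag is a hypothetical name; assumes household ids repeat across areas)
egen byte hh_tag = tag(Area household)

* Household-level view: each household contributes one observation
tabstat household_income if hh_tag, statistics(mean n)
```

The same idea should carry over to the other levels by changing the variables passed to tag(), for instance tag(Area household person) for person-level attributes, so no reshape is needed for any of the L levels.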
Thanks in advance,
Thiago