Survey: How to estimate per person means from household totals

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#1

17 Mar 2015, 11:18

I received the following private query, edited:

Dear Sir:

I got your e-mail address from Statalist. May I ask you a question? I found your explanation about sampling weights on Statalist; it was as follows:

Each person in a selected household shares the selection probability of the household and gets the same sampling weight.

I have one observation per household, with sampling weights at the household level, and I have to report the results at the individual level (specifically, per adult equivalent expenditure per annum), but I only have total household expenditure. I've found some suggestions to multiply the sampling weights by household size to analyze household expenditure data at the individual level. Is your suggestion above in line with these other suggestions?

I am confused about whether I should use the sampling weights given (which I believe to be household weights) directly with the svy command, or whether I should use individual weights / population weights.

It's an interesting question. I replied that I would answer if it were asked on Statalist, but I've decided not to wait. (The post referred to was at: http://www.stata.com/statalist/archi.../msg01516.html.) The bottom-line answer is: use the household weights, and use svy: ratio.

One can create a new "individual" weight = hh weight x no. adults, then use svy: mean. However, this is unnecessary and undesirable. The following code shows the two approaches.

Code:

clear
sysuse auto, clear
/* Set up household data with "turn" as the expenditure variable */
gen hhwt = trunk
gen hhid = substr(make,1,2)
egen hhexp = total(turn), by(hhid)    /* household adult expenditures */
egen hhsize = count(turn), by(hhid)   /* number of adults */
bys hhid: keep if _n==1               /* one observation per hh */

/* svyset */
svyset hhid [pweight= hhwt]

/* Estimate average expenditure per person */
svy: ratio av_adult_exp: hhexp/hhsize

Number of strata =       1        Number of obs     =         23
Number of PSUs   =      23        Population size   =        976
                                  Design df         =         22

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
av_adult_exp |   40.20799   .7347049      38.68431    41.73168

Now a version that utilizes a weight equal to HH weight x HH size:

Code:

/* Four statements if you want to revise the weight */
gen av_adult_exp = hhexp/hhsize   /* HH average */
gen new_wt = hhwt*hhsize
svyset hhid [pweight= new_wt]
svy: mean av_adult_exp

The results are the same as those from svy: ratio.
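Why the two versions agree can be seen from a short derivation. The notation is introduced here and is not in the original post: $w_h$ is the household weight (hhwt), $y_h$ the household total (hhexp), and $n_h$ the household size (hhsize). The svy: ratio estimate is

$$\hat R = \frac{\sum_h w_h y_h}{\sum_h w_h n_h},$$

and the svy: mean of $y_h/n_h$ (av_adult_exp) with weights $w_h n_h$ (new_wt) is

$$\frac{\sum_h (w_h n_h)\,(y_h/n_h)}{\sum_h w_h n_h} = \frac{\sum_h w_h y_h}{\sum_h w_h n_h} = \hat R .$$

Sketching the usual linearization argument, the per-household scores of the two estimators also coincide,

$$\frac{w_h n_h\,(y_h/n_h - \hat R)}{\sum_k w_k n_k} = \frac{w_h\,(y_h - \hat R\, n_h)}{\sum_k w_k n_k},$$

which suggests why the linearized standard errors match as well as the point estimates.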

The second version is not only unnecessary but also undesirable. First, it requires the creation of two extra variables and one extra svyset statement; second, the analyst will have to explain that the new weight is not a real per-person weight. (If the study did have incomes for individuals, each would get the household weight, as I said in the earlier post.)

Last edited by Steve Samuels; 17 Mar 2015, 12:04.

Steve Samuels
Statistical Consulting

Stata 14.2

Lwin Lwin Aung

Join Date: Mar 2015

Posts: 4
#2

18 Mar 2015, 03:10

Dear Steve,

May I know whether I should use household weights or individual weights (household weight*household size) in the following situations?
Regression analysis: Y is per adult equivalent expenditure per annum or per capita expenditure per annum

svyset township [pweight=???], strata(district) vce(linearized) singleunit(certainty)
svy: reg
Gini Coefficient

svyset township [pweight=???], strata(district) vce(linearized) singleunit(certainty)
svylorenz

According to the Poverty and Inequality Handbook published by the World Bank < http://elibrary.worldbank.org/doi/ab...-0-8213-7613-3 >, “In estimating individual-level parameters such as per capita expenditure, we need to transform the household sample weights into individual sample weights” (pages 374 and 375).

If I should use individual weights, which of the two below is correct for ‘per adult equivalent expenditure per annum’?

individual weights (household weight*adult equivalent)
OR
individual weights (household weight*household size)

Note: the adjustment for household size and composition using simple adult equivalents is
Child≤6 = .5 and all others = 1
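The simple adult-equivalent scale above can be computed from a person-level roster. The sketch below assumes hypothetical variables hhid (household identifier) and age (in years) and uses a tiny made-up file; it is illustrative only, not the survey's own derivation.

```stata
* Hypothetical person-level roster: one row per person.
* hhid and age are illustrative names, not the survey's variables.
clear
input hhid age
1 34
1 30
1  4
2 60
2 12
2  2
2  5
end

* Simple scale from the question: aged 6 or under = 0.5, everyone else = 1
* (the if condition keeps missing ages missing instead of scoring them 1)
gen double ae_wt = cond(age <= 6, 0.5, 1) if !missing(age)

* Household totals: adult equivalents, and persons with non-missing age
egen double hh_ae = total(ae_wt), by(hhid)
egen hhsize = count(age), by(hhid)

list hhid age ae_wt hh_ae hhsize, sepby(hhid)
```

Here household 1 has 2.5 adult equivalents and 3 persons, and household 2 has 3 adult equivalents and 4 persons; total household expenditure divided by hh_ae would give expenditure per adult equivalent.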

With warm regards,

Lwin
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#3

18 Mar 2015, 04:03

Prior issues that you (meaning Steve's correspondent, not Steve!) need to be clear about are: (1) How are your data organised: one row per household, or one row per individual (within each household)? (2) What is the distribution that you are interested in: among individuals or among households? The most common answer to (2) is the distribution among individuals, with each individual attributed the "equivalised household expenditure (or income) of the household to which the individual belongs". If this is the case, the relevant estimation procedure and choice of weight depend on your answer to (1). If there is one row per household, then you can get the distribution among individuals by weighting each household observation by its survey weight times the number of persons in the household. Further discussion of these sorts of issues is in Biewen and Jenkins, ‘Variance estimation for Generalized Entropy and Atkinson inequality indices: the complex survey data case’, Oxford Bulletin of Economics and Statistics 68 (3), June 2006, 371–383 (and references cited therein).
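For a file with one row per household, the weighting just described might look like this in Stata. The variable names hhwt, hhsize and pcexp, and the data, are hypothetical.

```stata
* Hypothetical one-row-per-household file: hhwt = household survey weight,
* hhsize = number of persons, pcexp = per capita household expenditure.
clear
input hhid hhwt hhsize pcexp
1 100 3 50
2 150 5 30
3 120 2 80
4  90 4 40
end

* Each household row stands for hhsize persons, each with the household weight
gen double pwt = hhwt * hhsize

svyset hhid [pweight = pwt]
svy: mean pcexp    // mean per capita expenditure among individuals

* The inequality commands svylorenz, svygei and svyatk (user-written, on SSC)
* would use the same svyset; see their help files for syntax.
```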
For what it's worth, I think you should ditch the Poverty and Inequality Handbook that you cite. It has been supplanted by, for example, the World Bank's ADePT project materials (including a book). See http://go.worldbank.org/UDTL02A390
Thank you for citing svylorenz, but following Forum FAQ recommendations, you should cite where you found it (on SSC, I presume).

Last edited by Stephen Jenkins; 18 Mar 2015, 04:12.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#4

18 Mar 2015, 06:16

Lwin was indeed my correspondent. She sent a Technical Report (1), and I found its Appendix (2). The Questionnaire in the Appendix (Module 5, p.24) indicates that one respondent reported one total for the entire household. That said, I didn't see the code book, and I don't know exactly what variable she is analyzing. I can see that I've stepped into subject-matter issues that are beyond my expertise, so I'll defer further comment.

1. http://www.mm.undp.org/content/dam/m...Report-Eng.pdf

2. http://www.mm.undp.org/content/dam/m...pendix_Eng.pdf

Last edited by Steve Samuels; 18 Mar 2015, 06:26.

Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#5

18 Mar 2015, 06:28

Let's be clear: I believe Steve's expertise on survey issues is unparalleled among Forum members. My remarks regarding Lwin's questions were intended as an orthogonal take on Steve's remarks in #1 (which were characteristically full and helpful given the information provided, and generous too given the off-list approach to him). Put differently, I doubt that the topic is beyond Steve's expertise (and I never intended to suggest that it might be; apologies if it appeared that way).
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#6

18 Mar 2015, 15:42

Thank you, Stephen, for the compliment. Expenditure and income surveys, and poverty and inequality analyses, are areas in which I have little experience, but I expect I could learn.

Lwin, it would help if you provided some clarification about your data set and which variables you have available. So I ask Stephen's question again: does your current analysis data set have one line per person or one line per household?

Lwin Lwin Aung

Join Date: Mar 2015

Posts: 4
#7

19 Mar 2015, 21:34

Dear Steve and Stephen,

Many thanks for your kind suggestions. May I provide more information as follows:
The data used to produce the Gini coefficient have one row per individual. To be exact, they are the consumption aggregate, or total consumption expenditures in adult equivalent per year. Sometimes I might use the consumption aggregate, or total consumption expenditures per capita per year.

The distribution that I’m interested in is among individuals.

I’ve quickly checked slide no. 8 of the ppt by Stephen and Martin: “Selected examples of survey data set-up for estimation of inequality among individuals”.
I believe that you use household income ‘net’ (i.e., one row per household).

It seems to me that you are talking about 2 weights in the slide.
Individual sample weight (to use if observation unit is person)

weight = household sample weight ×household size (to use if observation unit is household)

However, I only have a sampling weight (which I believe to be a household weight, according to the formulae on pages 29 and 30 of the Technical Report, link no. 1 provided by Steve). I do not have an individual weight.

According to the Poverty and Inequality Handbook published by the World Bank (pages 374 and 375):

gen weighti = household sampling weight*family size
table region [pweight=weighti], c(mean pcexp)

As pcexp is per capita expenditure, I assumed that the data are organized in one row per individual. So I thought that to analyse the distribution at the individual level, I need individual weights, and I should do one of the following:

individual weights (household weight*adult equivalent)
OR
individual weights (household weight*household size)

Please correct me if I’m wrong. Provided that the data is one row per individual to analyse at individual level, and only household weight is available, may I know how to proceed with my analysis.

With warm regards,

Lwin
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#8

20 Mar 2015, 03:33

Provided that the data is one row per individual to analyse at individual level, and only household weight is available, may I know how to proceed with my analysis.

To me, the answer is not straightforward because I don't know what the "individual weight" would be in your context. (And the WB Handbook you cited is definitely not one of my preferred sources on the analysis of living standards; see the other resource I cited, which is freely downloadable.) However, assume that you are following the conventional approach to "welfare" measurement, which is to assume that each person is attributed the "living standards" of the household to which s/he belongs (per capita household expenditure in your case), so that every person within the same household has the same value of "pcexp". Then I would work with an analysis sample derived by selecting one individual per household (so the new data set has one row per household), and use a weight equal to (household weight X number of individuals in the household) in order to derive results for the distribution of "living standards" among the relevant population of individuals. We look at this case in the OBES article cited earlier in this thread.

If you want to calculate SEs for the inequality indices covered in our OBES article, not just means, then type ssc install svygei_svyatk to get the programs. Quantile shares and the Gini coefficient, and associated SEs, can be derived using svylorenz on SSC. More generally, see also Jenkins, S.P. 2006. Estimation and interpretation of measures of inequality, poverty, and social welfare using Stata. Presentation at North American Stata Users' Group Meetings 2006, Boston MA. http://econpapers.repec.org/paper/bocasug06/16.htm. That presentation also has examples of calculating means and Foster-Greer-Thorbecke (Econometrica 1984) poverty indices, and associated SEs, both using svy.
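Starting from a person-level file, the procedure described above might be sketched as follows. The variable names (hhid, person, hhwt, pcexp) and the data are hypothetical.

```stata
* Hypothetical person-level file: several rows per household, each repeating
* the household's weight hhwt and per capita expenditure pcexp.
clear
input hhid person hhwt pcexp
1 1 100 50
1 2 100 50
1 3 100 50
2 1 150 30
2 2 150 30
3 1 120 80
3 2 120 80
3 3 120 80
3 4 120 80
end

* Count persons per household before dropping rows
bysort hhid: gen hhsize = _N
* Keep one row per household
by hhid: keep if _n == 1
* Weight = household weight x number of persons in the household
gen double pwt = hhwt * hhsize

svyset hhid [pweight = pwt]
svy: mean pcexp
```

Keeping all person rows instead, each with the household weight (svyset hhid [pweight = hhwt]), should give the same point estimate of the mean, because each household then enters hhsize times with weight hhwt.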

PS I don't know what you mean by the reference to "ppt slide no. 8" because I referred you to a published paper. If you're going to cite other material, you should be providing exact sources so that all Forum participants can look for it. Please read the FAQ regarding reference provision.
Lwin Lwin Aung

Join Date: Mar 2015

Posts: 4
#9

20 Mar 2015, 05:04

Dear Stephen,

Thanks for your kind explanation. I think that I misinterpreted your previous message as follows:

one row per household = total household consumption expenditures per household per year.
one row per individual = total household consumption expenditures per capita per year.

I now understand what you meant by one row per household. That is:
one row per household = total household consumption expenditures per capita per year.

Then I think that your explanation of the weight is the same as the discussion in the Poverty and Inequality Handbook published by the World Bank.

Actually, I have all of your references, and I’m using svygei_svyatk too. I’ve attached the ppt file (which I believe is yours) presented at the UK Stata User Group meeting, London, 17–18 May 2005. I referred to slide no. 8 of it in my previous message.

I was wondering whether it might be OK to just use the “household sampling weight” for the analysis sample of total household consumption expenditures per capita per year.

Thanks again for your kind assistance to me.

With warm regards, Lwin
Attached Files

jenkins.pdf (338.6 KB, 1 view)
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#10

20 Mar 2015, 05:33

Lwin, you still seem unsure about whether you have one row per person or one row per household. The easiest way to find out is to count the number of observations: according to page 1 of the Technical Report (http://www.mm.undp.org/content/dam/m...Report-Eng.pdf), the 2009-2010 survey has 18660 households.
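One way to check the unit of observation in Stata, assuming a household identifier named hhid (hypothetical here), is to compare the number of rows with the number of distinct households. With the real file, the number of distinct households could then be compared with the 18660 reported in the Technical Report. The small file below is made up.

```stata
* Hypothetical file whose unit of observation is unknown; hhid is assumed
clear
input hhid pcexp
1 50
1 50
2 30
3 80
3 80
3 80
end

count                          // number of rows
egen byte hhtag = tag(hhid)    // 1 for the first row of each household
count if hhtag                 // number of distinct households

* isid exits with an error when hhid does not uniquely identify rows
capture isid hhid
display cond(_rc, "more than one row per household", "one row per household")
```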

Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#11

20 Mar 2015, 05:49

Lwin: Until you answer Steve's question in #10, nobody can provide any further useful advice. Your message in #9 appears to confuse the measure of "living standards" (and issues related to the "unit" within which living standards are assumed to be shared) and the way in which the data are organised (the data organisation unit).
Lwin Lwin Aung

Join Date: Mar 2015

Posts: 4
#12

20 Mar 2015, 06:17

Dear Steve,

Thanks for your concern. Actually, I don’t use total household consumption expenditures per capita per year in my analysis. I only use total consumption expenditures in adult equivalent per year (i.e., total consumption expenditures of a household / adult equivalents). Please see pages 49-50 of the Technical Report for how it is derived. However, for convenience in asking my question, I gave the example of per capita household expenditure (i.e., total expenditures of a household / household size). Yes, the original data set has 18,660 households.
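Following the svy: ratio approach in #1, consumption per adult equivalent could be estimated with household weights as a ratio of totals. The sketch below (hypothetical variables hhexp, hh_ae and hhwt, with made-up data) also shows that this ratio equals the mean of household consumption per adult equivalent weighted by hhwt × hh_ae, the first of the two weights asked about in #2. Attributing each person the household's value instead calls for hhwt × hhsize, as described in #8.

```stata
* Hypothetical one-row-per-household file: hhexp = total household
* consumption, hh_ae = household adult equivalents, hhwt = household weight.
clear
input hhid hhwt hhexp hh_ae
1 100 150 2.5
2 150 150 3
3 120 160 2
4  90 160 3.5
end

* Consumption per adult equivalent for each household
gen double aeexp = hhexp / hh_ae

* (a) Ratio of estimated totals with household weights, as in #1
svyset hhid [pweight = hhwt]
svy: ratio ae_ratio: hhexp/hh_ae
matrix b = e(b)
scalar r_ratio = b[1,1]

* (b) The same number as a mean of aeexp weighted by hhwt x hh_ae
gen double aewt = hhwt * hh_ae
svyset hhid [pweight = aewt]
svy: mean aeexp
matrix b = e(b)
scalar m_aewt = b[1,1]

display r_ratio "   " m_aewt
```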

For my analysis, some statisticians think that it is OK to just use the “household sampling weight” for the analysis sample of total household consumption expenditures in adult equivalent per year, instead of using (weight = household sampling weight * household size). So I raised the issue on Statalist.
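The two weightings answer different questions: household weights alone average over households, while household weight × household size averages over persons, each attributed the household's value. A sketch with made-up data and hypothetical variables hhwt, hhsize and aeexp:

```stata
* Hypothetical one-row-per-household file with consumption per adult
* equivalent (aeexp) already computed.
clear
input hhid hhwt hhsize aeexp
1 100 3 60
2 150 4 50
3 120 2 80
4  90 5 45
end

* (a) Household weight only: average over households
svyset hhid [pweight = hhwt]
svy: mean aeexp
matrix b = e(b)
scalar m_hh = b[1,1]

* (b) Household weight x household size: average over persons
gen double pwt = hhwt * hhsize
svyset hhid [pweight = pwt]
svy: mean aeexp
matrix b = e(b)
scalar m_person = b[1,1]

display "over households: " m_hh "   over persons: " m_person
```

In this toy file the larger households have lower aeexp, so the person-weighted mean (55) is below the household-weighted mean (about 59.02); in general the two differ unless all households are the same size.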

Thank you so much for your active discussion on the topic.

With warm regards,

Lwin
Zuhumnan Dapel

Join Date: Sep 2014

Posts: 392
#13

17 Feb 2016, 19:58

Just seeking some clarification.

According to http://www.oecd.org/eco/growth/OECD-...enceScales.pdf, an adult-equivalence scale can be computed from a dataset that has no information on the number of children by using the square-root scale, that is, by dividing total household income (or expenditure) by the square root of household size. Does that mean that the result of this division is consumption per adult equivalent?
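On the square-root scale, sqrt(household size) plays the role of the number of equivalent adults, so the quotient is expenditure per equivalent adult under that particular scale. A minimal sketch with hypothetical variables hhexp and hhsize and made-up data:

```stata
* Hypothetical household file: hhexp = total household expenditure,
* hhsize = number of persons.
clear
input hhid hhexp hhsize
1 300 1
2 600 4
3 900 9
end

* Square-root equivalence scale: divisor = sqrt(household size)
gen double eqscale = sqrt(hhsize)
gen double eqexp = hhexp / eqscale
list
```

Here all three households end up with the same equivalised expenditure (300), although their per capita expenditures (300, 150 and 100) differ.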

Thanks,
Dapel