Request for support

Habitamu Asifawu

Join Date: Sep 2019

Posts: 14
#1

10 Aug 2022, 02:57

Dear Sir/Madam,

I trust you are doing well. I am working on household-level and individual-level analyses using data from Ethiopia's Household Consumption Expenditure Survey. I have run into a problem applying the survey weights, because I do not clearly understand the multi-stage structure of the survey.

Note,

The survey design: the survey covered all rural and urban areas of the country. A stratified random sampling technique was employed to draw a representative sample. The country was first stratified into nine regional states and two city administrations. Each regional state was then further stratified into three broad categories: rural, major urban centers, and other urban areas. However, Harari regional state and Dire-Dawa City Administration were stratified into rural and urban categories only, while Addis-Ababa has only an urban category, stratified by Sub-City. Therefore, each category of a specific region was, in most cases, treated as a survey domain or reporting level for which the major findings of the survey are reported.

In the first two categories, rural and major urban, a two-stage stratified sampling technique was used: the Enumeration Areas (EAs) were the Primary Sampling Units (PSUs) and the households were the Secondary Sampling Units (SSUs). The EAs were selected with Probability Proportional to Size (PPS), size being the number of households obtained from the 2007 Population and Housing Census, while the sample households were systematically selected from a fresh list of households within each EA, compiled during the survey period.

For the other urban category, on the other hand, a three-stage stratified sampling technique was used. In this case, the urban centers, EAs, and households were the PSUs, SSUs, and Tertiary Sampling Units (TSUs), respectively. Here the PSUs and SSUs were selected with PPS, while the selection of households followed the same approach as described above.

Could you please help me assign the appropriate weights for household-level and individual-level analysis? How should I code this in Stata for household analysis, and how for individual analysis?

Thank you in advance.
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

10 Aug 2022, 06:55

Please watch this before asking questions https://youtu.be/bXfaRCAOPbI
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#3

10 Aug 2022, 08:44

I am assuming you've read the manual on the svy: set of commands. If you haven't, you need to do so first.

In the US, someone will typically publish Stata statements on how to svyset the data. For example, here is a third party's info page on how to do this to the US National Health Interview Survey. The NHIS originates with our Centers for Disease Control and Prevention, and the CDC has this info as well, e.g. here.

You could try to see if Ethiopia's statistical agency (or the agency that oversees this survey) published anything. If they published info for, say, SAS but not Stata, you can attempt to infer the appropriate variables. For example, for the NHIS, you need to specify the primary sampling unit, the analysis weight, and the stratum variable. (NB: that's common for many US surveys, but some may use other types of information!). In SAS, the code is given as:

Code:

proc surveymeans;
    strata pstrat;
    cluster ppsu;
    weight wtfa;
    var <blah blah blah>;
run;

In this case, even if you don't know SAS, I hope it's obvious what the cluster, strata, and weight variables are.
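
As a rough sketch only, the SAS statements above would translate to Stata along these lines. The names pstrat, ppsu and wtfa come from the NHIS example; yvar is a hypothetical analysis variable, and the Ethiopian survey will almost certainly use different variable names.

```stata
* Declare the design: PSU, sampling weight, and stratum (NHIS variable names)
svyset ppsu [pweight = wtfa], strata(pstrat)

* Display the design settings Stata has stored, then summarize strata and PSUs
svyset
svydescribe

* Design-based mean of a hypothetical analysis variable, yvar
svy: mean yvar
```

For a design with more than one sampling stage, svyset accepts further stages separated by ||; see help svyset for the details before relying on this.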

I'm vaguely aware that the World Bank may host this survey as well, possibly with data extraction tools and maybe some variable formatting. They could also have written the proper svyset statements. You could look there also. Or similarly for any large international organization that hosts the survey.

Aside from that, it is very difficult to answer questions without seeing the survey data. Jared is not trying to harp on you by telling you to watch a vid about how to use the dataex command. It's just genuinely not possible to make general statements about how to fix the problem that would be useful to you, or even correct.
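
For instance, a minimal dataex call might look like the one below. The names hhid, region, stratum, ea and hhweight are hypothetical stand-ins for whatever the household identifier, stratum, cluster and weight variables are called in your file.

```stata
* Post a small extract of the design variables (hypothetical variable names)
dataex hhid region stratum ea hhweight in 1/20
```

Pasting that output into a post lets others reproduce a slice of your data exactly.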

Last edited by Weiwen Ng; 10 Aug 2022, 08:47.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.