How to write correct svyset?

Kim Veloso

Join Date: Jun 2018

Posts: 19
#1

22 Jun 2018, 10:36

Dear all,

I am using Labor Force Survey data in a logit analysis and need help with svyset before I run my regression.
I am not very familiar with svyset and am having trouble setting it up. (I am analyzing only a sub-population: individuals who are employed.)

I am supposed to follow this structure according to: https://www.stata.com/support/faqs/s...stage-designs/

Code:

svyset su1 [pw=pwt], strata(strata1) fpc(fpc1) ///
    || su2, fpc(fpc2) || _n, fpc(fpc3)

Is this correct? Should the urban rural variable be included somewhere?

Code:

svyset psu [pweight = weight], strata(prov)

Any pointers are greatly appreciated!

Thank you very much!

- - - -
The sampling design is as follows according to: http://catalog.ihsn.org/index.php/ca...tab=study-desc

The sample size of the survey is 50,640 households per quarter, equivalent to 16,880 households per month. Sample size was designed to ensure the statistical significance of data for region by quarter and for province by year. Households were randomly selected from the 15% sample enumeration areas of the Population and Housing Census 2009 following a two-stage procedure:
1. Selecting enumeration areas
2. Selecting households. All residents ages 15 and above were interviewed and enumerated.

Sample Frame: The sample of the 2013 Labour force survey is the two-stage stratified sample, presented for the whole country, urban/rural areas; 6 socio-economic regions, Hanoi and Ho Chi Minh City for quarterly and all centrally governed cities/provinces for yearly. Each centrally governed province, city constitutes a main stratum with two sub-stratums of urban areas and rural areas. The sample frame is the 15% sample enumeration areas of the 2009 Population and Housing Census.

Sample design: The survey followed a two-stage stratified sampling procedure designed as follows:
- Stage 1 (selecting enumeration areas): Each centrally governed city/province constitutes a main stratum, after that, each main stratum was divided into 2 sub-stratums within each representing "urban" and "rural" areas. Then, the list of enumeration areas of cities/provinces (the master sampling frame was taken from the sampling frame 15% of the Population and Housing Census 2009) was divided into 2 independent samples (urban and rural) and enumeration areas were chosen by the Kish method.

- Stage 2 (selecting households): for each enumeration area defined in stage 1, 15 enumeration households (55 provinces) or 20 enumeration households (8 provinces: ) were systematically chosen.

Last edited by Kim Veloso; 22 Jun 2018, 11:17.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

01 Jul 2018, 16:21

You haven't given us much to go on. Name the important design variables mentioned in the document: main strata; the urban/rural variable (if there is one); enumeration areas; household ID; respondent ID; and the design weight.

Steve Samuels
Statistical Consulting

Stata 14.2

Sunga Kalemba

Join Date: Jun 2014
Posts: 22

03 Jul 2018, 01:48

Steve,

Allow me to join this thread as this is relevant to the ongoing discussion.

I have a confidentialised census sample from the Australian Bureau of Statistics, which is 1% of the population. The data methodology is given here:

HTML Code:

http://www.abs.gov.au/ausstats/abs@.nsf/Latestproducts/2037.0.30.001Main%20Features202011?opendocument&tabname=Summary&prodno=2037.0.30.001&issue=2011&num=&view=

The data are at three levels, i.e. individual, family, and dwelling, with these respective IDs: ABSPID, ABSFID, and ABSHID. The data cover the whole country, divided into states (STATE). The household ID is ABSFID, the respondent ID is ABSPID, and the dwelling ID is ABSHID, but the design weight is not given. A sample of some key variables, including sex and age, follows:

Code:


	* Example generated by -dataex-. To install: ssc install dataex
clear
input str14 ABSHID byte(ABSPID ABSFID Sex) float agegroup long STATE
"CSF11B00000001" 0 1 2 6 1
"CSF11B00000001" 0 1 1 3 1
"CSF11B00000002" 0 1 2 4 1
"CSF11B00000002" 0 1 1 3 1
"CSF11B00000003" 0 1 1 1 1
"CSF11B00000003" 0 1 2 5 1
"CSF11B00000003" 0 1 1 1 1
"CSF11B00000004" 0 1 1 5 1
"CSF11B00000005" 0 1 2 9 1
"CSF11B00000006" 0 1 1 4 1
end
label values Sex SEXP
label def SEXP 1 "1. Male", modify
label def SEXP 2 "2. Female", modify
label values agegroup agegrouplbl
label def agegrouplbl 1 "under 16", modify
label def agegrouplbl 3 "20-29", modify
label def agegrouplbl 4 "30-39", modify
label def agegrouplbl 5 "40-49", modify
label def agegrouplbl 6 "50-59", modify
label def agegrouplbl 9 "85+", modify
label values STATE STATE
label def STATE 1 "NSW", modify

My trouble is how to calculate and apply the weight and eventually survey-set my data. Given that we can calculate the weight as (sample size/population size), I am wondering whether doing that won't simply give me a single number for the weights at each level (individual, family, and household). The methodology file accompanying the data says the ideal PSU is the dwelling, but I want to use the individual as the unit of analysis. It has gone into detail on what to do to avoid specifications, which I completely understand.

Would you please advise on how I would go about survey-setting this dataset using the calculated weights and the individual as the unit of analysis?

Best regards.

Sunganani Kalemba
PhD Student.
Queensland
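
A minimal sketch of one way to approach the question above, under assumptions not stated in the thread: if every dwelling in the file had the same 1% chance of selection, the sampling weight is the inverse of the sampling fraction (population size / sample size, here 100) rather than the fraction itself. It is then a single constant, which leaves means and proportions unchanged but scales totals. The dwelling can serve as the PSU while the individual remains the unit of analysis. The variable names wt and dwelling are made up for illustration.

```stata
* Sketch only: assumes an equal-probability 1% sample of dwellings;
* check this against the ABS methodology file before relying on it.
generate double wt = 100             // inverse of the assumed 1% sampling fraction (1/0.01)
egen long dwelling = group(ABSHID)   // numeric dwelling identifier built from the string ID
svyset dwelling [pw = wt]            // dwelling as PSU; records stay at the individual level
svy: proportion Sex                  // individual-level estimate, clustered on dwelling
```

If the selection probabilities actually differ across areas or dwelling types, a constant weight would not be appropriate and the documentation should supply the details.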


Steve Samuels

Join Date: Mar 2014

Posts: 1786
#4

03 Jul 2018, 15:12

Please ask this question in a new topic, Sunga. Kim's question concerns details of the Vietnam survey, and I hope to pursue those further. I'd like to remind you of the strong preference on Statalist for using full real names.

Steve Samuels
Statistical Consulting

Stata 14.2
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#5

03 Jul 2018, 16:58

I found the following data description: http://catalog.ihsn.org/index.php/ca...36/datafile/F1

However, it does not include the variable that is the sampling weight, so it's up to you to find that in the data. It's not necessary to include a sampling stage for households because, in the absence of finite population corrections, only between-PSU variation counts. Since all eligible respondents in a household are studied, they receive the household sampling weight.
• tinh: Province/City (Stratum)
• diaban: Enumeration Area Number (PSU)
• hoso: Household Number
• stt: ID number
• ttnt: Urban/Rural (Substratum)

The following code will work for individual and HH responses:

Code:

* Create a separate stratum for urban and rural areas within each province/city.
egen xstratum = group(tinh ttnt)
* hh_weight is a placeholder: substitute the household sampling weight variable in your data.
svyset diaban [pw = hh_weight], strata(xstratum)
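
For the sub-population mentioned in the original question (employed individuals only), here is a minimal sketch of the estimation step, assuming the design has already been declared with svyset. The variable names employed, outcome, age, and female are hypothetical; employed is taken to be a 0/1 indicator.

```stata
* Sketch only: employed, outcome, age, and female are hypothetical variable names.
* Assumes the survey design has already been declared with svyset.
svyset                                   // with no arguments, displays the current survey settings

* Keep every observation in memory and flag the subpopulation with a 0/1 indicator.
svy, subpop(employed): logit outcome age i.female
```

Using subpop() keeps all observations in the variance calculation; dropping the non-employed records first, or restricting with if, can give incorrect standard errors.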

Last edited by Steve Samuels; 03 Jul 2018, 17:04.

Steve Samuels
Statistical Consulting

Stata 14.2
Sunga Kalemba

Join Date: Jun 2014

Posts: 22
#6

03 Jul 2018, 21:02

My apologies, Steve,

I am still getting my head around posting and updating my profile; excuse me for that. I have proceeded as advised.

Sunganani Kalemba
PhD Student.
Queensland
Kim Veloso

Join Date: Jun 2018

Posts: 19
#7

04 Jul 2018, 17:50

Thank you very much for your help, Mr. Samuels! I highly appreciate it!
Hassen Ali

Join Date: May 2018

Posts: 39
#8

05 Jul 2018, 02:49

Dear all, Thank you very much!! I have learned a lot from your daily posts.
Respectfully, Hassen