Hi,
I have an issue somewhat similar to the one discussed in the post “Specification of svyset for melogit model of complex survey data”. That thread appears to have been left unresolved, so I have started a new one. The link to the post is:
https://www.statalist.org/forums/for...ex-survey-data
I am using DHS data and Stata 13.1. As mentioned in that post, the DHS is a nationally representative household survey with a two- or three-stage cluster design in which census enumeration units are sampled at the first stage and households at the second stage. However, only one weight variable (wij) is provided. The sampling design is explained in the excerpts from the report quoted below.
(EA means census enumeration area; ward size is the number of households in the ward.)
I plan to pool several DHS rounds (from different years) for the same country to increase my sample size for a multilevel model. As a first step, I would like to svyset the data for a single survey (example data attached). The data have a total of 2380 observations clustered in 71 districts, with an average of 23.1 observations per district (range: 1-109).
As suggested in other threads and by Rabe-Hesketh & Skrondal in their 2006 paper, I rescaled the weights to purge the individual-level weights of the group-level selection probabilities, and assumed equal sampling probabilities for groups.
But then I get muddled about how to use all these variables to svyset the data.
I have tried the commands shown below, but none of them works. Also, the strata variable doesn't seem to go anywhere. I guess the weight() option is only available in Stata 14 or higher.
Example data are pasted below.
I would appreciate it if someone could point me in the right direction.
Many thanks
Each province was stratified into urban and rural areas, yielding 14 sampling strata. Samples of wards were selected independently in each stratum…...
In the first stage, 383 wards were selected with probability proportional to ward size and with independent selection in each sampling stratum. ….
Due to the large size of the urban wards, in a second stage of sample selection, one EA was randomly selected from each of the sample urban wards….
In the last stage of selection, a fixed number of 30 households per cluster were selected with an equal probability systematic selection from the newly created household listing.
Rabe-Hesketh and Skrondal. Multilevel modeling of complex survey data. J. R. Statist. Soc. A (2006) 169, Part 4, pp. 805–827
Code:
* Group-level weight
gen x=1
* Individual-level weight: w i|j from wij
gen wt=v005/1000000
* Level 1 weights using scaling method 1: new weights sum to effective cluster size
gen sqw = wt*wt
egen sumsqw = sum(sqw), by(sdist)
egen sumw = sum(wt), by(sdist)
gen pwt1 = wt*sumw/sumsqw
* Level 1 weights using scaling method 2: new weights sum to cluster sample size
egen nj = count(lbw), by(sdist)
gen pwt2 = wt*nj/sumw
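For reference, the two level-1 scalings computed in the code above can be written out as follows. This is a sketch in my own notation, not copied from the paper: $w_{i|j}$ stands for the level-1 weight (`wt`) of observation $i$ in district $j$ (`sdist`), and $n_j$ for the number of observations in district $j$ (`nj`).

```latex
\begin{align*}
\text{Method 1 (\texttt{pwt1}):}\quad
  w^{*}_{i|j} &= w_{i|j}\,\frac{\sum_{i} w_{i|j}}{\sum_{i} w_{i|j}^{2}},
  &\sum_{i} w^{*}_{i|j} &= \frac{\bigl(\sum_{i} w_{i|j}\bigr)^{2}}{\sum_{i} w_{i|j}^{2}},\\[4pt]
\text{Method 2 (\texttt{pwt2}):}\quad
  w^{*}_{i|j} &= w_{i|j}\,\frac{n_j}{\sum_{i} w_{i|j}},
  &\sum_{i} w^{*}_{i|j} &= n_j .
\end{align*}
```

So the `pwt1` weights add up to the effective cluster size and the `pwt2` weights add up to the actual cluster size. The second total is exact only if $n_j$ and the sums run over the same observations; in the code, `nj` counts non-missing `lbw` while `sumw` is taken over all observations in the district.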
Code:
. svyset v001 [pweight=wt], strata(v022) ||_n, fpc(pwt1) singleunit(missing)
Note: Stage 1 is sampled with replacement; all further stages will be ignored

      pweight: wt
          VCE: linearized
  Single unit: missing
     Strata 1: v022
         SU 1: v001
        FPC 1: <zero>

. svy:melogit lbw || sdist:, cov(unstruct)
melogit is not supported by svy with vce(linearized); see help svy estimation for a list of Stata estimation commands that are supported by svy
r(322);

. svyset, clear

. svyset v001 , strata(v022) ||_n, weight (pwt1) singleunit(missing)
Note: Stage 1 is sampled with replacement; all further stages will be ignored
option weight() not allowed
r(198);
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(v001 v002) byte v003 long v005 int v021 byte(v022 v024 v025 sdist) float(lbw age matgr)
 1  17 4 1084429  1 1 1 1 3 0 51 0
 1  49 2 1084429  1 1 1 1 3 0 57 0
 1  57 2 1084429  1 1 1 1 3 . 47 1
 1  81 4 1084429  1 1 1 1 3 . 34 0
 1  97 3 1084429  1 1 1 1 3 0 30 0
 2   4 2  274377  2 1 1 1 4 0  5 0
 2  36 3  274377  2 1 1 1 4 . 10 1
 2  62 1  274377  2 1 1 1 4 0 33 0
 2 166 2  274377  2 1 1 1 4 . 31 0
 3  45 4  566707  3 1 1 1 4 0 26 0
 3  87 3  566707  3 1 1 1 4 0 37 1
 3  94 3  566707  3 1 1 1 4 0  0 0
 3 150 2  566707  3 1 1 1 4 0 51 0
 3 164 2  566707  3 1 1 1 4 0 23 1
 3 171 2  566707  3 1 1 1 4 0 37 0
 3 171 6  566707  3 1 1 1 4 0 23 0
 3 206 5  566707  3 1 1 1 4 0 48 1
 4 111 1  979347  4 1 1 1 4 0 51 0
 4 118 7  979347  4 1 1 1 4 0  7 0
 4 125 4  979347  4 1 1 1 4 1  9 0
 4 132 2  979347  4 1 1 1 4 0 45 1
 4 146 3  979347  4 1 1 1 4 0 51 1
 4 195 4  979347  4 1 1 1 4 1  8 0
 4 195 9  979347  4 1 1 1 4 1 21 0
 5  84 2  864348  5 1 1 1 4 . 17 1
 5 109 6  864348  5 1 1 1 4 0 47 1
 5 235 4  864348  5 1 1 1 4 1  2 1
 5 322 1  864348  5 1 1 1 4 . 58 0
 5 360 2  864348  5 1 1 1 4 0 18 0
 6  36 2  969976  6 1 1 1 4 0 18 1
 6  45 1  969976  6 1 1 1 4 . 40 0
 6 120 2  969976  6 1 1 1 4 0 41 1
 6 167 1  969976  6 1 1 1 4 0 49 0
 6 176 3  969976  6 1 1 1 4 . 34 0
 6 223 4  969976  6 1 1 1 4 0 32 0
 6 232 2  969976  6 1 1 1 4 0 13 1
 6 251 2  969976  6 1 1 1 4 .  4 0
 6 260 5  969976  6 1 1 1 4 0 35 0
 6 279 2  969976  6 1 1 1 4 0 51 0
 7 216 3  981948  7 1 1 1 4 0  6 0
 8   7 2 1020485  8 1 1 1 4 1 52 1
 8  85 1 1020485  8 1 1 1 4 0 40 0
 9  64 8 1026315  9 1 1 1 5 0  3 0
 9  83 4 1026315  9 1 1 1 5 0 32 0
10  63 4  874170 10 1 1 1 5 0 10 0
10 154 2  874170 10 1 1 1 5 0  2 0
11 128 2 1023708 11 1 1 1 5 0 58 0
11 158 4 1023708 11 1 1 1 5 .  0 1
11 166 2 1023708 11 1 1 1 5 0 50 0
11 219 2 1023708 11 1 1 1 5 . 20 0
end
label values v022 V022
label def V022 1 "province 1 - urban", modify
label values v024 V024
label def V024 1 "province 1", modify
label values v025 V025
label def V025 1 "urban", modify
label values sdist SDIST
label def SDIST 3 "ilam", modify
label def SDIST 4 "jhapa", modify
label def SDIST 5 "morang", modify
label values lbw A
label def A 0 "No", modify
label def A 1 "Yes", modify
label values matgr matgr
label def matgr 0 "20-24", modify
label def matgr 1 "<20", modify