  • correct specification of svyset for non-stratified multi-stage survey design

    I have been asked to review a survey of agricultural practices for a large-scale intervention in Uganda. The intervention covers 21 districts in Uganda, of which 6 were randomly sampled for inclusion in the survey. In each district, one sub-county was then randomly selected. From each of the sub-counties, 3 Producer Organizations (POs) were selected randomly for inclusion. All of the members of each selected PO were included in the sample.

    A local consultant performed initial analyses in Excel using weighting methods described below:

    Weighting: While the population of the producer organizations differs substantially, the sample size of at least 60 respondents per district was a constant. Calculating a weighted result based on relative population size provides a more precise representation of the percentage of the population covered. Weighted district and programme percentages were calculated for all survey indicators using the formula suggested in the Lot Quality Assurance Sampling (LQAS) training manual and the available PIN PO membership figures for the respective districts.

    I do not think this is the correct way to account for non-response across districts and have been attempting to revise the analysis using appropriate weights in Stata.

    Question 1: Is this the correct specification of svyset?
    svyset District [pweight=sampweights], fpc(fpc1) || PO (fpc2)

    Question 2: Did I calculate the p-weight variable correctly?
    sampweight = (21/6)*(# of sub-counties in district/1)*(# of POs in sub-county/3)

    Question 3: What should fpc1 and fpc2 variables be? I am confused about how these should be specified at each stage.

    Question 4: The response rate was only 48%, should I apply postweights to account for differentiation in non-response by district? If so, how would these be specified?

    thank you

  • #2




    I'm assuming that your population is all agricultural operations (farmers? managers? farms? owners?) served by Producer Organizations in the 21 districts.

    To address your questions
    1. svyset statement
    As only one sub-county is selected per district, you are right to omit a sub-county stage. You have also omitted a respondent stage in svyset, which to Stata means that all members responded (fpc = 1). I would modify the statement to:

    Code:
    svyset District [pweight=sampweights], fpc(fpc1) || PO, fpc(fpc2) || _n
    Technically this applies to a design in which respondents were randomly selected with replacement within each PO. This, of course, is not true, but it is a way to avoid the assumption that fpc = 1 at the respondent stage. See below for the fpc definitions.

    2. Sampling weight

    Add a factor to weight up the number of respondents in each PO to the number of members in the PO:

    sampweight = (21/6)*(#of sub-counties in district/1)*(#of POs in sub-county/3)*( #members in PO/# responding in PO)
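
    A minimal sketch of this calculation in Stata, assuming the data have one row per responding member and that PO identifies each producer organization uniquely. The variables n_subc, n_po and n_members are hypothetical names, not from the original post; the weight is stored as sampweights to match the svyset statement above.

    ```stata
    * Hypothetical inputs (names are assumptions, not from the original post):
    *   n_subc    : number of sub-counties in the respondent's district
    *   n_po      : number of POs in the selected sub-county
    *   n_members : number of members in the respondent's PO

    * Count the responding members in each PO (one row per respondent)
    bysort PO: generate n_resp = _N

    * Product of the inverse sampling fractions at each stage,
    * times the within-PO factor (#members / #responding)
    generate double sampweights = (21/6) * (n_subc/1) * (n_po/3) * (n_members/n_resp)

    * Quick check of the range of the resulting weights
    summarize sampweights
    ```

    For illustration only, with made-up numbers: a respondent in a district with 10 sub-counties, whose selected sub-county has 12 POs, and whose PO has 50 members of whom 25 responded, would get a weight of 3.5 * 10 * 4 * 2 = 280.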


    3. Finite population corrections

    fpc1 = 6/21
    fpc2 for a PO = 3/(number of POs in the selected sub-county)

    For any analyses for which you intend to quote a p-value, ignore the fpc's. They are intended only for descriptive analyses.
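
    As a rough sketch, the two fpc variables could be created as sampling rates, which Stata's fpc() option accepts when the values are at most 1. The variable n_po (number of POs in the selected sub-county) is a hypothetical name, and sampweights is assumed to exist already.

    ```stata
    * Sampling rates at each stage (n_po is an assumed variable name)
    generate double fpc1 = 6/21       // districts sampled / districts in programme
    generate double fpc2 = 3/n_po     // POs sampled / POs in the selected sub-county

    * Descriptive analyses: declare the design with the fpc's
    svyset District [pweight=sampweights], fpc(fpc1) || PO, fpc(fpc2) || _n
    svydescribe

    * Analyses that quote p-values: re-declare the design without the fpc's
    svyset District [pweight=sampweights]
    ```

    svydescribe is only there as a check that Stata sees 6 districts at stage 1 and 3 POs per district at stage 2.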

    4. Poststratification

    With your design you have two problems:
    1) the respondents are not representative of the original sample.
    2) the original sample did not represent the population in the 21 districts.

    Groves et al. (2009), section 10.5, summarize the approaches to correcting for non-representative samples and responders:

    1. w1: First stage ratio adjustment to compensate for chance variation in size of primary sampling unit

    2. w2: Compensation for unequal sampling probabilities: multiply probabilities at each stage to get final probability of selection, then invert to get final selection weight. Some surveys plan sampling so that final selection weights are equal (equal probability selection methods, or "epsem")

    3. w3: Adjustment for non-response. For example, weight up responders using the probability of response estimated from a logistic regression.

    The weight up to that point is w* = w1 x w2 x w3

    4. w4 (final weight): Post-stratify w* to match known population characteristics (sample balancing, raking). This can also partly compensate for a poor design at the expense of increasing standard errors. In addition to the post-stratification options in svyset, there are contributed commands that can post-stratify and rake: Stas Kolenikov's ipfraking (findit); Nick Winter's survwgt rake, part of the survwgt package (SSC); John D'Souza's calibrate (SSC); and Michael Bergmann's ipfweight (SSC).
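
    The last two steps might be sketched as follows. This is only an outline under strong assumptions: a file with one row per sampled member (responders and nonresponders), a 0/1 indicator responded, a selection weight w2, and post-stratum variables pstrat and pstrat_N. All of these names are hypothetical, and the response model shown (district only) is a placeholder for whatever predictors are actually known for nonresponders.

    ```stata
    * w3: nonresponse adjustment from a logistic regression of response
    * (responded, w2 and the choice of predictors are assumptions)
    logit responded i.District
    predict double p_resp, pr                  // estimated probability of response
    generate double w3 = 1/p_resp if responded == 1

    * w* = selection weight x nonresponse adjustment (w1 omitted in this sketch)
    generate double wstar = w2 * w3 if responded == 1

    * w4: post-stratify w* to known population counts using svyset's own options
    *   pstrat   : post-stratum identifier (hypothetical)
    *   pstrat_N : known population size of each post-stratum (hypothetical)
    svyset District [pweight=wstar], poststrata(pstrat) postweight(pstrat_N)
    ```

    Nonresponders get a missing wstar and so drop out of svy estimation. Raking to several margins at once would need one of the contributed commands listed above rather than poststrata().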

    References:

    Groves, R. M., Fowler, F. J., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2009). Survey methodology (2nd ed.). Hoboken, N.J.: Wiley.

    Raking is explored in the following articles:

    1. Kolenikov, S. (2014). Calibrating survey data using iterative proportional fitting (raking). Stata Journal, 14(1), 22-59.
    http://www.stata-journal.com/article...article=st0323


    2. Battaglia, M. P., Hoaglin, D. C., & Frankel, M. R. (2013). Practical considerations in raking survey data. Survey Practice, 2(5).
    http://www.surveypractice.org/index....ml%20%20#fg001 (Unfortunately, the single figure does not display.)

    Steve Samuels
    Statistical Consulting

    Stata 14.2
