I have been asked to review a survey of agricultural practices for a large-scale intervention in Uganda. The intervention covers 21 districts, of which 6 were randomly sampled for inclusion in the survey. Within each sampled district, one sub-county was randomly selected, and within each selected sub-county, 3 Producer Organizations (POs) were randomly selected. All members of each selected PO were included in the sample.
A local consultant performed the initial analyses in Excel, using the weighting method described in their report as follows:
Weighting: While the population of the producer organizations differs substantially, the sample size of at least 60 respondents per district was a constant. Calculating a weighted result based on relative population size provides a more precise representation of the percentage of the population covered. Weighted district and programme percentages were calculated for all survey indicators using the formula suggested in the Lot Quality Assurance Sampling (LQAS) training manual and the available PIN PO membership figures for the respective districts.
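For concreteness, here is a minimal sketch of what I understand the consultant's calculation to be: a programme-level percentage formed by weighting each district's percentage by its share of total PO membership. This is my assumption about the LQAS formula, not something taken from the manual, and the district labels, percentages, and membership figures are made up for illustration.

```stata
* Hypothetical inputs: one row per district with the indicator
* percentage (pct) and the PO membership figure (members)
clear
input str10 district pct members
"A" 40 100
"B" 60 300
end

* Weight each district by its share of total membership
egen double total_members = total(members)
generate double w = members / total_members

* Programme-level percentage = sum of (weight * district percentage)
generate double contrib = w * pct
egen double programme_pct = total(contrib)

display programme_pct[1]
```

Note that this weights by population share only; it does not use the selection probabilities from the multi-stage design, which is why I am trying to replace it.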
I do not think this is the correct way to account for non-response across districts and have been attempting to revise the analysis using appropriate weights in Stata.
Question 1: Is this the correct specification of svyset?
svyset District [pweight=sampweights], fpc(fpc1) || PO, fpc(fpc2)
Question 2: Did I calculate the p-weight variable correctly?
sampweight = (21/6) * (# of sub-counties in district / 1) * (# of POs in sub-county / 3)
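To make the calculation above explicit, here is a sketch of how the weight would be generated in Stata as the product of the inverse selection probabilities at each stage. The variable names n_subcounties and n_pos and the counts in the toy data are hypothetical; the last stage contributes a factor of 1 because all members of a selected PO were sampled.

```stata
* Toy data: one row per respondent; the counts are illustrative only
clear
input district_id n_subcounties n_pos
1 4 6
1 4 6
2 5 9
end

* Stage 1: 6 of 21 districts
* Stage 2: 1 sub-county out of n_subcounties in the district
* Stage 3: 3 POs out of n_pos in the selected sub-county
* Stage 4: all members of the PO (selection probability 1)
generate double sampweight = (21/6) * (n_subcounties/1) * (n_pos/3)

list district_id n_subcounties n_pos sampweight
```

This assumes simple random sampling with equal probabilities at every stage; if districts, sub-counties, or POs were selected with probability proportional to size, the factors would be different.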
Question 3: What should fpc1 and fpc2 variables be? I am confused about how these should be specified at each stage.
Question 4: The response rate was only 48%. Should I apply post-stratification weights to account for differential non-response by district? If so, how should they be specified?
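One option I am considering is a simple weighting-class adjustment, sketched below: within each district, respondents' design weights are divided by the district response rate. This is an alternative to post-stratification, not the same thing; svyset's poststrata() and postweight() options would instead require known population totals for each post-stratum. The variable names responded and nr_weight and all values are hypothetical, and the adjustment assumes non-respondents resemble respondents within a district.

```stata
* Toy data: one row per sampled member, with a response indicator
* and the design weight (values are illustrative only)
clear
input district_id responded sampweight
1 1 28
1 0 28
1 1 28
1 0 28
2 1 52.5
2 0 52.5
2 0 52.5
2 0 52.5
end

* Response rate within each district (the weighting class)
bysort district_id: egen double resp_rate = mean(responded)

* Inflate respondents' weights by the inverse response rate;
* non-respondents get a missing weight and drop out of estimation
generate double nr_weight = sampweight / resp_rate if responded == 1

list district_id responded sampweight resp_rate nr_weight
```

The response rate here is unweighted, which gives the same result as a weighted rate only because the design weight is constant within each district in this toy data.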
Thank you.