
  • Using svyset command for post stratification

    Hi everyone,

    I am relatively new to analyzing survey data and am not sure whether I have the right code for post-stratification of my dataset. Thank you for any advice!

    The data use random sampling, and the documentation describes the post-stratification as follows:

    Post-stratification: To reduce the overall standard errors, and weight the population totals up to the known population figures, a post-stratification correction is applied. Based on the projected number of households in Kampala and in the urban and rural segments of the four main regions (central, eastern, northern and western), adjustment factors are calculated. For cross-sectional estimates of population dynamics based only on the UNPS 2009/10 data, the data users must use the variable wgt09 as part of GSEC1.dta of the UNPS 2009/10 package. This variable includes sampling weights for original as well as split-off households and is generated as a result of the procedures detailed above. As noted above, the UNPS strata of representativeness include (i) Kampala City, (ii) Other Urban Areas, (iii) Central Rural, (iv) Eastern Rural, (v) Western Rural, and (vi) Northern Rural. The variable stratum as part of the GSEC1.dta of the UNPS 2009/10 data package captures these.


    Code:

    svyset, poststrata(stratum) postweight(wgt09)



  • #2
    What kind of random sampling was used? Simple random sampling within strata would require a list of individuals or households. This is rarely possible in non-European surveys, so I would guess there was multistage sampling, with a sample of areas selected in the first stage within strata. These first-stage areas are the primary sampling units (PSUs). You need to include a PSU variable in the svyset statement.

    I'm a little confused by the description:
    "This variable includes sampling weights for original as well as split-off households and is generated as a result of the procedures detailed above."
    There is a difference between a sampling weight and a post-stratification weight, and I doubt that your svyset is correct. To be a poststratum weight, wgt09 would have to be constant within each poststratum, with the value for an individual equal to the known population total of the individual's poststratum, so it would have only as many unique values as there are poststrata.

    So that we can figure this out, please show us the results of:
    Code:
    codebook wgt09
    In future posts, be sure to include all code and results between code delimiters [CODE] and [/CODE], as requested in FAQ12. All dataex or other data listings should also begin and end with those delimiters.
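    One quick way to check this is to count the distinct nonmissing values of wgt09 within each stratum; a variable suitable for postweight() would show a count of 1 in every poststratum. This is a minimal sketch, assuming GSEC1.dta is in the working directory and that the variables are named stratum and wgt09 as in the documentation; tag_wgt is a hypothetical helper variable.

```stata
* Sketch: assumes GSEC1.dta from the UNPS 2009/10 package is in the working directory
use GSEC1.dta, clear

* Tag one observation per distinct (stratum, wgt09) combination.
* By default, egen's tag() gives tag 0 to observations with missing values.
egen byte tag_wgt = tag(stratum wgt09)

* The frequencies in this table are the number of distinct wgt09 values per stratum
tabulate stratum if tag_wgt
```

    If any stratum shows a count greater than 1, wgt09 is not a table of poststratum totals and would belong in the [pw = ] slot of svyset rather than in postweight().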

    Last edited by Steve Samuels; 11 Aug 2018, 08:21.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Hi Steve,

      Thank you for your reply and suggestions! I think the sampling was done by dividing the country into five strata (five different regions). Within each stratum, EAs (enumeration areas) were selected using simple random sampling, but the probabilities of selection varied between strata. Does this answer your question? I apologize in advance for any vague answers, since I am not experienced with survey data.


      The results of codebook are below:




      Code:
      codebook wgt09
      --------------------------------------------------------------------------------------------------------------------
      wgt09 UNPS 2009/10 Panel Weights w/ Split-Off HHs
      --------------------------------------------------------------------------------------------------------------------

      type: numeric (float)

      range: [191.28273,8077.8682] units: .00001
      unique values: 1,134 missing .: 1,590/3,627

      mean: 1614.43
      std. dev: 1157.62

      percentiles:       10%       25%       50%       75%       90%
                     620.209   879.973    1207.7   2040.45   3089.25


      I have also copied and pasted the documentation that came with the survey. I am not interested in analyzing household-level changes between 2005/06 and 2009/10; do I still need to include the variable wgt09wosplits in my code? Thank you for any advice!

      "Post stratification

      To reduce the overall standard errors, and weight the population totals up to the known population figures, a post-stratification correction is applied. Based on the projected number of households in Kampala and in the urban and rural segments of the four main regions (central, eastern, northern and western), adjustment factors are calculated. This correction also reduces overall standard errors (see Little et al., 1997).

      The final weight calculations are the product of the (i) base weight, (ii) the inverse probability of selection into the UNPS, (iii) the inverse probability of selection into tracking, pooled and averaged for original households that split, (iv) a fair-share correction for new members, (v) the attrition correction, and (vi) the post-stratification adjustment. Prior to step (vi), weights are trimmed at the two percent level.

      For cross-sectional estimates of population dynamics based only on the UNPS 2009/10 data, the data users must use the variable wgt09 as part of GSEC1.dta of the UNPS 2009/10 package. This variable includes sampling weights for original as well as split-off households and is generated as a result of the procedures detailed above. As noted above, the UNPS strata of representativeness include (i) Kampala City, (ii) Other Urban Areas, (iii) Central Rural, (iv) Eastern Rural, (v) Western Rural, and (vi) Northern Rural. The variable stratum as part of the GSEC1.dta of the UNPS 2009/10 data package captures these.

      The variable wgt09wosplits as part of GSEC1.dta of the UNPS 2009/10 package was computed only for the original households that were interviewed both in 2005/06 and 2009/10. wgt09wosplits is the multiplication of UNHS 2005/06 sampling weight and the inverse of the proportion of the original households that were captured in a given EA. wgt09wosplits is provided for data users interested in conducting household-level analyses of changes between 2005/06 and 2009/10."
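      The weight construction described in the documentation can be summarized as a product of factors. The symbols below are introduced only for illustration and do not appear in the UNPS documentation: b_i is household i's base weight, p_i^UNPS and p_i^track its probabilities of selection into the UNPS and into tracking, f_i the fair-share correction, a_i the attrition correction, and g_{s(i)} the post-stratification adjustment for the poststratum s(i) containing household i.

```latex
w_i = \underbrace{b_i \cdot \frac{1}{p_i^{\mathrm{UNPS}}} \cdot \frac{1}{p_i^{\mathrm{track}}} \cdot f_i \cdot a_i}_{\text{trimmed at the 2\% level}} \cdot \, g_{s(i)}
```

      Only the last factor is shared by all households in a poststratum, so the final weight still varies from household to household within a stratum.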
      Last edited by Mangji Zo; 11 Aug 2018, 09:28.



      • #4
        Thanks for the information. The codebook results would have been easier to read if you had put them inside the code delimiters as well. There are far more unique values of wgt09 (1,134) than strata, so they are not the stratum totals. However, you do have a problem: 1,590 of the 3,627 observations have missing weights, and these will be excluded from the analysis. Also, you mention five regions, but the documentation lists six levels for stratum. Ignoring these issues, I recommend:
        Code:
         svyset EA [pw = wgt09], strata(stratum)
        If you can substitute the original regional variable for stratum, so much the better. Do investigate the missing weights.
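        A possible starting point for investigating the missing weights is sketched below. It assumes GSEC1.dta is in memory with variables named stratum and wgt09 as in the documentation; EA is a placeholder for whatever variable identifies enumeration areas in your file, and wgt_missing is a hypothetical helper variable.

```stata
* Sketch: assumes GSEC1.dta is in memory; EA is a placeholder name
* for the enumeration-area identifier in your copy of the data
misstable summarize wgt09

* Share of observations with a missing weight, by stratum
generate byte wgt_missing = missing(wgt09)
tabulate stratum wgt_missing, row

* Declare the design, then show the number of PSUs and observations per stratum
svyset EA [pw = wgt09], strata(stratum)
svydescribe
```

        If the missing weights are concentrated in particular strata, that pattern is worth raising with the data providers before running svy estimation commands.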
        Last edited by Steve Samuels; 11 Aug 2018, 09:57.



        • #5
          Hi Steve,

          Thank you for your help and recommendations! I now have a better understanding of how the svyset command works. I will look into the missing weights issue, and next time I will remember to use the code delimiters.
