  • Weighting without knowing psu

    Dear Stata users,

    I want to conduct a cross-sectional analysis using data from Wave 5 of the South African National Income Dynamics Study (NIDS) household survey.
    I am struggling with the weighting and would be very grateful for any help!

    The survey uses two-stage sampling with stratification at the district council level. The dataset provides a design weight (correcting for nonresponse) and a post-stratified weight (calibrated for sex, age and race).
    My first idea was to set up the weighting with


    svyset psu [pw= poststatWeight], strata(districtVariable)

    Now I have two problems. First, the PSU variable is not included in the dataset; it is only in a secure dataset that I cannot access (I tried to get in touch, but access does not seem possible). Second, it seems to me that I need a different command because I am using a post-stratified weight. I considered the following command:

    svyset [pweight=wt], poststrata(groupVariable) postweight(poptots)

    Nevertheless, I am not sure I understand the command correctly (partly because it takes just one calibrating variable). Do I simply insert my sex, age and race variables as "groupVariable"? And how can I deal with the population totals variable, since the dataset does not provide such variables? I have studied all the NIDS manuals and weight descriptions (earlier waves included a PSU variable) but couldn't figure out how I am supposed to weight. A document on cluster correction for the dataset says "we should at minimum svyset households as our “cluster” variable", but I assume this refers to cluster as a Stata option rather than to the PSU. I would be thankful for any hint!
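A minimal sketch of how the poststratification options are typically fed, on simulated data. Because `poststrata()` accepts a single variable, one common approach is to cross sex, age group and race into one cell identifier with `egen group()` and to store each cell's population count in the `postweight()` variable, which must be constant within a cell. All variable names (`psu`, `district`, `dwgt`, `agegrp`, `pstrat`, `poptot`), the age grouping and the totals are hypothetical, not NIDS variables or real population figures.

```stata
* Sketch on simulated data: every name and number here is hypothetical
clear
set seed 2019
set obs 2000
generate int psu       = ceil(_n/50)               // 40 PSUs of 50 persons each
generate byte district = ceil(psu/10)              // 4 strata with 10 PSUs each
generate double dwgt   = 500                       // hypothetical design weight
generate byte sex      = 1 + (runiform() < 0.5)
generate byte agegrp   = 1 + floor(3*runiform())   // 3 hypothetical age groups
generate byte race     = 1 + floor(2*runiform())   // 2 hypothetical race groups
generate double y      = rnormal()

* One poststratum per sex x age group x race cell (2*3*2 = 12 cells)
egen pstrat = group(sex agegrp race)

* Population count of each cell; in real use these come from outside the data
generate double poptot = 80000 + 1000*pstrat

svyset psu [pweight=dwgt], strata(district) poststrata(pstrat) postweight(poptot)
svy: mean y
```

Crossing the variables requires a population total for every combined cell. If only separate totals for sex, age and race are available, this approach does not apply directly and a raking-type calibration would be needed instead.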

  • #2
    To the best of my recollection, a poster asked about this survey some years ago and, at my suggestion, asked the survey staff how he could identify PSUs. (You might do the same to be safe.) The staff's response: there is a single weight for all subjects in the same PSU.
    Therefore, form PSUs with:
    Code:
    egen psu = group(district weight)
    A couple of complications:
    1. The number of PSUs identified this way might be smaller than the published total. The reason: two or more PSUs in the same district might, by chance, have the same weight. This is not a serious problem. Perhaps you can tell such PSUs apart on the basis of another factor. If there is no such factor, just accept the combined PSUs created by the code above; the standard errors will then be slightly inaccurate.

    2. If there is both a household weight and an individual weight, use the HH weight in the code above.
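A minimal sketch of the weight-based PSU construction on simulated data where the true PSUs are known, so the merging of tied weights described in complication 1 can be seen. It assumes, as the survey staff reportedly said, that all households in a PSU share one household weight; the names `true_psu`, `hhwgt` and `hhid` and the weight values are hypothetical.

```stata
* Simulated households: all households in a PSU share one household weight
clear
set seed 31
set obs 40                                           // 40 true PSUs
generate int true_psu  = _n
generate byte district = ceil(_n/10)                 // 4 districts, 10 PSUs each
generate double hhwgt  = 100 + floor(20*runiform())  // weights can tie by chance
expand 12                                            // 12 households per PSU
generate long hhid     = _n

* Form PSUs from distinct district x household-weight combinations
egen psu = group(district hhwgt)

* Tied weights within a district merge PSUs, so at most 40 are recovered
quietly summarize psu
display "PSUs recovered: " r(max) " of 40 true PSUs"

svyset psu [pweight=hhwgt], strata(district)
```

Because `group()` never splits a true PSU (its weight is constant), the only error this construction can make is merging PSUs, which is why the recovered count can fall below the published total.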

    About population totals for post-stratification weighting: obtain these from sources external to the data. In the absence of interim estimates, use the most recent census.
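One way to attach such external totals, sketched on simulated data: keep the census counts in a small file with one row per sex x age group x race cell and `merge m:1` them onto the respondents. The cell coding, the variable names and the counts are hypothetical placeholders, not census figures.

```stata
* Hypothetical external table: one row per sex x age group x race cell
clear
set obs 12
generate byte sex      = 1 + mod(_n - 1, 2)
generate byte agegrp   = 1 + mod(floor((_n - 1)/2), 3)
generate byte race     = 1 + floor((_n - 1)/6)
generate double poptot = 90000 + 5000*_n         // placeholder counts, not real data
tempfile totals
save `totals'

* Simulated survey respondents
clear
set seed 41
set obs 1200
generate int psu       = ceil(_n/30)             // 40 PSUs
generate byte district = ceil(psu/10)            // 4 strata
generate double wgt    = 750                     // hypothetical weight
generate byte sex      = 1 + (runiform() < 0.5)
generate byte agegrp   = 1 + floor(3*runiform())
generate byte race     = 1 + floor(2*runiform())

* Attach the external total for each respondent's cell
merge m:1 sex agegrp race using `totals', keep(match master)
assert _merge == 3                               // every cell must have a total
drop _merge

egen pstrat = group(sex agegrp race)
svyset psu [pweight=wgt], strata(district) poststrata(pstrat) postweight(poptot)
```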
    Last edited by Steve Samuels; 30 Jan 2019, 20:11.
    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2



    • #3
      I have to apologize: I can't find the post I referred to. It was about a South African survey, but I don't recall whether it was a panel survey (it might have been). By "cluster variable" the documentation you quote surely means PSU. Obviously, setting the household as the PSU will lead to wrong standard errors, as there are many more HHs than PSUs. I do suggest that you contact the survey staff for suggestions. One possibility is that they have published design effects with which to adjust the standard errors reported by Stata. Weighting when HHs can appear in more than one wave can be complicated. If you have a choice and the HH weight changes between waves, I suggest that you analyze one wave at a time.
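To illustrate the point about households versus PSUs, here is a sketch on simulated data with a shared PSU-level effect. Declaring households as the clusters tells Stata there are 400 independent units instead of 40, and when outcomes are correlated within PSUs the resulting standard error will usually be too small. The variable names and the data-generating setup are hypothetical.

```stata
* Simulated persons in households in PSUs, with a shared PSU-level effect
clear
set seed 53
set obs 40
generate int psu       = _n
generate byte district = ceil(_n/10)       // 4 strata
generate double u      = rnormal()         // hypothetical PSU effect: induces clustering
expand 10                                   // 10 households per PSU
generate long hhid     = _n
expand 3                                    // 3 persons per household
generate double y      = u + rnormal()
generate double wgt    = 1000

* Households declared as the clusters: 400 "PSUs"
svyset hhid [pweight=wgt], strata(district)
quietly svy: mean y
scalar n_hh  = e(N_psu)
scalar se_hh = _se[y]

* True PSUs declared as the clusters: 40 PSUs
svyset psu [pweight=wgt], strata(district)
quietly svy: mean y
scalar n_psu  = e(N_psu)
scalar se_psu = _se[y]

display "clusters: " scalar(n_hh) " (households) vs " scalar(n_psu) " (PSUs)"
display "SE of mean: " scalar(se_hh) " (households) vs " scalar(se_psu) " (PSUs)"
```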
      Steve Samuels



      • #4
        Dear Steve,

        thank you so much for your effort!
        I guess the post you referred to is this one: https://www.stata.com/statalist/arch.../msg01137.html
        Unfortunately, the current wave does not solve the problem; there is not even an hhcluster variable anymore.
        Nevertheless, you're definitely right that I need to get in touch with the data support team. I hope they will respond to my request at some point (they haven't yet).
        Last edited by Svenja Fluhrer; 02 Feb 2019, 06:15.



        • #5
          Thanks for the link to that old thread, Svenja. I mixed it up with another one in which the weights identified the PSUs. Let us know what you learn from support.
          Steve Samuels
