  • Declaring surveyset options in ESS7 data file

    Hi all!

    I am writing my master's thesis using the ESS round 7 data file and would like to use the svyset command. I have read the Stata manual and several tutorials, but I am still not sure how to specify the svyset options correctly for my dataset. The ESS data weighting guideline says I should combine dweight with pweight when analysing a group of countries. The variables dweight (design weight), pweight (population size weight) and the post-stratification weight are already in the data set. Other relevant variables are idno (respondent ID), cntry (country) and region.

    Now, should I generate a new variable combining dweight and pweight? If so, should I write gen newweight = dweight*pweight?

    Does this syntax make sense to you?
    svyset idno [pw=newweight], strata(cntry) singleunit(centered)

    Any answer is much appreciated!

    Best,
    Laura

  • #2
    Did you find out how to do it in the end? I have a similar problem: I would like to use subpop for my multilevel model and therefore have to svyset my data (instead of just providing the weights in the multilevel regression command), but I am still confused about how to specify the svyset options for the ESS. (I found the SDDF file, but am not sure how to use it.)



    • #3
      A link to the weighting guidelines would help. So would the mixed-model code that you would use without the subpop option, pasted between CODE delimiters as you've done before. I'll have no chance to see your response until next week.
      Last edited by Steve Samuels; 15 Jun 2018, 10:36.
      Steve Samuels
      Statistical Consulting
      [email protected]

      Stata 14.2



      • #4
        Hello Steve, thanks for your reply. Here are the weighting guidelines for the ESS: http://www.europeansocialsurvey.org/...ing_data_1.pdf
        There is also the so-called SDDF file, which gives more detail on the sample design. (I hope the link works; it's from my workplace's cloud.) The ESS also publishes its own sampling guidelines.

        As Richard Williams noted here, it is a rather complicated explanation. The corresponding discussion thread was also quite interesting for me. Richard followed it up with another one here, motivated by the question "Suppose your data set has pweights but no stratification or clustering. Is there any compelling reason to svyset the data and use svy:, as opposed to just adding [pw = whatever] to each estimation command?"

        However, it seems to me that the ESS uses stratification and clustering. Besides, since I want to use the subpop option, I would like to svyset my data. Maybe it is possible to use the [pw = whatever] option on a restricted sample and still calculate the standard errors etc. on the basis of the whole sample, thus avoiding svyset, but unfortunately I'm at a loss as to how to go about that.

        Here is an example of the mixed code that I would use absent the subpop option:

        Code:
        melogit vot ib4.work10 gndr age age2 eduy ib1.occupad ib0.inca partner chldhm ib1.health blgetmg ///
        ib4.polintr ib3.hlthhmp clsprty jbspv GDP PERC AWH i.WAHL SUND [pw = weighty] || country:
        It's a multilevel model (random intercepts, fixed slopes) with individuals at level 1 (L1) and countries at level 2 (L2).
        gndr, age, age2, eduy, occupad, inca, partner, chldhm, health, blgetmg, polintr, hlthhmp, clsprty, and jbspv are L1 controls
        GDP, PERC, AWH, WAHL, and SUND are L2 controls.
        weighty is the product of dweight (design weight) and pweight (population size weights): weighty = dweight*pweight

        I would like to exclude from the sample people who are retired and not in paid work, since working hours is the independent variable and the research question concerns the impact of working hours on voting behaviour. My first impulse was therefore to run the code above with an if-qualifier:

        Code:
        melogit vot ib4.work10 gndr age age2 eduy ib1.occupad ib0.inca partner chldhm ib1.health blgetmg ///
        ib4.polintr ib3.hlthhmp clsprty jbspv GDP PERC AWH i.WAHL SUND if !(rtrd==1 & pdwrk==0) [pw = weighty] ///
        || country:,
        But then, of course, the standard errors are calculated from the restricted sample only, not from the whole sample.

        PS: I am currently using Stata 14 MP.
        Last edited by Jakob Schaefer; 21 Jun 2018, 06:01.



        • #5
          Hello,

          I am also having trouble with svysetting with a sub-population using labor force survey data.

          Code:
          svy, subpop(if empstat!=0): logit Y i.age_grp i.sex i.education i.sec3 i.urbrur i.marital
          My first instinct was to restrict the sample and just run a logit regression with pweights, but apparently that would be incorrect.

          Hoping to get more help on this issue as well.



          • #6
            @Kim: Jakob's problem is not the same as yours, because multilevel commands have special requirements.

            For you, svyset followed by your svy: logit statement is the right approach. logit with only the pweight option would yield entirely wrong standard errors, because it ignores the strata and clustering.
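            A minimal sketch of that workflow, using Stata's NHANES II example dataset (webuse nhanes2) as a stand-in for the labour force survey because its design variables psu, strata and finalwgt are known; the LFS variable names, and the flag women used here as a hypothetical subpopulation, are assumptions to be replaced with your own.

```stata
* Stand-in data: Stata's NHANES II example file (downloaded, so needs internet access).
webuse nhanes2, clear

* Declare the design once: PSUs nested within strata, plus sampling weights.
svyset psu [pweight = finalwgt], strata(strata)

* Hypothetical subpopulation flag, coded 0/1 with no missing values.
generate byte women = (sex == 2)

* Subpopulation estimation: the whole sample and design enter the variance
* calculation; only the estimation is restricted to the subpopulation.
svy, subpop(women): logit highbp age

* For comparison only: pweights alone on the restricted sample. The coefficients
* match the svy run, but strata and PSUs are ignored, so the standard errors generally differ.
logit highbp age if women == 1 [pweight = finalwgt]
```

            The same pattern carries over to the LFS model: svyset first, then svy, subpop(...): logit, rather than logit with pweights on an if-restricted sample.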

            To prevent mixing up your concerns and Jakob's, please start a new thread if you have further questions.
            Steve Samuels


            • #7
              Jakob,

              Before worrying about the subpop() option, I have concerns about other issues:

              The SDDF file refers to the design elements "psu", "domain" and "stratify" (i.e. strata). To svyset the data, you need all of these elements. However, the publicly available round 7 file does not include any of them.

              Question 1: Do you have these variables in your data set? If not, you won't be able to svyset the data.

              If you do have them, then for the multilevel model you would also need to calculate the design selection probability of each PSU. It is often equal to (population of PSU)/(population of stratum containing the PSU).

              Question 2: Can you get that?

              If not, then I don't think that you can use svyset. The reason is that after svyset, Stata expects you to include the PSU as a level in the multilevel model and to specify the PSU design weight, 1/(selection probability of the PSU).
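              In symbols, for the common case described above, write $N_{hi}$ for the population of PSU $i$ in stratum $h$ and $N_h$ for the population of stratum $h$. The PSU selection probability and the PSU design weight that svyset would need are then:

```latex
\pi_{hi} = \frac{N_{hi}}{N_h}, \qquad
w_{hi} = \frac{1}{\pi_{hi}} = \frac{N_h}{N_{hi}}
% Illustrative numbers: N_{hi} = 2{,}000,\ N_h = 100{,}000
%   \Rightarrow \pi_{hi} = 0.02,\ w_{hi} = 50
```

              Designs that select PSUs differently will have a different $\pi_{hi}$; this formula only restates the "often equal to" case.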

              Last edited by Steve Samuels; 23 Jun 2018, 20:40.
              Steve Samuels


              • #8
                Thanks, Steve, that is very helpful!

                Sorry, I should have noted that earlier: Yes, I have the elements "psu", "domain", and "stratify". (If anybody wonders, they are provided in a separate SDDF file on the ESS website.)

                So the next step would be calculating the design selection probability of each PSU, if I am not mistaken? When you say it is OFTEN equal to (population of PSU)/(population of stratum containing the PSU), are you referring to the possibility that some PSUs belong to no stratum? (That is the case for the ESS.) How should I deal with these?

                Thanks again for your very valuable help!

                PS: Am I right in assuming that the design weights the ESS provides are different from the design selection probability of each PSU? (The ESS design weights, I would assume, are one level below, as it were, at the level of individuals.) On the design weights, the ESS provides the following information:

                In several of the sample designs used by countries participating in the ESS not all individuals in the population aged 15+ have precisely the same chance of selection. Several countries use complex sampling designs where some groups or regions of the population have higher probabilities of selection. [...] The design weights are computed as the inverse of the inclusion probabilities and then scaled such that their sum equals the net sample size. For further information on design weights please see also the ESS Documentation Report.
                (Source: http://www.europeansocialsurvey.org/...ing_data_1.pdf)

                And on the variable PROB it says:

                The probability for the respondent of being selected into the gross sample, also called inclusion probability. It is the basis for the design weights of the ESS.
                (Source: https://cloud.wzb.eu/index.php/s/4Of1ZCQu556Y6ku)
                Last edited by Jakob Schaefer; 24 Jun 2018, 05:12.
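                The scaling quoted above can be reproduced from PROB: take 1/PROB, then multiply by (net sample size)/(sum of 1/PROB). A minimal sketch on invented toy data, assuming the SDDF variable PROB has been merged in under the hypothetical name prob and that the scaling is applied within each country (cntry); both are assumptions, not ESS documentation.

```stata
* Toy data standing in for the merged ESS + SDDF file (values invented).
clear
input str2 cntry double prob
"AT" .0002
"AT" .0004
"AT" .0004
"BE" .0001
"BE" .0003
end

* Raw design weight: inverse of the inclusion probability.
generate double invprob = 1/prob

* Assumed within-country scaling: weights sum to the country's net sample size.
bysort cntry: egen double suminv = total(invprob)
bysort cntry: generate long nnet = _N
generate double dw_check = invprob * nnet / suminv

* Each country's weights should now sum to its n (i.e. have mean 1).
tabstat dw_check, by(cntry) statistics(n sum mean)
```

                With these toy values the AT weights come out as 1.5, 0.75 and 0.75 and the BE weights as 1.5 and 0.5. On the real file, comparing dw_check with dweight would show whether the within-country assumption holds.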



                • #9
                  I apologize for the delay in responding.

                  Apparently one cannot have country as the top level of a multilevel model and still have variation based on the primary sampling units within countries. Here are some choices; all have disadvantages, and I don't have survey data to try them on.

                  1. Run a plain (non-svy) multilevel model with levels for country and PSU:
                  Code:
                   
                  melogit <your model> || country: || psu:
                  Add informative design elements as predictors at the correct level. For example, urban/rural, if available, belongs at the PSU level. Weights can also be informative: if one person is chosen from among all eligible people in a household (HH), that person's weight is higher in larger households, so include HH size as a predictor.
                  Without weights for PSUs, variance components will be biased.

                  2. Try a svy analysis in which country is the primary sampling unit and psu a secondary unit. Below I give country and PSU the nominal weight of 1. The results should be similar to those of option 1, except that you can get the individual weights into the model. Again, without weights for PSUs, the variance components risk bias.
                  Code:
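                  * Second-stage strata, numbered uniquely across countries (egen group() does not accept by())
                  egen superstrat = group(country domain stratify)
                  gen one = 1
                  * Stage 1: countries; stage 2: PSUs within strata; stage 3: individuals
                  svyset country, weight(one) || psu, weight(one) strata(superstrat) || _n, weight(pspwght)
                  * Model levels must follow the svyset stages: country first, then psu
                  svy, subpop(if <your condition>): melogit <your model> || country: || psu: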
                  3. Make country a stratum, include an indicator for each country with no constant, then plot histograms of the country effects (log odds or probabilities).

                  Code:
                  egen newstrat = group(country domain stratify)
                  svyset psu [pw = pspwght], strata(newstrat) || _n
                  * <your model> should include one indicator per country and the noconstant option
                  svy, subpop(if <your condition>): logit <your model>
                  Here is how to get summaries of the country effects, using the auto data as a stand-in (the rep78 effects play the role of the country effects):
                  Code:
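                  sysuse auto, clear
                  * One indicator per rep78 category; with no constant, each category gets its own intercept
                  tab rep78, gen(xrep)
                  regress mpg gear_ratio xrep?, noconstant coeflegend

                  * Collect the category effects (every coefficient after gear_ratio) as a column vector
                  matrix b = e(b)
                  matrix rep = b[1,2...]'
                  matrix list rep
                  * Turn the vector into a variable (svmat names it reff1), then summarize and plot it
                  svmat rep, names(reff)
                  summarize reff1
                  kdensity reff1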
                  Steve Samuels