  • Svyset

    Hello, I am rather unfamiliar with Stata's survey commands and am just starting out, so this question is rather elementary. I drew a simple random sample of 748 schools from a population of 3743 schools. The response rate in my survey is about 44 percent, so I have 332 observations in my dataset. I began by svysetting my dataset with the following commands:

    gen pop_size=3743
    gen weight_srs= pop_size/748
    gen fpc= 748/pop_size
    svyset skolenhetskod [pweight=weight_srs], fpc(fpc)

    (skolenhetskod is my ID variable.) When I later run svy: proportion or svy: tab, the output reports a population size of 1656.33. For instance, I get the following output:

    . svy: proportion matematik1617
    (running proportion on estimation sample)

    Survey: Proportion estimation

    Number of strata =       1          Number of obs     =        331
    Number of PSUs   =     331          Population size   =    1656.33
                                        Design df         =        330

    ---------------------------------------------------------------
                  |             Linearized
                  | Proportion   Std. Err.     [95% Conf. Interval]
    --------------+------------------------------------------------
    matematik1617 |
                0 |   .0392749   .0095651      .0242266    .0630663
                1 |   .9607251   .0095651      .9369337    .9757734
    ---------------------------------------------------------------

    My question is: shouldn't my population size be 3743? Have I specified the survey weight incorrectly?
    Thanks!
    Best, Jenny

  • #2
    In your example, the pweight value for a given observation should equal the number of units in the population (here, schools) that the observation represents. Your estimation sample contains 331 responding schools, so the pweight variable should be computed as:

    Code:
    gen pop_size=3743 
    gen weight_srs= pop_size/331
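A minimal sketch of the fix, using simulated data: the 0/1 outcome values are made up, and only the population size 3743 and the 331 respondents come from the thread. The weight is the population size divided by the number of responding schools, so the weights sum to 3743. With the original weight 3743/748, the 331 respondents sum to only 331 × 3743/748 ≈ 1656.33, which is the population size Stata reported. The fpc is left out here; whether to use it is discussed further down the thread.

```stata
* Simulated stand-in for the respondent file: 331 schools (hypothetical values)
clear
set obs 331
set seed 12345
generate long skolenhetskod = _n
generate byte matematik1617 = runiform() < 0.96

* Base weight = population size / number of responding schools
quietly count
local n_resp = r(N)
generate double pop_size   = 3743
generate double weight_srs = pop_size / `n_resp'

* Declare the design: schools are the sampling units, one stratum
svyset skolenhetskod [pweight=weight_srs]
svy: proportion matematik1617

* Population size implied by the weights (should now be 3743)
display e(N_pop)
```

Computing the respondent count with count rather than typing 331 keeps the weight correct if more questionnaires arrive later.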



    • #3
      44% response means 56% non-response, which can be bad news if responders and non-responders differ with respect to important characteristics and study variables. The result will be "non-response bias".

      There are some things that you can do to reduce this bias.

      1. Reweight the responders so that they resemble the original sample with respect to known characteristics, e.g. size.
      This is known as non-response weighting (Groves et al. 2009, Section 10.5; Lohr 2009, Chapter 7 and Section 8.5).

      2. Reweight the responders so that they resemble the original population with respect to known characteristics. This is known generically as poststratification, but it has extensions when there are multiple criteria; raking is the most common extension. Possibly the easiest to use is survwgt by Nick Winter (SSC).

      3. My strongest suggestion is to take a subsample (say, 1 in 20) of the non-responding schools and make an intensive, personalized effort to get information from them. That could tell you much about bias. You could formally combine these with the original responders, a technique known as two-phase sampling (see Lohr 2009, pp. 336-338 and Chapter 12).
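A minimal sketch of points 1 and 2 on simulated data. Everything except the totals 748 and 3743 is hypothetical: sizeclass stands in for a known school characteristic such as size, responded is a made-up response indicator, y is a made-up outcome, and N_class holds assumed population counts per size class. Point 1 is shown as a weighting-class adjustment, in which the respondents in each class absorb the base weight of that class's non-respondents. Point 2 uses svyset's poststrata() and postweight() options, which rescale the weights so that they add up to the known class totals; survwgt is not used here.

```stata
* Simulated sampling frame for all 748 sampled schools (hypothetical values)
clear
set obs 748
set seed 2017
generate long skolenhetskod = _n
generate byte sizeclass = 1 + floor(3*runiform())          // size class 1, 2 or 3
generate byte responded = runiform() < 0.3 + 0.1*sizeclass // response varies by size
generate byte y = runiform() < 0.9                         // made-up outcome
generate double w_base = 3743/748                          // SRS base weight N/n

* 1. Non-response (weighting-class) adjustment within size classes
bysort sizeclass: egen double w_all  = total(w_base)
bysort sizeclass: egen double w_resp = total(w_base*responded)
generate double w_nr = w_base * w_all / w_resp if responded
keep if responded
quietly summarize w_nr
scalar sum_nr = r(sum)            // adjusted respondent weights sum to 3743
display scalar(sum_nr)

* 2. Poststratification to assumed population counts per size class
generate double N_class = cond(sizeclass==1, 1200, cond(sizeclass==2, 1300, 1243))
svyset skolenhetskod [pweight=w_base], poststrata(sizeclass) postweight(N_class)
svy: mean y
display e(N_pop)                  // poststratified weights sum to 1200+1300+1243
```

The class adjustment only removes bias to the extent that response depends on the classing variable; within a class, responders and non-responders are assumed to be alike.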

      Your original n of 448 looks too large to me: it is 20% of the population, but that is not a good criterion. Standard sample size calculations for SRS (see Lohr 2009) would surely have led you to a smaller n. Designs other than SRS, especially stratified sampling and sampling with probability proportional to size (PPS), would reduce the sample size further.
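For illustration, the standard SRS sample size calculation for estimating a proportion, with the finite population correction n = n0 / (1 + n0/N) as in Lohr (2009), can be done with a few scalars. The planning values are assumptions, not figures from the thread: 95% confidence, a margin of error of 0.05, and the conservative p = 0.5. Under them the required n is about 349 schools, well below 748; a smaller margin of error, or inflating n for expected non-response, would raise it.

```stata
clear

* Planning values (assumptions for illustration)
scalar N_pop = 3743                 // population of schools
scalar z     = invnormal(0.975)     // 95% confidence
scalar moe   = 0.05                 // desired margin of error
scalar p     = 0.5                  // conservative guess at the proportion

* Sample size ignoring the finite population
scalar n0 = z^2 * p * (1 - p) / moe^2

* Adjust for the finite population: n = n0 / (1 + n0/N)
scalar n_req = n0 / (1 + n0/N_pop)

display "n0 = " %6.1f n0 "   n = " ceil(n_req)
```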

      References

      Groves, Robert M., Floyd J. Fowler, Mick P. Couper, James M. Lepkowski, Eleanor Singer, and Roger Tourangeau. 2009. Survey Methodology. 2nd ed. Hoboken, NJ: Wiley.

      Lohr, Sharon L. 2009. Sampling: Design and Analysis. Boston, MA: Cengage Brooks/Cole.

      Steve Samuels
      Statistical Consulting

      Stata 14.2



      • #4
        Correction: your original sample size was 748, not 448.
        Steve Samuels



        • #5
          Thank you for all your help. Unfortunately, non-response is a problem, and I will check out the literature you referred to on non-response weighting. Some more schools have since sent in the questionnaire, so non-response is now at 50 percent.

          I have another question. We have also sent a questionnaire to the teachers within the schools that responded. Unfortunately, we did not have the teachers' addresses, so the principals at the sampled schools had to send us their teachers' e-mail addresses. This was the only possible way to get in touch with the teachers. At first I thought I should svyset the data from the teachers' questionnaire as a cluster sample. However, I am unsure how the survey weights should be specified. In the first stage, the SRS weight is the same as above, that is:

          gen pop_size=3743
          gen weight_srs= pop_size/374 //(N/n)
          gen fpc= 374/pop_size //(n/N)
          svyset skolenhetskod [pweight=weight_srs], fpc(fpc)

          (skolenhetskod is my school-id variable)

          But the teacher sample is not an SRS of the teachers within the sampled schools. We are only interested in teachers who teach certain grades, and the principals have given us contact information for all of these teachers in their schools. The response rate among the surveyed teachers is rather high. Is it still a cluster sample? How should I specify the survey weights in the second stage?

          Best,
          Jenny



          • #6
            Since you have all the teachers of interest in the schools, there is no sampling of teachers within schools. You still have a cluster sample, and the teachers inherit the school's sampling weight.
            The fpc should be used only for estimating descriptive statistics of the schools/teachers. If you plan on any hypothesis testing or modeling, omit the fpc. For the reasoning, see this post.

            However, I would omit the fpc even for descriptive analyses. The theory of the fpc assumes that the data are a simple random sample (or close to it). Your now ~165 respondents are not by themselves a simple random sample of the original population. I admit that many analysts apply the fpc in these situations, but I think they are wrong to do so.
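A minimal sketch of this advice on simulated data. The school and teacher files are hypothetical stand-ins (374 responding schools, each with 1 to 8 teachers of interest, and a made-up 0/1 teacher outcome), and the sketch assumes every teacher of interest responded; teacher non-response would need its own adjustment. Each teacher inherits the weight of his or her school through a many-to-one merge on skolenhetskod, the school remains the PSU, and no fpc is declared.

```stata
* Hypothetical school file: 374 responding schools with their weight
clear
set obs 374
set seed 42
generate long skolenhetskod = _n
generate double weight_srs = 3743/374
tempfile schools
save `schools'

* Hypothetical teacher file: 1 to 8 teachers of interest per responding school
clear
set obs 374
generate long skolenhetskod = _n
generate int n_teachers = 1 + floor(8*runiform())
expand n_teachers
generate byte y = runiform() < 0.5      // made-up teacher-level outcome

* Every teacher inherits the sampling weight of his or her school
merge m:1 skolenhetskod using `schools', assert(match) nogenerate

* Schools are the PSUs (clusters); no fpc is declared
svyset skolenhetskod [pweight=weight_srs]
svy: mean y
```

Because the PSU is the school, the standard errors account for teachers in the same school being correlated, even though the analysis is at the teacher level.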
            Steve Samuels



            • #7
              Dear Steve Samuels,
              Thank you very much! I have learned a lot from your posts and replies.
              With best wishes,
              Hassen

