  • Appending two survey data sets

    For my research I need to pool two datasets for the same country from two different years.
    I'm planning to use svy.
    But I have an issue with identifying the PSUs. For the strata, the number and location of regions are the same in both datasets, but the PSUs differ in both number and identity. In the dataset for year 1 there are, say, 600 PSUs, while in the dataset for year 2 there are 800 PSUs.
    How do I define the PSU in the appended dataset? I've read about a super-stratum, which would be

    egen super_strata = group(year region residence_of_region)

    but I think that is not necessary in my case, since the regions are the same.
    For the PSU it would be

    egen psu = group(year cluster)

    But this gives the combined number of PSUs (600 + 800 = 1,400).

    I don't fully understand which path I should follow, or whether it would be correct to have the combined number of PSUs.

  • #2
    Once you have identified the "super-stratum" that includes year, you need to do nothing more about the PSUs. PSUs are automatically nested in strata. So, for example, even if the PSU numbers are the same in two years, Stata will not assume they are the same unit. Therefore, your statement
    Code:
    egen psu = group(year cluster)
    is not necessary if you have first properly issued the command
    Code:
    egen super_strat = group(year region)
    You need to do this because separate samples were taken in each region and each year. Using your generic variable name "cluster" for the PSU variable in each year, the svyset statement should be:
    Code:
    svyset cluster [pw = your_weight], strata(super_strat)
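    The whole workflow, from appending the two files to declaring the design, can be sketched as below. This is a minimal sketch, not a tested recipe: the file names survey2006.dta and survey2015.dta and the weight variable wt are hypothetical, and it assumes each file already contains region and cluster. The optional check at the end reflects the point above: the explicit year-cluster PSU ID describes the same design, and svydescribe will report the combined count of PSUs (e.g. 600 + 800), which is expected.

    ```stata
    * Hypothetical file and variable names: survey2006.dta, survey2015.dta, wt
    use survey2006, clear
    generate year = 2006
    append using survey2015
    * Observations from the appended file have no year yet
    replace year = 2015 if missing(year)

    * One stratum per region within each year (separate samples each year)
    egen super_strat = group(year region)

    * Cluster IDs need no recoding: they are treated as nested within strata
    svyset cluster [pw = wt], strata(super_strat)
    svydescribe

    * Optional check: explicit year-cluster PSU IDs give the same design
    egen psu = group(year cluster)
    svyset psu [pw = wt], strata(super_strat)
    svydescribe
    ```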
    Some questions:
    • Are the years sequential (e.g., 2015 and 2016)?
    • Do the weights roughly sum to the population total in each year?
    • Are the weights post-stratified so that sample proportions for demographic factors match external population figures?
    • What is the goal of your study?
    Last edited by Steve Samuels; 25 Jul 2018, 06:43.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Thank you for your reply, Steve.

      Here are the answers to your questions.
      1. Years are not sequential. I have datasets for 2006 and 2015.
      2. I believe so.
      3. Yes.
      4. I apologize if my explanation is weak; I'm not good at statistics yet. The study explores risk factors for child malnutrition. The goal of this particular part of the study is to estimate the odds ratio (OR) for the year variable: taking 2006 as the reference year, to estimate the OR of being malnourished in 2015, using multilevel analysis.



      • #4
        Thanks for responding, Kamola. I see no problems so far. To start on the multilevel model, see the Survey Analysis section of the Manual entry for meglm. If you have questions about the model, begin a new topic.
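        As a single-level starting point before the multilevel model, a survey-weighted logistic regression reports the OR for 2015 relative to 2006 directly. This is a sketch only, assuming the design has already been declared with svyset as above and that malnourished is a hypothetical 0/1 outcome variable; it is not the multilevel model itself, which the meglm manual entry covers.

        ```stata
        * Assumes svyset has been run; malnourished is a hypothetical 0/1 outcome
        * ib2006.year makes 2006 the base (reference) level of year
        svy: logistic malnourished ib2006.year
        ```

        The coefficient row labeled 2015 is the OR of being malnourished in 2015 versus 2006; other risk factors can be added to the right of ib2006.year.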


        • #5
          Thank you for your response, Steve.
