Appending two survey data sets

Kamola Babamuradova

Join Date: Jan 2018

Posts: 20
#1

Appending two survey data sets

25 Jul 2018, 02:09

For my research I need to pool two datasets for the same country from two different years.
I'm planning to use svy.
But I have a problem identifying the PSUs. The strata are fine: the number and location of the regions are the same in both datasets. The PSUs, however, differ in both number and identity: the year 1 dataset has, say, 600 PSUs, while the year 2 dataset has 800.
How should I define the PSU in the appended dataset? I've read about creating a super-stratum with
egen super_strata = group(year region residence_of_region)
which I think is not necessary in my case, since the regions are the same.
For the PSU it would be

egen psu = group(year cluster)

But this would give the combined number of PSUs.

I don't fully understand which path I should follow, or whether it is correct to have the combined number of PSUs.
Tags: appending datasets, svyset
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

25 Jul 2018, 06:41

Once you have identified the "super-stratum" that includes year, you need do nothing more about the PSUs. PSUs are automatically nested in strata. So, for example, even if the PSU numbers are the same in the two years, Stata will not assume they are the same unit. Therefore, your statement

Code:

egen psu = group(year cluster)

is not necessary, provided you have first issued the command

Code:

egen super_stratum = group(year region)

You need to do this, as separate samples were taken in each region and each year. Using your generic variable name "cluster" for the PSU variable in each year, the svyset statement should be:

Code:

svyset cluster [pw = your_weight], strata(super_stratum)
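To see the nesting at work, here is a minimal sketch on toy data, not the actual survey files. The variable names (wt, region, cluster) and all values are assumptions made up for illustration; the second "year" deliberately reuses the same cluster codes as the first.

```stata
* Toy data standing in for the two survey files (all values hypothetical)
clear
set obs 200
gen cluster = ceil(_n / 10)              // 20 clusters of 10 observations
gen region  = cond(cluster <= 10, 1, 2)  // 2 regions, 10 clusters each
gen year    = 2006
gen wt      = 50
tempfile y2006
save `y2006'

* The second year deliberately reuses the same cluster codes 1-20
replace year = 2015
replace wt   = 60
append using `y2006'

* One stratum per year-by-region combination
egen super_stratum = group(year region)

* PSUs are identified within strata, so the repeated codes do not collide
svyset cluster [pw = wt], strata(super_stratum)
svydescribe

* Count distinct PSUs as the design sees them (stratum-cluster pairs)
egen psu_tag = tag(super_stratum cluster)
count if psu_tag
```

In this toy setup the count is 40 (20 clusters in each of the two years), even though only 20 distinct cluster codes appear. That combined total is the expected behavior for pooled independent samples.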

Some questions:
• Are the years sequential (e.g., 2015 and 2016)?
• Do the weights roughly sum to the population total in each year?
• Are the weights post-stratified so that sample proportions for demographic factors match external population figures?
• What is the goal of your study?
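
The question about the weights can be checked directly by summing them within each year and comparing the results with external population totals. A minimal sketch on toy data follows; the weight variable name wt and the values are assumptions for illustration only.

```stata
* Toy pooled data (hypothetical values): 200 observations per year
clear
set obs 400
gen year = cond(_n <= 200, 2006, 2015)
gen wt   = cond(year == 2006, 50, 60)

* Sum of weights and number of observations, by year
tabstat wt, by(year) statistics(sum count)

* The same totals stored as a variable, for comparison with census figures
bysort year: egen double wt_total = total(wt)
```

If the yearly sums are far from the known population sizes, the weights may have been scaled differently in the two surveys, which is worth knowing before pooling.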

Last edited by Steve Samuels; 25 Jul 2018, 06:43.

Steve Samuels
Statistical Consulting

Stata 14.2
Kamola Babamuradova

Join Date: Jan 2018

Posts: 20
#3

25 Jul 2018, 22:12

Thank you for your reply, Steve.

Here are the answers to your questions.
1. Years are not sequential. I have datasets for 2006 and 2015.
2. I believe so.
3. Yes.
4. I apologize if my explanation is weak; I'm not good at statistics yet. The study explores risk factors for child malnutrition. The goal of this particular part of the study is to estimate the odds ratio (OR) for the year variable: taking, say, 2006 as the reference year, the OR of being malnourished in 2015, using multilevel analysis.
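
As a starting point for the year comparison in point 4, a single-level design-based logistic regression can be run before moving to any multilevel model. The sketch below is not the multilevel analysis itself. It uses simulated toy data, and the names malnourished, wt, region, and cluster are hypothetical.

```stata
* Simulated pooled data (hypothetical): 2 years x 2 regions x 10 clusters
clear
set seed 2018
set obs 400
gen year = cond(_n <= 200, 2006, 2015)
bysort year: gen cluster = ceil(_n / 10)   // codes 1-20 reused in each year
gen region = cond(cluster <= 10, 1, 2)
gen wt     = cond(year == 2006, 50, 60)

* Made-up binary outcome with a lower prevalence in 2015
gen byte malnourished = runiform() < cond(year == 2006, 0.40, 0.30)

* Declare the pooled design with year-by-region super-strata
egen super_stratum = group(year region)
svyset cluster [pw = wt], strata(super_stratum)

* Odds ratio for 2015 relative to the 2006 reference year
svy: logistic malnourished ib2006.year
```

The coefficient table reports the 2015 odds ratio with 2006 as the base level; other risk factors would be added as further covariates.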
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#4

26 Jul 2018, 07:04

Thanks for responding, Kamola. I see no problems so far. To start on the multilevel model, see the survey analysis section of the manual entry for meglm. If you have questions about the model, begin a new topic.

Steve Samuels
Statistical Consulting

Stata 14.2
Kamola Babamuradova

Join Date: Jan 2018

Posts: 20
#5

26 Jul 2018, 21:50

Thank you for your response, Steve.