Data Management

Vivian Phan

Join Date: Jun 2016

Posts: 30
#1


17 Feb 2017, 12:07

Dear all,

I don't know how to combine two cross-sectional datasets.
Here is my case.

I have Household Survey data, specifically 6 files for 6 years.
I want to build panel data from the 6 cross-sectional files, but the problem is that the observations are not repeated across all 6 years: only a small proportion of the sample can be built into panel data. Furthermore, the questionnaires are not exactly the same across years.

In detail:
The first file, Y2014, looks like this (DummyVar marks households that were interviewed in the previous survey):
MemberID HouseholdID v1 v2 v3 v4 DummyVar HouseholdID2013
123451 12345 ... ... ... ... 0 -
123452 12345 ... ... ... ... 0 -
123461 12346 ... ... ... ... 1 12346001
123462 12346 .. ... ... ... 1 12346001
123463 12346 ... ... ... ... 1 12346001
123464 12346 ... ... ... ... 1 12346001
123471 12347 ... ... ... ... 1 12347001
123472 12347 ... ... ... 1 12347001

The next file, Y2013, looks like this (again, DummyVar marks households that were interviewed in the previous survey):
MemberID HouseholdID v1 v2 v3 v4 v5 v6 DummyVar HouseholdID2012
102010001 10201000 ... ... ... ... 0 -
102010002 10201000 ... ... ... ... 0 -
123460011 12346001 ... ... ... ... 1 12346001
123460012 12346001 .. ... ... ... 1 12346001
123460013 12346001 ... ... ... ... 1 12346001
123470011 12347001 ... ... ... ... 0 -
123470012 12347001 .. ... ... ... 0 -

Similarly, I have files going back to 2006.
In the example above, only the household in bold was interviewed in both years, so only that household can be built into panel data.
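A minimal sketch of how the two years could be linked once they are stacked, assuming that DummyVar == 1 exactly marks the 2014 rows whose HouseholdID2013 is filled in. The toy data reuse a few rows from the tables above (the v-variables are left out), and panel_key, panel_hh, tagged and nyears are hypothetical names. The household that appears as 12346 in Y2014 and 12346001 in Y2013 ends up with one common ID. Chaining the links all the way back to 2006 would need the same idea applied year by year, which this sketch does not attempt.

```stata
* Toy stacked data: a few member rows from each year in the post
* (v-variables omitted; HouseholdID2013 only exists for 2014 rows)
clear
input long MemberID long HouseholdID byte DummyVar long HouseholdID2013 int year
123451    12345    0 .        2014
123452    12345    0 .        2014
123461    12346    1 12346001 2014
123462    12346    1 12346001 2014
102010001 10201000 0 .        2013
123460011 12346001 1 .        2013
123460012 12346001 1 .        2013
end

* Hypothetical common household key, kept as a string so that IDs from
* different years cannot collide by accident:
*   2013 households keep their own ID,
*   re-interviewed 2014 households take their 2013 ID,
*   new 2014 households get a year-prefixed ID of their own.
generate str20 panel_key = string(HouseholdID, "%12.0f") if year == 2013
replace panel_key = string(HouseholdID2013, "%12.0f") if year == 2014 & DummyVar == 1
replace panel_key = "2014-" + string(HouseholdID, "%12.0f") if year == 2014 & DummyVar == 0
egen long panel_hh = group(panel_key)

* Count the survey years in which each household appears
egen byte tagged = tag(panel_hh year)
bysort panel_hh: egen byte nyears = total(tagged)
list panel_hh year HouseholdID nyears, sepby(panel_hh) noobs
```

Households with nyears == 2 are the ones that can enter the panel; everything else stays usable as pooled cross-section.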

One further question: suppose this household had 3 members in Y2013 but 4 members in Y2014, for example because of a newborn. In other cases the number of members in a panel household falls for some reason. Members do not keep their own MemberID across years, because the IDs are simply assigned in order, so I cannot tell who left or who joined. Is pooling the cross-sectional data the best solution? And how can I do it?

Thank you so much

Best regards
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

20 Feb 2017, 11:46

You didn't get a quick answer. You'd have a better chance if you provided your code (using code delimiters), Stata output, and example data (using dataex). See the FAQ on asking questions.

I (and I suspect others) have difficulty knowing what you really want. If you simply want to stack the yearly data, you can put a year identifier into each data set, and then just append them - look up append in the manuals.
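A minimal sketch of that advice, using toy copies of a few rows from the two tables in the question (v-variables left out; the tempfile name y2013 is hypothetical). Each file gets a year variable before stacking. append keeps variables that exist in only one file, such as HouseholdID2012 and HouseholdID2013 here, and fills them with missing values for the other year, which is how questionnaires that differ across years can still end up in one dataset. The IDs are declared long because Stata's default float type cannot hold 9-digit IDs exactly.

```stata
* Toy 2013 file (rows copied from the post; v1-v6 omitted)
* With the real files this step would start with: use Y2013, clear
clear
input long MemberID long HouseholdID byte DummyVar long HouseholdID2012
102010001 10201000 0 .
102010002 10201000 0 .
123460011 12346001 1 12346001
123460012 12346001 1 12346001
123460013 12346001 1 12346001
end
generate int year = 2013
tempfile y2013
save `y2013'

* Toy 2014 file (rows copied from the post; v1-v4 omitted)
clear
input long MemberID long HouseholdID byte DummyVar long HouseholdID2013
123451 12345 0 .
123452 12345 0 .
123461 12346 1 12346001
123462 12346 1 12346001
123463 12346 1 12346001
123464 12346 1 12346001
end
generate int year = 2014

* Stack the years; variables missing from one file become missing values
append using `y2013'
sort year HouseholdID MemberID
list, sepby(year) noobs
```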

It looks like you have panel data - multiple households and members observed in multiple years. So, it is almost certain that the long format (what you'd want for a panel analysis) is the best way to go. If you have the data set up into one long data set, then you can answer your questions. You should be able to use generate or egen by groups to answer your questions - changes in family size, etc.
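A minimal sketch of the by-group idea for the household that appears in both years, assuming the years have already been stacked and linked into a common household ID; panel_hh, hhsize, first and dsize are hypothetical names, and the rows are toy copies of the post's example. Because MemberID is reassigned each year, this recovers the net change in household size, not which person joined or left.

```stata
* Toy long (member-year) data for household 12346001 (2013) / 12346 (2014);
* panel_hh is a hypothetical common household ID
clear
input long MemberID long panel_hh int year
123460011 12346001 2013
123460012 12346001 2013
123460013 12346001 2013
123461    12346001 2014
123462    12346001 2014
123463    12346001 2014
123464    12346001 2014
end

* Household size: number of member rows in each household-year
bysort panel_hh year: generate byte hhsize = _N

* Flag one row per household-year, then compare with the previous
* observed year on those flagged rows only
egen byte first = tag(panel_hh year)
bysort first panel_hh (year): generate byte dsize = hhsize - hhsize[_n-1] if first
list panel_hh year hhsize dsize if first, noobs
```

For this household, hhsize is 3 in 2013 and 4 in 2014, so dsize is 1 on the 2014 row and missing on the first year observed.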

You're going to have to tell us what the unit of observation will be for your analysis. Is the observation the family-year or is it the member-year (where multiple members can be within a family) or what? With a clearer idea of what you want, we can be of more assistance.
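To make the distinction concrete, here is a minimal sketch with toy data, where panel_hh is a hypothetical common household ID (as if the years were already stacked and linked). As loaded, each row is a member-year; collapse turns it into household-year data with one row per household and year, which can then be declared as a panel with xtset. The member-year version could not be declared that way here, since the question says members lack an ID that is stable across years.

```stata
* Toy member-year data: one row per person per survey year
clear
input long MemberID long panel_hh int year
102010001 10201000 2013
102010002 10201000 2013
123460011 12346001 2013
123460012 12346001 2013
123460013 12346001 2013
123461    12346001 2014
123462    12346001 2014
123463    12346001 2014
123464    12346001 2014
end

* Household-year data: count the member rows in each household and year
collapse (count) hhsize = MemberID, by(panel_hh year)

* Check that household and year now identify rows uniquely, then declare the panel
isid panel_hh year
xtset panel_hh year
list, noobs
```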
Vivian Phan

Join Date: Jun 2016

Posts: 30
#3

23 Feb 2017, 06:27

Thank you, Phil

-append- may help me out.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

23 Feb 2017, 06:35

That being so, you just need to type:

Code:

. help append

You will find several examples there that are worth reading, and you can test whether the command works in your case.

Best regards,

Marcos