Merge several datasets using a "linking file"

Jordi Josep

Join Date: Apr 2017

Posts: 3
#1

01 May 2017, 13:55

Dear Stata Forum,

I need to merge several datasets that together will form an (unbalanced) panel dataset, and to do so I have several "linking files":

First, I have 10 cross-sectional samples of about 20,000 households each, and only about half of the households continue in the survey in the next wave (so a household is tracked for up to 2 years). The structure of the waves is like this, for the 2006 wave:
ID  YEAR  X1   X2   X3
1   2006  x1   ...  ...
2   2006  ...  ...  ...
3   2006  ...  ...  ...
4   2006  ...  ...  ...

and for the 2007 wave:

ID  YEAR  X1   X2   X3
1   2007  x1   ...  ...
2   2007  ...  ...  ...
3   2007  ...  ...  ...
4   2007  ...  ...  ...

Then, I have 9 linking files that connect observations across consecutive waves. For instance, the 2006-2007 linking file provides the identification number in the 2007 wave and the corresponding identification number in the previous (2006) wave. This is what it looks like:

ID_2007  participate_in_next_wave  ID_2006
1        0                         .
2        1                         3
3        1                         2
4        0                         .

Can anyone give me some hints or directions on how to do this as efficiently as possible?
Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#2

01 May 2017, 16:47

So assuming that this linking file contains a whole series of ID variables, ID2006, ID2007, ID2008,...,ID2015 for all ten years of your data, I would do something like this:

Code:

// USE THE LINKING FILE TO CREATE A UNIQUE ID FOR EACH PERSON
// THAT WILL APPLY IN ALL YEARS
use linking_file, clear
gen long unique_id = _n
tempfile links
save `links'

// MERGE THE UNIQUE ID INTO THE YEARLY FILES
forvalues y = 2006/2015 {
    use wave`y', clear
    rename ID ID`y'
    merge 1:1 ID`y' using `links', keep(master match) keepusing(unique_id)
    save linkable_wave_`y', replace
}
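One caveat, under an assumption the thread does not confirm: if the linking file holds a missing ID`y' for everyone absent from wave y, those repeated missing values mean ID`y' no longer uniquely identifies observations in the using data, and -merge 1:1- will stop with an error. A minimal sketch of the same loop with a guard might look like the following. The file and variable names are those assumed in the code above, and the `sub' tempfile is introduced here only for illustration.

```stata
// Sketch only: same steps as above, but drop linking rows with a missing
// year-specific ID before merging. Assumes linking_file holds ID2006 ... ID2015.
use linking_file, clear
gen long unique_id = _n
tempfile links
save `links'

forvalues y = 2006/2015 {
    // Keep only the linking rows that refer to wave `y'
    use unique_id ID`y' using `links', clear
    drop if missing(ID`y')
    isid ID`y'              // stop here if the key is still not unique
    tempfile sub            // hypothetical name for the per-year subset
    save `sub'

    use wave`y', clear
    rename ID ID`y'
    merge 1:1 ID`y' using `sub', keep(master match) keepusing(unique_id)
    save linkable_wave_`y', replace
}
```

If the linking file has no missing IDs, the guard is harmless and the result should match the original loop.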

This will leave you with ten files each of which contains the same data as the original ten files, but with each observation identified by a unique ID that is the same for the same person in whichever files he/she appears in. The next step is to clean those files individually: even the most professionally curated survey data usually contains errors and inconsistencies. It is usually easiest to clean those problems up in the individual files before you try to put them together. Once that is done, you can then -append- all the cleaned files together.
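The final -append- step could be sketched as follows. This assumes the cleaned files were saved back under the names linkable_wave_2006 through linkable_wave_2015 and that YEAR is the wave variable shown in the question; adjust both to whatever the cleaned files actually use.

```stata
// Sketch only: stack the cleaned yearly files into one long panel.
// Assumes the cleaned files kept the names linkable_wave_2006 ... linkable_wave_2015.
clear
forvalues y = 2006/2015 {
    append using linkable_wave_`y'
}

// Each linked unit should appear at most once per year;
// surplus copies reported here point to a linking or cleaning problem.
duplicates report unique_id YEAR if !missing(unique_id)
sort unique_id YEAR
```

Because each yearly file carries its own renamed ID variable (ID2006, ID2007, and so on), the appended data will contain all of them, with missing values in the years where they do not apply. The unique_id and YEAR pair is what identifies an observation in the panel.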