Dear Stata Forum,
Can anyone give me some hints or directions on how to do the merge described here as efficiently as possible?
I need to merge several datasets that together will form an (unbalanced) panel dataset, and to do so I have several "linking files".
On the one hand, I have 10 cross-sectional samples with about 20,000 households each. Only about half of the households continue in the survey in the next wave, so each household is tracked for up to 2 years. The structure of the waves looks like this:
2006 wave:
ID | YEAR | X1  | X2  | X3  |
1  | 2006 | x1  | ... | ... |
2  | 2006 | ... | ... | ... |
3  | 2006 | ... | ... | ... |
4  | 2006 | ... | ... | ... |
2007 wave:
ID | YEAR | X1  | X2  | X3  |
1  | 2007 | x1  | ... | ... |
2  | 2007 | ... | ... | ... |
3  | 2007 | ... | ... | ... |
4  | 2007 | ... | ... | ... |
On the other hand, I have 9 linking files that connect observations across consecutive waves. For instance, the 2006-2007 linking file gives the identification number in the 2007 wave and the corresponding identification number in the previous (2006) wave. This is how it looks:
ID_2007 | participate_in_next_wave | ID_2006 |
1       | 0                        | .       |
2       | 1                        | 3       |
3       | 1                        | 2       |
4       | 0                        | .       |
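To make the setup concrete, here is a rough Stata sketch of how one linking file could be used to carry a household identifier from 2006 into 2007. It is only an illustration under assumptions: the file names (wave2006, wave2007, link_2006_2007, and the intermediate files) and the variable panel_id are hypothetical, ID is assumed to be numeric and below 100000, and ID_2006 is assumed to be unique among linked households.

```stata
* Hypothetical file names throughout; adjust to the real ones.

* Step 1: the 2006 wave defines the first panel identifiers.
use wave2006, clear
gen long panel_id = 2006 * 100000 + ID   // assumes ID < 100000
save panel_2006, replace

* Step 2: build a crosswalk from the 2007 ID to panel_id.
use link_2006_2007, clear
keep if !missing(ID_2006)                // households also seen in 2006
rename ID_2006 ID
merge 1:1 ID using panel_2006, keepusing(panel_id) keep(match) nogenerate
keep ID_2007 panel_id
rename ID_2007 ID
save xwalk_2007, replace

* Step 3: attach panel_id to the 2007 wave; households that are new
* in 2007 get a fresh identifier.
use wave2007, clear
merge 1:1 ID using xwalk_2007, keep(master match) nogenerate
replace panel_id = 2007 * 100000 + ID if missing(panel_id)
save panel_2007, replace

* Step 4: stack the two waves and declare the panel structure.
use panel_2006, clear
append using panel_2007
xtset panel_id YEAR
```

For all 10 waves, the same steps 2 and 3 would presumably be repeated in a loop over years, each time merging the linking file against the previous wave's panel_id file before appending everything at the end.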