Create a panel data from 2 waves

Dung Le

Join Date: May 2018
Posts: 120

Create a panel data from 2 waves

13 Jul 2018, 09:35

Dear all,

I am trying to create a panel data from 2 waves (2010 and 2012) of a household survey however I could not make it properly. Followings are my example data and attempts. Any suggestion is greatly appreciated.

In both data sets, pro (province); dis (district); comm (commune); EA (enumeration area); hhid (household ID); and idmem (ID member) uniquely identify observations.

In the data set in 2012, pro2010, dis2010, comm2010, EA2010 and hhid2010 are used to identify household observed in 2010.

Data in 2012

Code:


	Code:
	* Example generated by -dataex-. To install: ssc install dataex
clear
input byte pro int dis long comm int EA byte hhid float idmem int age byte(gender pro2010) int dis2010 long comm2010 int EA2010 byte hhid2010 int age2010 byte gender2010
1 1   4  8 13 1 77 1 0 .   .  .  .  . .
1 1   4  8 13 2 70 2 0 .   .  .  .  . .
1 1   7 22 15 2 60 1 1 1   7 22 15 58 1
1 1  16  3 15 1 68 2 0 .   .  .  .  . .
1 1  22 19 13 1 71 2 1 1  22 19 13 69 2
1 1  28 25 14 1 60 2 0 .   .  .  .  . .
1 1  34 25 13 1 70 1 1 1  34 25 13 67 1
1 1  34 25 13 2 67 2 1 1  34 25 13 65 2
1 2  40 16 13 2 62 1 0 .   .  .  .  . .
1 2  40 16 14 1 78 2 0 .   .  .  .  . .
1 2  55 11 13 1 70 1 1 2  55 11 13 68 1
1 2  55 11 13 2 70 2 1 2  55 11 13 68 2
1 2  55 11 15 1 61 1 1 2  55 11 15 59 1
1 2  55 11 15 2 60 2 1 2  55 11 15 58 2
1 2  67 16 14 4 62 1 0 .   .  .  .  . .
1 2  67 16 14 5 66 2 0 .   .  .  .  . .
1 3  91  6 15 1 68 1 0 .   .  .  .  . .
1 3  91  6 15 2 63 2 0 .   .  .  .  . .
1 3 106 13 15 1 82 2 1 3 106 13 15 80 2
1 3 112 17 15 1 72 2 0 .   .  .  .  . .
1 4 118 39 15 1 84 1 1 4 118 39 15 82 1
1 4 118 39 15 2 82 2 1 4 118 39 15 80 2
1 4 124 22 13 1 88 2 0 .   .  .  .  . .
1 4 124 22 13 2 86 1 0 .   .  .  .  . .
1 4 124 22 15 1 81 2 0 .   .  .  .  . .
1 4 124 22 19 1 70 2 0 .   .  .  .  . .
1 4 133 30 14 5 81 2 1 4 133 30 14  . .
1 4 133 30 15 1 60 1 1 4 133 30 15 58 1
1 4 139 30 15 1 65 1 0 .   .  .  .  . .
1 4 148  6 15 1 62 1 1 4 148  6 15 60 1
end
label values gender gender
label def gender 1 "Male", modify
label def gender 2 "Female", modify
label values gender2010 gender2010
label def gender2010 1 "Male", modify
label def gender2010 2 "Female", modify

Data 2010

Code:


	Code:
	* Example generated by -dataex-. To install: ssc install dataex
clear
input int(pro dis) float comm int(EA hhid) byte idmem int age byte gender
1 1   4 12 13 1 60 2
1 1   4 12 14 1 61 1
1 1   4 12 15 1 69 1
1 1   4 12 15 2 66 2
1 1  22 19 13 1 69 2
1 1  28 20 13 2 60 1
1 1  28 20 15 1 68 2
1 1  28 20 15 2 72 1
1 1  28 20 19 2 60 1
1 1  34 25 13 1 67 1
1 1  34 25 13 2 65 2
1 2  40  6 14 1 64 2
1 2  40  6 14 2 64 1
1 2  55 11 13 1 68 1
1 2  55 11 13 2 68 2
1 2  67 23 15 1 90 1
1 2  67 23 15 2 92 2
1 3  91  6 13 1 64 1
1 3  91  6 13 2 62 2
1 3  91  6 14 1 66 2
1 3  91  6 14 2 67 1
1 3 106 13 15 1 80 2
1 3 112 20 15 5 90 2
1 3 112 20 20 1 60 1
1 4 118 39 15 1 81 1
1 4 118 39 15 2 80 2
1 4 124 39 15 1 63 2
1 4 124 39 15 2 71 1
1 4 139 50 13 4 66 2
1 4 139 50 19 1 71 2
end
label values gender gender
label def gender 1 "Male", modify
label def gender 2 "Female", modify

My first attempt is as follows but unfortunately error message of uniquely identify observation issue arises even I tried to use -merge- m:1 or 1:m

Code:

use data2012, clear
    ren pro pro2012
    ren dis dis2012  
    ren comm comm2012  
    ren EA EA2012  
    ren hhid hhid2012

    ren pro2010 pro
    ren dis2010 dis
    ren comm2010 comm
    ren EA2010 EA
    ren hhid2010 hhid
    sort pro dis comm EA hhid
merge 1:1 using data1

My 2nd attempt, however not an effective way, is to rename age (e.g., age10 and age12) and gender (e.g., gen10 and gen12) to make them different names in the both data sets and then use following command:

Code:

use data2012
sort pro dis comm EA hhid idmem
merge 1:1 pro dis comm EA hhid idmem using data2010

This code worked but I think I am unable to do it manually with hundreds of variables. Moreover, when I tried to use reshape command to make a panel data, results did not show me age and gender for each year.

Thank you.

DL

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30064
#2

13 Jul 2018, 13:25

This is not a job for -merge-. To create a panel data set you want to -append- these two data sets. First you have to harmonize their variable names.

I don't understand how this data set is organized, and apparently you are just as mystified as I am.

One would ordinarily expect that within each wave, hhid and idmem would uniquely identify observations. But they do not do that in either wave. If the only way you can uniquely identify the observations is with the use of all of the variables, then you actually don't have any unique identifiers at all and it becomes completely unclear what each observation in this data set refers to. I suspect that somewhere in your data set there is another variable, or perhaps a small number of variables, that together with hhid and idmem provide unique identification of observations in each of the two data sets. You need to find them. If you check the documentation that came with these data sets, it is probably explained somewhere in there. But you won't be able to make any progress at all until you find out what those variables are.

(Then, after that, you will still have a lot of work cut out for you, because it appears that the identifiers used in 2010 for the same people differ from those used in 2012 and there is something of a crosswalk for that built into the 2012 data set. Working with that will be awkward, but doable. But you can't work on that until the unique identification issue is resolved.)
1 like
Comment
Eddie Edo

Join Date: Oct 2019

Posts: 7
#3

19 Oct 2019, 20:31

Hello Clyde I have a problem with matching data across waves. The data I have is looking at households which split or migrated from 2010 to 2013. The original HH in 2010 could split to anywhere between 2 to 3 newer households in 2013. some HH had no split. Also there is a variable that indicates whether a HH stayed the same, split or migrated.
There is also a unique household ID that maps each household to the original HH in 2010. Finally there is a variable that describes whether a household is in a rural or urban area.My aim is to show for those that split or migrated, whether their split was rural-rural; rural-urban; urban-rural; or urban-urban.
The data will look like this
HHID Year Rural/Urban HH Decision
1 2010 Rural Stayed
2 2010 Urban Stayed
3 2010 Rural Stayed
1 2013 Rural Migrated
1 2013 Urban Split
1 2013 Rural Stayed
2 2013 Urban Migrated
3 2013 Urban Migrated
3 2013 Rural Stayed

Note that all 2010 observations stayed because 2010 was the baseline. After then some HH split, some migrated, some stayed. From an initial HH more than one new HH may be born with some staying back and some migrating as is the case with observation 1. Some HH may migrate entirely as is the case with observation 2. My problem is showing how many rural-rural, rural-urban etc. splits or migration occured. I am not interested in cases where was no split as with the second observation of HH 3 in 2013.
Comment
Eddie Edo

Join Date: Oct 2019

Posts: 7
#4

19 Oct 2019, 20:32

Originally posted by Clyde Schechter View Post

This is not a job for -merge-. To create a panel data set you want to -append- these two data sets. First you have to harmonize their variable names.

I don't understand how this data set is organized, and apparently you are just as mystified as I am.

One would ordinarily expect that within each wave, hhid and idmem would uniquely identify observations. But they do not do that in either wave. If the only way you can uniquely identify the observations is with the use of all of the variables, then you actually don't have any unique identifiers at all and it becomes completely unclear what each observation in this data set refers to. I suspect that somewhere in your data set there is another variable, or perhaps a small number of variables, that together with hhid and idmem provide unique identification of observations in each of the two data sets. You need to find them. If you check the documentation that came with these data sets, it is probably explained somewhere in there. But you won't be able to make any progress at all until you find out what those variables are.

(Then, after that, you will still have a lot of work cut out for you, because it appears that the identifiers used in 2010 for the same people differ from those used in 2012 and there is something of a crosswalk for that built into the 2012 data set. Working with that will be awkward, but doable. But you can't work on that until the unique identification issue is resolved.)

Hello Clyde I have a problem with matching data across waves. The data I have is looking at households which split or migrated from 2010 to 2013. The original HH in 2010 could split to anywhere between 2 to 3 newer households in 2013. some HH had no split. Also there is a variable that indicates whether a HH stayed the same, split or migrated.
There is also a unique household ID that maps each household to the original HH in 2010. Finally there is a variable that describes whether a household is in a rural or urban area.My aim is to show for those that split or migrated, whether their split was rural-rural; rural-urban; urban-rural; or urban-urban.
The data will look like this
HHID Year Rural/Urban HH Decision
1 2010 Rural Stayed
2 2010 Urban Stayed
3 2010 Rural Stayed
1 2013 Rural Migrated
1 2013 Urban Split
1 2013 Rural Stayed
2 2013 Urban Migrated
3 2013 Urban Migrated
3 2013 Rural Stayed

Note that all 2010 observations stayed because 2010 was the baseline. After then some HH split, some migrated, some stayed. From an initial HH more than one new HH may be born with some staying back and some migrating as is the case with observation 1. Some HH may migrate entirely as is the case with observation 2. My problem is showing how many rural-rural, rural-urban etc. splits or migration occured. I am not interested in cases where was no split as with the second observation of HH 3 in 2013.
Comment

Announcement

Create a panel data from 2 waves

Comment

Comment

Comment