Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create a panel data from 2 waves

    Dear all,

    I am trying to create a panel data from 2 waves (2010 and 2012) of a household survey however I could not make it properly. Followings are my example data and attempts. Any suggestion is greatly appreciated.

    In both data sets, pro (province); dis (district); comm (commune); EA (enumeration area); hhid (household ID); and idmem (ID member) uniquely identify observations.

    In the data set in 2012, pro2010, dis2010, comm2010, EA2010 and hhid2010 are used to identify household observed in 2010.

    Data in 2012
    Code:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte pro int dis long comm int EA byte hhid float idmem int age byte(gender pro2010) int dis2010 long comm2010 int EA2010 byte hhid2010 int age2010 byte gender2010
    1 1   4  8 13 1 77 1 0 .   .  .  .  . .
    1 1   4  8 13 2 70 2 0 .   .  .  .  . .
    1 1   7 22 15 2 60 1 1 1   7 22 15 58 1
    1 1  16  3 15 1 68 2 0 .   .  .  .  . .
    1 1  22 19 13 1 71 2 1 1  22 19 13 69 2
    1 1  28 25 14 1 60 2 0 .   .  .  .  . .
    1 1  34 25 13 1 70 1 1 1  34 25 13 67 1
    1 1  34 25 13 2 67 2 1 1  34 25 13 65 2
    1 2  40 16 13 2 62 1 0 .   .  .  .  . .
    1 2  40 16 14 1 78 2 0 .   .  .  .  . .
    1 2  55 11 13 1 70 1 1 2  55 11 13 68 1
    1 2  55 11 13 2 70 2 1 2  55 11 13 68 2
    1 2  55 11 15 1 61 1 1 2  55 11 15 59 1
    1 2  55 11 15 2 60 2 1 2  55 11 15 58 2
    1 2  67 16 14 4 62 1 0 .   .  .  .  . .
    1 2  67 16 14 5 66 2 0 .   .  .  .  . .
    1 3  91  6 15 1 68 1 0 .   .  .  .  . .
    1 3  91  6 15 2 63 2 0 .   .  .  .  . .
    1 3 106 13 15 1 82 2 1 3 106 13 15 80 2
    1 3 112 17 15 1 72 2 0 .   .  .  .  . .
    1 4 118 39 15 1 84 1 1 4 118 39 15 82 1
    1 4 118 39 15 2 82 2 1 4 118 39 15 80 2
    1 4 124 22 13 1 88 2 0 .   .  .  .  . .
    1 4 124 22 13 2 86 1 0 .   .  .  .  . .
    1 4 124 22 15 1 81 2 0 .   .  .  .  . .
    1 4 124 22 19 1 70 2 0 .   .  .  .  . .
    1 4 133 30 14 5 81 2 1 4 133 30 14  . .
    1 4 133 30 15 1 60 1 1 4 133 30 15 58 1
    1 4 139 30 15 1 65 1 0 .   .  .  .  . .
    1 4 148  6 15 1 62 1 1 4 148  6 15 60 1
    end
    label values gender gender
    label def gender 1 "Male", modify
    label def gender 2 "Female", modify
    label values gender2010 gender2010
    label def gender2010 1 "Male", modify
    label def gender2010 2 "Female", modify
    Data 2010
    Code:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(pro dis) float comm int(EA hhid) byte idmem int age byte gender
    1 1   4 12 13 1 60 2
    1 1   4 12 14 1 61 1
    1 1   4 12 15 1 69 1
    1 1   4 12 15 2 66 2
    1 1  22 19 13 1 69 2
    1 1  28 20 13 2 60 1
    1 1  28 20 15 1 68 2
    1 1  28 20 15 2 72 1
    1 1  28 20 19 2 60 1
    1 1  34 25 13 1 67 1
    1 1  34 25 13 2 65 2
    1 2  40  6 14 1 64 2
    1 2  40  6 14 2 64 1
    1 2  55 11 13 1 68 1
    1 2  55 11 13 2 68 2
    1 2  67 23 15 1 90 1
    1 2  67 23 15 2 92 2
    1 3  91  6 13 1 64 1
    1 3  91  6 13 2 62 2
    1 3  91  6 14 1 66 2
    1 3  91  6 14 2 67 1
    1 3 106 13 15 1 80 2
    1 3 112 20 15 5 90 2
    1 3 112 20 20 1 60 1
    1 4 118 39 15 1 81 1
    1 4 118 39 15 2 80 2
    1 4 124 39 15 1 63 2
    1 4 124 39 15 2 71 1
    1 4 139 50 13 4 66 2
    1 4 139 50 19 1 71 2
    end
    label values gender gender
    label def gender 1 "Male", modify
    label def gender 2 "Female", modify
    My first attempt is as follows but unfortunately error message of uniquely identify observation issue arises even I tried to use -merge- m:1 or 1:m
    Code:
    use data2012, clear
        ren pro pro2012
        ren dis dis2012  
        ren comm comm2012  
        ren EA EA2012  
        ren hhid hhid2012
    
        ren pro2010 pro
        ren dis2010 dis
        ren comm2010 comm
        ren EA2010 EA
        ren hhid2010 hhid
        sort pro dis comm EA hhid
    merge 1:1 using data1
    My 2nd attempt, however not an effective way, is to rename age (e.g., age10 and age12) and gender (e.g., gen10 and gen12) to make them different names in the both data sets and then use following command:

    Code:
    use data2012
    sort pro dis comm EA hhid idmem
    merge 1:1 pro dis comm EA hhid idmem using data2010
    This code worked but I think I am unable to do it manually with hundreds of variables. Moreover, when I tried to use reshape command to make a panel data, results did not show me age and gender for each year.

    Thank you.

    DL

  • #2
    This is not a job for -merge-. To create a panel data set you want to -append- these two data sets. First you have to harmonize their variable names.

    I don't understand how this data set is organized, and apparently you are just as mystified as I am.

    One would ordinarily expect that within each wave, hhid and idmem would uniquely identify observations. But they do not do that in either wave. If the only way you can uniquely identify the observations is with the use of all of the variables, then you actually don't have any unique identifiers at all and it becomes completely unclear what each observation in this data set refers to. I suspect that somewhere in your data set there is another variable, or perhaps a small number of variables, that together with hhid and idmem provide unique identification of observations in each of the two data sets. You need to find them. If you check the documentation that came with these data sets, it is probably explained somewhere in there. But you won't be able to make any progress at all until you find out what those variables are.

    (Then, after that, you will still have a lot of work cut out for you, because it appears that the identifiers used in 2010 for the same people differ from those used in 2012 and there is something of a crosswalk for that built into the 2012 data set. Working with that will be awkward, but doable. But you can't work on that until the unique identification issue is resolved.)

    Comment


    • #3
      Hello Clyde I have a problem with matching data across waves. The data I have is looking at households which split or migrated from 2010 to 2013. The original HH in 2010 could split to anywhere between 2 to 3 newer households in 2013. some HH had no split. Also there is a variable that indicates whether a HH stayed the same, split or migrated.
      There is also a unique household ID that maps each household to the original HH in 2010. Finally there is a variable that describes whether a household is in a rural or urban area.My aim is to show for those that split or migrated, whether their split was rural-rural; rural-urban; urban-rural; or urban-urban.
      The data will look like this
      HHID Year Rural/Urban HH Decision
      1 2010 Rural Stayed
      2 2010 Urban Stayed
      3 2010 Rural Stayed
      1 2013 Rural Migrated
      1 2013 Urban Split
      1 2013 Rural Stayed
      2 2013 Urban Migrated
      3 2013 Urban Migrated
      3 2013 Rural Stayed


      Note that all 2010 observations stayed because 2010 was the baseline. After then some HH split, some migrated, some stayed. From an initial HH more than one new HH may be born with some staying back and some migrating as is the case with observation 1. Some HH may migrate entirely as is the case with observation 2. My problem is showing how many rural-rural, rural-urban etc. splits or migration occured. I am not interested in cases where was no split as with the second observation of HH 3 in 2013.

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        This is not a job for -merge-. To create a panel data set you want to -append- these two data sets. First you have to harmonize their variable names.

        I don't understand how this data set is organized, and apparently you are just as mystified as I am.

        One would ordinarily expect that within each wave, hhid and idmem would uniquely identify observations. But they do not do that in either wave. If the only way you can uniquely identify the observations is with the use of all of the variables, then you actually don't have any unique identifiers at all and it becomes completely unclear what each observation in this data set refers to. I suspect that somewhere in your data set there is another variable, or perhaps a small number of variables, that together with hhid and idmem provide unique identification of observations in each of the two data sets. You need to find them. If you check the documentation that came with these data sets, it is probably explained somewhere in there. But you won't be able to make any progress at all until you find out what those variables are.

        (Then, after that, you will still have a lot of work cut out for you, because it appears that the identifiers used in 2010 for the same people differ from those used in 2012 and there is something of a crosswalk for that built into the 2012 data set. Working with that will be awkward, but doable. But you can't work on that until the unique identification issue is resolved.)
        Hello Clyde I have a problem with matching data across waves. The data I have is looking at households which split or migrated from 2010 to 2013. The original HH in 2010 could split to anywhere between 2 to 3 newer households in 2013. some HH had no split. Also there is a variable that indicates whether a HH stayed the same, split or migrated.
        There is also a unique household ID that maps each household to the original HH in 2010. Finally there is a variable that describes whether a household is in a rural or urban area.My aim is to show for those that split or migrated, whether their split was rural-rural; rural-urban; urban-rural; or urban-urban.
        The data will look like this
        HHID Year Rural/Urban HH Decision
        1 2010 Rural Stayed
        2 2010 Urban Stayed
        3 2010 Rural Stayed
        1 2013 Rural Migrated
        1 2013 Urban Split
        1 2013 Rural Stayed
        2 2013 Urban Migrated
        3 2013 Urban Migrated
        3 2013 Rural Stayed


        Note that all 2010 observations stayed because 2010 was the baseline. After then some HH split, some migrated, some stayed. From an initial HH more than one new HH may be born with some staying back and some migrating as is the case with observation 1. Some HH may migrate entirely as is the case with observation 2. My problem is showing how many rural-rural, rural-urban etc. splits or migration occured. I am not interested in cases where was no split as with the second observation of HH 3 in 2013.

        Comment

        Working...
        X