Match different ID variables

Andreas Knabe

Join Date: May 2020

Posts: 19
#1

18 Apr 2021, 14:28

Hello,
I cannot find a solution for this by myself: I have panel data in several waves. Each wave has its own unique ID plus an ID that refers to the previous wave. So if I append wave 2 to wave 1, I have a common ID for both. But if I append wave 3, I only have a common ID for waves 2 and 3, and so on. My question is whether I can use ID_2, ID_3 and so forth to assign a value to ID_1, or whether I need to create a new identifier. It looks like this:

wave  ID_1  ID_2  ID_3
1     1
1     2
1     3
2     1     x
2     2     y
2     3     z
3           x     1-1
3           y     2-1
3           z     3-1

Thank you in advance!
Andreas Knabe
Tags: panel, panel data
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#2

19 Apr 2021, 04:14

Andreas Knabe, if the IDs were assigned uniquely (so that each person encountered in the survey gets a unique number that stays the same regardless of the wave), then you'd only have one ID variable. But you have several, so the IDs are probably unique only within each wave.

Consider the trivial case of 1 observation per wave:
- in wave 1 person A was surveyed and assigned ID=1
- in wave 2 person B was surveyed and assigned ID=1

Within each wave the numbering identifies persons, and since person B was not surveyed in wave 1, the variable carrying B's previous ID is missing. (This is common with rotating samples, boosted samples, etc.)

If you wish to distinguish between persons A and B globally (across all waves), then using original IDs is not sufficient, and you will have to generate new IDs.

The rest depends on the exact data structure, which is still not clear to me. For example, if A is observed in waves 1, 2, 7 and 8, how can you tell that it is the same person? If you only have two variables (current_id and previous_id), you can't tell for sure, because of the gap in observations.
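
For a balanced panel without gaps, the two-variable setup can be chained wave by wave. Below is a minimal sketch, assuming long-format data where id is the current-wave ID and prev_id is the same unit's ID in the previous wave (empty in wave 1). The variable names id, prev_id, pid and the toy values are hypothetical, not taken from any particular dataset:

```stata
clear
input wave str3 id str3 prev_id
1 "1"   ""
1 "2"   ""
2 "x"   "1"
2 "y"   "2"
3 "1-1" "x"
3 "2-1" "y"
end

* Wave 1 units keep their own ID as the cross-wave identifier
generate str3 pid = id if wave == 1

* For each later wave, look up the pid of the previous-wave record
forvalues w = 2/3 {
    preserve
    keep if wave == `w' - 1
    keep id pid
    rename (id pid) (prev_id pid_prev)
    tempfile lookup
    save "`lookup'"
    restore
    merge m:1 prev_id using "`lookup'", keep(master match) nogenerate
    * Restrict to wave `w' so IDs reused in other waves cannot leak in
    replace pid = pid_prev if wave == `w'
    drop pid_prev
}

sort pid wave
list, sepby(pid)
```

Because the lookup is m:1, several wave-w records may point to the same previous-wave record, which is what happens when a unit splits. With a gap in the chain the lookup finds no match and pid stays empty, which is the problem described above.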

If this is a standard/popular dataset, such as the German Socio-Economic Panel (GSOEP), then there are already prepared variables for identification of persons and households across the waves. They may or may not be shipped with the dataset you are working with, but should be available (afaik) from the data producer (e.g. DIW for GSOEP).

Hope this helps.
Best, Sergiy Radyakin
Andreas Knabe

Join Date: May 2020

Posts: 19
#3

19 Apr 2021, 09:53

Sergiy Radyakin,

thank you for your answer. The panel is balanced, so there are no gaps in the observations. Merging the sets also works well, but it does not give me the panel structure I need.
I use household data, but some households split up, so the total number of households increases over time.
Abstracting from this, below is an example of a single household. hhid_2010 just adds "xx" to hhid_2008, based on the number of households emerging from the 2008 household. But hhid_2012 uses a very different identifier. My problem now is to derive from those three identifiers a single unique one for 2008, 2010, 2012, etc.
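
To illustrate the 2008 to 2010 link only: if the IDs are stored as strings and the suffix is always exactly two characters (both are assumptions here), the parent ID can be recovered by cutting the suffix off. The values below are made up for illustration:

```stata
clear
input str10 hhid_2010
"100101"
"100102"
"200301"
end

* Assumption: the last two characters count the households that
* emerged from the 2008 household, so dropping them gives the parent ID
generate str10 hhid_2008 = substr(hhid_2010, 1, strlen(hhid_2010) - 2)

list
```

This does not carry over to hhid_2012, which follows a different scheme.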

Best,
Andreas

Last edited by Andreas Knabe; 19 Apr 2021, 10:04.

Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#4

19 Apr 2021, 14:57

Code:

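clear all
* Toy data: two households, one row per wave, sorted by household and year.
* Each row holds only its own wave's ID and the previous wave's ID.
input year strL hhid_2008 strL hhid_2010 strL hhid_2012 strL hhid_2014
2008 "1" ""  ""    ""
2010 "1" "2" ""    ""
2012 ""  "2" "333" ""
2014 ""  ""  "333" "777"
2008 "7" ""  ""    ""
2010 "7" "9" ""    ""
2012 ""  "9" "444" ""
2014 ""  ""  "444" "888"
end

list

* Step 1: copy each wave's ID backward into the earlier waves.
* This relies on the next row being the same household's next wave.
foreach y in 2014 2012 2010 {
    foreach p in 2014 2012 2010 2008 {
        if `p'>=`y' continue
        quietly replace hhid_`y' = hhid_`y'[_n+1] if year==`p'
    }
}

* Step 2: copy each wave's ID forward into the later waves.
foreach y in 2008 2010 2012 {
    quietly replace hhid_`y'=hhid_`y'[_n-1] if year>`y'
}

list, sepby(hhid_2014)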

Code:

     +--------------------------------------------------+
     | year   hhi~2008   hhi~2010   hhi~2012   hhi~2014 |
     |--------------------------------------------------|
  1. | 2008          1          2        333        777 |
  2. | 2010          1          2        333        777 |
  3. | 2012          1          2        333        777 |
  4. | 2014          1          2        333        777 |
     |--------------------------------------------------|
  5. | 2008          7          9        444        888 |
  6. | 2010          7          9        444        888 |
  7. | 2012          7          9        444        888 |
  8. | 2014          7          9        444        888 |
     +--------------------------------------------------+
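
Once every row carries all wave IDs, a single numeric panel identifier can be built from them. Below is a minimal sketch that starts from the filled-in listing above (entered here as fixed-length strings) and assumes the latest wave's ID is the finest one, since households only split and never merge. The name hh is hypothetical:

```stata
clear
input year str3 hhid_2008 str3 hhid_2010 str3 hhid_2012 str3 hhid_2014
2008 "1" "2" "333" "777"
2010 "1" "2" "333" "777"
2012 "1" "2" "333" "777"
2014 "1" "2" "333" "777"
2008 "7" "9" "444" "888"
2010 "7" "9" "444" "888"
2012 "7" "9" "444" "888"
2014 "7" "9" "444" "888"
end

* One consecutive integer per distinct value of the latest wave's ID
egen long hh = group(hhid_2014)

* Check that hh and year identify rows, then declare the panel
* (waves are two years apart, hence delta(2))
isid hh year
xtset hh year, delta(2)
```

If households split, a parent household's earlier rows would first have to be duplicated for each successor so that every final household has a complete history.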


Andreas Knabe

Join Date: May 2020

Posts: 19
#5

21 Apr 2021, 11:03

Sergiy Radyakin

thank you a lot for this. I appreciate it.