  • Match different ID variables

    Hello,
    I cannot find a solution for this by myself. I have panel data: several waves, each with its own unique ID and an ID that refers to the previous wave. So if I append wave 2 to wave 1, I have a common ID for both. But if I append wave 3, its common ID only links it to wave 2, and so on. My question is whether I can use ID_2, ID_3 and so forth to assign a value to ID_1, or whether I need to create a new identifier. It looks like this:
    wave  ID_1  ID_2  ID_3
    1     1
    1     2
    1     3
    2     1     x
    2     2     y
    2     3     z
    3           x     1-1
    3           y     2-1
    3           z     3-1
    Thank you in advance!
    Andreas Knabe

  • #2
    Andreas Knabe, if the IDs were assigned uniquely (so that each person encountered in the survey gets a unique number that stays the same regardless of the wave), then you'd only have one ID variable. But you have several, so the IDs are probably unique only within each wave.

    Consider the trivial case of 1 observation per wave:
    - in wave 1 person A was surveyed and assigned ID=1
    - in wave 2 person B was surveyed and assigned ID=1

    Within each wave the numbering identifies persons uniquely, and since person B was not surveyed in wave 1, his variable carrying the previous ID is missing. (This is common with rotating samples, boosted samples, and similar designs.)

    If you wish to distinguish between persons A and B globally (across all waves), then using original IDs is not sufficient, and you will have to generate new IDs.

    The rest depends on the exact data structure, which is still not clear to me. For example, if A is observed in waves 1, 2, 7 and 8, how can you tell that it is the same person? If you only have two variables (current_id and previous_id), then you can't tell for sure, because of the gap in observations.
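    For the case without gaps, the linking described above could be sketched roughly as follows. This is only an illustration under assumptions: the variable names wave, id, prev_id and pid and the toy values are hypothetical, and prev_id is assumed to hold the person's ID in the immediately preceding wave (missing for new entrants).

```stata
clear
* Hypothetical layout: id is unique only within a wave; prev_id is the
* person's id in the immediately preceding wave (missing for new entrants).
input wave id prev_id
1 1 .
2 1 .
2 2 1
3 1 2
3 2 1
end

* New entrants start their own chain with a fresh global identifier.
generate long pid = _n if missing(prev_id)

summarize wave, meanonly
local last = r(max)

forvalues w = 2/`last' {
    * Lookup table for the preceding wave: its id and the pid already assigned.
    preserve
    keep if wave == `w' - 1
    keep id pid
    rename (id pid) (prev_id pid_prev)
    tempfile lookup
    save "`lookup'"
    restore

    * Carry pid forward into wave w through prev_id.
    merge m:1 prev_id using "`lookup'", keep(master match) nogenerate
    replace pid = pid_prev if wave == `w' & !missing(prev_id)
    drop pid_prev
}

sort pid wave
list, sepby(pid)
```

    In this toy data, the person with id 1 in wave 1 is followed through waves 2 and 3 under changing wave-specific IDs, while the person who enters in wave 2 gets a separate pid. A prev_id that finds no match in the preceding wave leaves pid missing, which is where the gap problem would show up.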

    If this is a standard/popular dataset, such as the German Socio-Economic Panel (GSOEP), then there are already prepared variables for identification of persons and households across the waves. They may or may not be shipped with the dataset you are working with, but should be available (afaik) from the data producer (e.g. DIW for GSOEP).

    Hope this helps.
    Best, Sergiy Radyakin



    • #3
      Sergiy Radyakin,

      thank you for your answer. The panel is balanced, so there are no gaps in the observations. Merging the sets also works well, but would not give me the panel structure I need.
      I use household data, but some households split up, so the total number of households increases over time.
      Abstracting from this, below is an example of a single household. hhid_2010 just appends "xx" to hhid_2008, based on the number of households emerging from the 2008 household. But hhid_2012 uses a very different identifier. My problem now is to derive from those three identifiers a unique one for 2008, 2010, 2012, etc.

      [Attached image: Screenshot 2021-04-19 at 18.03.15.png]
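      If the 2010 identifier really is the 2008 identifier plus a fixed-width suffix, the 2008 value could be recovered by cutting the suffix off. The sketch below assumes a two-character suffix and string IDs; the values and the variable name hhid_2008_rec are made up for illustration and are not taken from the screenshot.

```stata
clear
* Hypothetical values: 2008 ID followed by a two-character split suffix.
input str6 hhid_2010
"10101"
"10102"
"20501"
end

* Assumption: the last two characters are the suffix added in 2010.
generate str6 hhid_2008_rec = substr(hhid_2010, 1, strlen(hhid_2010) - 2)

list
```

      Both households with the hypothetical prefix 101 map back to the same 2008 household here, which is what a split would look like under this assumption.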

      Best,
      Andreas
      Last edited by Andreas Knabe; 19 Apr 2021, 10:04.



      • #4
        Code:
        clear all
        input year strL hhid_2008 strL hhid_2010 strL hhid_2012 strL hhid_2014
        2008 "1" ""  ""    ""
        2010 "1" "2" ""    ""
        2012 ""  "2" "333" ""
        2014 ""  ""  "333" "777"
        2008 "7" ""  ""    ""
        2010 "7" "9" ""    ""
        2012 ""  "9" "444" ""
        2014 ""  ""  "444" "888"
        end
        
        list
        
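        * Back-fill: copy each later wave's ID up into the rows of earlier years.
        * Assumes rows are sorted by household and year, one row per wave.
        foreach y in 2014 2012 2010 {
            foreach p in 2014 2012 2010 2008 {
                if `p'>=`y' continue
                quietly replace hhid_`y' = hhid_`y'[_n+1] if year==`p'
            }
        }
        
        * Forward-fill: carry each earlier wave's ID down into later years.
        foreach y in 2008 2010 2012 {
            quietly replace hhid_`y'=hhid_`y'[_n-1] if year>`y'
        }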
        
        list, sepby(hhid_2014)

        Code:
             +--------------------------------------------------+
             | year   hhi~2008   hhi~2010   hhi~2012   hhi~2014 |
             |--------------------------------------------------|
          1. | 2008          1          2        333        777 |
          2. | 2010          1          2        333        777 |
          3. | 2012          1          2        333        777 |
          4. | 2014          1          2        333        777 |
             |--------------------------------------------------|
          5. | 2008          7          9        444        888 |
          6. | 2010          7          9        444        888 |
          7. | 2012          7          9        444        888 |
          8. | 2014          7          9        444        888 |
             +--------------------------------------------------+
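        Once every row carries all wave-specific IDs, a single panel identifier can be built from them. The following is a minimal sketch, not part of the code above: hh_uid is a hypothetical name, and the filled-in data are typed in again (as fixed-length strings) so that the block runs on its own.

```stata
clear
* Filled-in IDs as in the listing above, re-entered so the block is self-contained.
input year str3 hhid_2008 str3 hhid_2010 str3 hhid_2012 str3 hhid_2014
2008 "1" "2" "333" "777"
2010 "1" "2" "333" "777"
2012 "1" "2" "333" "777"
2014 "1" "2" "333" "777"
2008 "7" "9" "444" "888"
2010 "7" "9" "444" "888"
2012 "7" "9" "444" "888"
2014 "7" "9" "444" "888"
end

* hh_uid (hypothetical name): one number per distinct combination of wave IDs.
egen long hh_uid = group(hhid_2008 hhid_2010 hhid_2012 hhid_2014)

* Declare the panel; delta(2) reflects the two-year spacing of the waves.
xtset hh_uid year, delta(2)
list year hh_uid, sepby(hh_uid)
```

        Grouping on all wave IDs rather than on hhid_2008 alone is a deliberate choice here: when households split, several later households share the same 2008 ID, so the earliest ID on its own would not be unique.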



        • #5
          Sergiy Radyakin

          thank you very much for this. I appreciate it.
