Match IDs for different survey waves

Borja Urrea

Join Date: Jun 2021

Posts: 15
#1

Match IDs for different survey waves

03 Jun 2021, 12:07

Dear all,

I am using different waves of the Spanish Survey of Household Finances to create a panel dataset. A nice component of this survey is that about 60% of households are followed in subsequent waves (this are identified by a dummy variable hogarpanel). I want to take advantage of this fact to create a panel dataset, but I have encountered a major problem. Households are uniquely identified by a household ID in each wave (this variable is called h_i for i={2005,2008,2011...}). The problem comes from the fact that the household ID varies for each wave. For example, household x may be h_2011=1, h_2008=3469 and h_2005=12765.
Moreover, for the 2011th wave, I can know h_2011 and h_2008, but not h_2005 (and so on and so forth). The data set would look like this (once I drop all observations that are not part of the panel):

wave h_2011 h_2008 h_2005
2011 1 3469 .
2011 2 1234 .
2011 3 6659 .
2008 . 3469 12765
2008 . 1234 957
2008 . 6659 2482
2005 . . 12765
2005 . . 957
2005 . . 2482

I have found a similar post in the forum: https://www.statalist.org/forums/for...t-id-variables. I have tried the code Sergey's proposed in his reply, adapting it to my needs, but it is not giving me the desired outcome. Indeed, what it has done is just give the same number to all the missing values for h_2005.
I have been thinking for a while on any possible solutions, maybe using merge command or using egen newid but these are not satisfactory either.

If you have any suggestions, they would be highly appreciated. Thanks in advance.

Best regards,
Tags: loop, matching ID, panel, panel data
Andrew Musau

Join Date: Oct 2014

Posts: 10187
#2

03 Jun 2021, 13:43

Search the forum for mentions of the command group_id from SSC, by Robert Picard. Eventually, if you cannot make progress, provide a usable data example using the dataex command as recommended in FAQ Advice #12.
Comment

Borja Urrea

Join Date: Jun 2021
Posts: 15

05 Jun 2021, 12:07

Dear Andrew,

First of all, sorry for my mess of post. I read the FAQ 12 again and learnt to use dataex, so I hope this post is more clear now. Second, thanks for your input. I have tried group_id and it has worked: what I did was to create a new ID for all observations first and use group_id after that. My way to go has been the following:

Code:

gen id =_n
group_id id, matchby (h_2005)
group_id id, matchby (h_2008)

However, I have another problem now. There are some observations that were included in the 2005th wave that did not appear in subsequent surveys. Therefore I would want to delete these observations. But, since these observations did formed part of 2002nd wave of the survey, simply dropping observations which have panel dummy variable = 0 does not work to get rid of them. I am attaching an example of my data (40 obs).
My goal is to eliminate all the IDs that only contain observations of 2005, for instance, ID=1. Please note it is a multiply imputed dataset (with 5 imputations, hence the six observations per ID and year).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(h_2002 h_2005 h_2008 h_2011) byte hogarpanel float(time id)
3277 1496    .    . 1 2005 1
3277 1496    .    . 1 2005 1
3277 1496    .    . 1 2005 1
3277 1496    .    . 1 2005 1
3277 1496    .    . 1 2005 1
3277 1496    .    . 1 2005 1
   . 3349 3908    . 1 2008 2
   . 3349 3908    . 1 2008 2
   .    . 3908 5572 1 2011 2
4284 3349    .    . 1 2005 2
4284 3349    .    . 1 2005 2
4284 3349    .    . 1 2005 2
4284 3349    .    . 1 2005 2
   .    . 3908 5572 1 2011 2
   .    . 3908 5572 1 2011 2
   . 3349 3908    . 1 2008 2
4284 3349    .    . 1 2005 2
   .    . 3908 5572 1 2011 2
   . 3349 3908    . 1 2008 2
   . 3349 3908    . 1 2008 2
   . 3349 3908    . 1 2008 2
   .    . 3908 5572 1 2011 2
   .    . 3908 5572 1 2011 2
4284 3349    .    . 1 2005 2
   . 1411 2808    . 1 2008 3
   . 1411 2808    . 1 2008 3
   . 1411 2808    . 1 2008 3
   .    . 2808 4317 1 2011 3
4822 1411    .    . 1 2005 3
4822 1411    .    . 1 2005 3
   .    . 2808 4317 1 2011 3
4822 1411    .    . 1 2005 3
   . 1411 2808    . 1 2008 3
   . 1411 2808    . 1 2008 3
   . 1411 2808    . 1 2008 3
   .    . 2808 4317 1 2011 3
   .    . 2808 4317 1 2011 3
   .    . 2808 4317 1 2011 3
4822 1411    .    . 1 2005 3
4822 1411    .    . 1 2005 3
end

I thought in using the drop command and writing something like: drop if id=`x' & (time=2005 for all observations such that id=`x') but I do not know how to write the second condition.

Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10187

06 Jun 2021, 11:01

Code:

forval year= 2005(3)2011{
    bys id: egen in`year'= max(!missing(h_`year'))
}

drop if in2005 &!in2008 &!in2011

Res.:

Code:

. l id h_2002- h_2011

     +----------------------------------------+
     | id   h_2002   h_2005   h_2008   h_2011 |
     |----------------------------------------|
  1. |  2        .     3349     3908        . |
  2. |  2        .     3349     3908        . |
  3. |  2        .        .     3908     5572 |
  4. |  2     4284     3349        .        . |
  5. |  2     4284     3349        .        . |
     |----------------------------------------|
  6. |  2     4284     3349        .        . |
  7. |  2     4284     3349        .        . |
  8. |  2        .        .     3908     5572 |
  9. |  2        .        .     3908     5572 |
 10. |  2        .     3349     3908        . |
     |----------------------------------------|
 11. |  2     4284     3349        .        . |
 12. |  2        .        .     3908     5572 |
 13. |  2        .     3349     3908        . |
 14. |  2        .     3349     3908        . |
 15. |  2        .     3349     3908        . |
     |----------------------------------------|
 16. |  2        .        .     3908     5572 |
 17. |  2        .        .     3908     5572 |
 18. |  2     4284     3349        .        . |
 19. |  3        .     1411     2808        . |
 20. |  3        .     1411     2808        . |
     |----------------------------------------|
 21. |  3        .     1411     2808        . |
 22. |  3        .        .     2808     4317 |
 23. |  3     4822     1411        .        . |
 24. |  3     4822     1411        .        . |
 25. |  3        .        .     2808     4317 |
     |----------------------------------------|
 26. |  3     4822     1411        .        . |
 27. |  3        .     1411     2808        . |
 28. |  3        .     1411     2808        . |
 29. |  3        .     1411     2808        . |
 30. |  3        .        .     2808     4317 |
     |----------------------------------------|
 31. |  3        .        .     2808     4317 |
 32. |  3        .        .     2808     4317 |
 33. |  3     4822     1411        .        . |
 34. |  3     4822     1411        .        . |
     +----------------------------------------+

Comment

Borja Urrea

Join Date: Jun 2021

Posts: 15
#5

08 Jun 2021, 03:24

Thanks Andrew, I finally found a workaround.

Best,
Comment

Announcement