Dear all,
I am using different waves of the Spanish Survey of Household Finances to create a panel dataset. A useful feature of this survey is that about 60% of households are followed in subsequent waves (these are identified by a dummy variable, hogarpanel). I want to take advantage of this to build the panel, but I have encountered a major problem. Within each wave, households are uniquely identified by a household ID (the variable is called h_i for i = {2005, 2008, 2011, ...}), but the ID assigned to a given household changes from wave to wave. For example, household x may have h_2011=1, h_2008=3469 and h_2005=12765.
Moreover, each wave only lets me recover its own ID and the ID from the previous wave: for the 2011 wave I know h_2011 and h_2008, but not h_2005 (and so on for earlier waves). The data set looks like this (once I drop all observations that are not part of the panel):
wave h_2011 h_2008 h_2005
2011 1 3469 .
2011 2 1234 .
2011 3 6659 .
2008 . 3469 12765
2008 . 1234 957
2008 . 6659 2482
2005 . . 12765
2005 . . 957
2005 . . 2482
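To make the desired outcome concrete, here is a minimal sketch of the linkage I am after, written in plain Python rather than Stata purely for illustration. It uses only the example rows above; the names h2008_to_h2005 and panel_id are hypothetical, and it assumes every panel household can be traced back to an h_2005 value (households that first enter in 2008 would need separate handling).

```python
# Example rows from the table above; None stands in for Stata's missing value (.)
rows = [
    {"wave": 2011, "h_2011": 1, "h_2008": 3469, "h_2005": None},
    {"wave": 2011, "h_2011": 2, "h_2008": 1234, "h_2005": None},
    {"wave": 2011, "h_2011": 3, "h_2008": 6659, "h_2005": None},
    {"wave": 2008, "h_2011": None, "h_2008": 3469, "h_2005": 12765},
    {"wave": 2008, "h_2011": None, "h_2008": 1234, "h_2005": 957},
    {"wave": 2008, "h_2011": None, "h_2008": 6659, "h_2005": 2482},
    {"wave": 2005, "h_2011": None, "h_2008": None, "h_2005": 12765},
    {"wave": 2005, "h_2011": None, "h_2008": None, "h_2005": 957},
    {"wave": 2005, "h_2011": None, "h_2008": None, "h_2005": 2482},
]

# Crosswalk taken from the 2008 wave: h_2008 -> h_2005
h2008_to_h2005 = {r["h_2008"]: r["h_2005"] for r in rows if r["wave"] == 2008}

# Fill the missing h_2005 in the 2011 rows by following the h_2008 link
for r in rows:
    if r["h_2005"] is None and r["h_2008"] is not None:
        r["h_2005"] = h2008_to_h2005.get(r["h_2008"])

# Assign one time-invariant ID per distinct h_2005 (assumption: h_2005 is
# available for every panel household once the links are followed)
panel_ids = {}
for r in rows:
    key = r["h_2005"]
    r["panel_id"] = None if key is None else panel_ids.setdefault(key, len(panel_ids) + 1)

for r in rows:
    print(r["wave"], r["h_2011"], r["h_2008"], r["h_2005"], r["panel_id"])
```

In other words, the three rows of each household (one per wave) should end up sharing a single identifier, which is what I cannot reproduce in Stata so far.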
I have found a similar post in the forum: https://www.statalist.org/forums/for...t-id-variables. I tried the code Sergey proposed in his reply, adapting it to my needs, but it does not give me the desired outcome: it just assigns the same number to all observations where h_2005 is missing.
I have been thinking for a while about possible solutions, perhaps using the merge command or egen newid, but these have not been satisfactory either.
If you have any suggestions, they would be highly appreciated. Thanks in advance.
Best regards,