How to identify households that have changed address in panel data?

Titir Bhattacharya

Join Date: Mar 2019
Posts: 226

17 Jul 2023, 10:37

Hi,

Below is an example data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(id round location)
1 1 1
1 2 1
1 3 1
1 4 .
1 5 2
2 1 2
2 2 2
2 4 2
2 5 3
3 1 4
3 2 3
3 3 .
3 5 4
4 1 2
4 2 2
4 5 5
end

First, I want to flag observations where the location changes between two consecutive rounds. I use the following code for this:

Code:

by id:gen flag=location[_n+1]!=location

Next, I want to harmonize the locations across rounds for each id, setting them equal to the location in the first available round.

For this, I try the following code:

Code:

gen location2 = location
capture noisily forval r = 1/5 {
        bysort id :  egen value`r' = mean(cond(round == `r', location, .))
}
replace location2 = cond(value3 < ., value3, cond(value4 < ., value4, value5)) if round >= 1
order location2,a(location)

So I finally have the following data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(id round location location2 flag value1 value2 value3 value4 value5)
1 1 1 1 0 1 1 1 . 2
1 2 1 1 0 1 1 1 . 2
1 3 1 1 1 1 1 1 . 2
1 4 . 1 1 1 1 1 . 2
1 5 2 1 1 1 1 1 . 2
2 1 2 2 0 2 2 . 2 3
2 2 2 2 0 2 2 . 2 3
2 4 2 2 1 2 2 . 2 3
2 5 3 2 1 2 2 . 2 3
3 1 4 4 1 4 3 . . 4
3 2 3 4 1 4 3 . . 4
3 3 . 4 1 4 3 . . 4
3 5 4 4 1 4 3 . . 4
4 1 2 5 0 2 2 . . 5
4 2 2 5 1 2 2 . . 5
4 5 5 5 1 2 2 . . 5
end

Here, only for id=4 is the round 5 location repeated across all rounds, while for all other ids it is the location from the first round. I'm confused about why this is happening and would appreciate any solution.

Thanks,

Clyde Schechter

Join Date: Apr 2014

Posts: 30165
#2

17 Jul 2023, 11:03

I do not understand what you are doing. If you are trying to find the location from the first available round, it will be, in each case, the location in round 1 in your example data because nobody has a missing location in round 1. Clearly, you anticipate that in your full data set, there will be some people with no location specified in round 1. The following code handles all cases:

Code:

gen byte missing_location = missing(location)
by id (missing_location round), sort: gen first_known_location = location[1]
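
As a quick check, here is a minimal sketch that applies these two commands to the example data from the original post (assumed to be in memory) and lists the result. The variables `moved` and `ever_moved` are illustrative additions, not part of the answer above; they assume that "changed address" means a non-missing location that differs from the first known one.

```stata
* Assumes the example data from the original post (id, round, location) are in memory.
gen byte missing_location = missing(location)
by id (missing_location round), sort: gen first_known_location = location[1]

* Illustrative: mark rounds whose known location differs from the first known one.
* Rounds with a missing location are left missing rather than counted as a move.
gen byte moved = location != first_known_location if !missing(location)

* Illustrative: household-level indicator, 1 if any round shows a move.
by id, sort: egen byte ever_moved = max(moved)

sort id round
list id round location first_known_location moved ever_moved, sepby(id) noobs
```

With the example data, first_known_location is 1, 2, 4 and 2 for ids 1 to 4. The attempt in the original post went wrong for id 4 because its cond() chain starts at value3 and never consults value1 or value2; id 4 has no location in rounds 3 or 4, so the chain falls through to value5.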
Titir Bhattacharya

Join Date: Mar 2019

Posts: 226
#3

17 Jul 2023, 11:39

Originally posted by Clyde Schechter

I do not understand what you are doing. If you are trying to find the location from the first available round, it will be, in each case, the location in round 1 in your example data because nobody has a missing location in round 1. Clearly, you anticipate that in your full data set, there will be some people with no location specified in round 1. The following code handles all cases:

Code:

gen byte missing_location = missing(location)
by id (missing_location round), sort: gen first_known_location = location[1]

Thanks, Clyde. My concern is also that some people would not have been interviewed in round 1 and hence would have no round 1 observation at all. Can I use this code to address that issue as well?
Clyde Schechter

Join Date: Apr 2014

Posts: 30165
#4

17 Jul 2023, 12:07

Wanted to confirm if I can use this code to address that issue as well?

Yes. This code will work for that as well. It will always give you the location of the earliest round for the id in the data set which has a non-missing location specified.
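
A minimal sketch of that case, using made-up households (ids 5 and 6 are hypothetical and not in the example data): id 5 is first interviewed in round 2 with no location recorded, and id 6 is first interviewed in round 3.

```stata
clear
input float(id round location)
5 2 .
5 3 7
5 4 7
6 3 9
6 5 8
end

* Sorting non-missing locations first within id, then by round, puts the
* earliest round with a known location in position 1 of each id.
gen byte missing_location = missing(location)
by id (missing_location round), sort: gen first_known_location = location[1]

sort id round
list, sepby(id) noobs
```

Here first_known_location is 7 for id 5 and 9 for id 6, both taken from round 3. An id whose location is missing in every round would get a missing first_known_location, since there is no known location to pick.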
Titir Bhattacharya

Join Date: Mar 2019

Posts: 226
#5

17 Jul 2023, 12:20

Originally posted by Clyde Schechter

Yes. This code will work for that as well. It will always give you the location of the earliest round for the id in the data set which has a non-missing location specified.

Thanks a lot, Clyde. It worked perfectly.