
  • How to identify households that have changed address in panel data?

    Hi,

    Below is an example dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id round location)
    1 1 1
    1 2 1
    1 3 1
    1 4 .
    1 5 2
    2 1 2
    2 2 2
    2 4 2
    2 5 3
    3 1 4
    3 2 3
    3 3 .
    3 5 4
    4 1 2
    4 2 2
    4 5 5
    end
    First, I want to flag observations whose location changes between two consecutive rounds. I use the following code for this:

    Code:
    bysort id (round): gen flag = location[_n+1] != location
    Next, I want to harmonize the location across rounds for each id, setting it equal to the location in the first available round.

    For this, I tried the following code:

    Code:
    gen location2 = location
    capture noisily forval r = 1/5 {
            bysort id :  egen value`r' = mean(cond(round == `r', location, .))
    }
    replace location2 = cond(value3 < ., value3, cond(value4 < ., value4, value5)) if round >= 1
    order location2,a(location)
    So I end up with the following data:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id round location location2 flag value1 value2 value3 value4 value5)
    1 1 1 1 0 1 1 1 . 2
    1 2 1 1 0 1 1 1 . 2
    1 3 1 1 1 1 1 1 . 2
    1 4 . 1 1 1 1 1 . 2
    1 5 2 1 1 1 1 1 . 2
    2 1 2 2 0 2 2 . 2 3
    2 2 2 2 0 2 2 . 2 3
    2 4 2 2 1 2 2 . 2 3
    2 5 3 2 1 2 2 . 2 3
    3 1 4 4 1 4 3 . . 4
    3 2 3 4 1 4 3 . . 4
    3 3 . 4 1 4 3 . . 4
    3 5 4 4 1 4 3 . . 4
    4 1 2 5 0 2 2 . . 5
    4 2 2 5 1 2 2 . . 5
    4 5 5 5 1 2 2 . . 5
    end
    Here, only for id = 4 is the round 5 location repeated across all rounds; for all other ids, it is the round 1 location. I'm confused about why this is happening and would appreciate any solution.

    Thanks,

  • #2
    I do not understand what you are doing. If you are trying to find the location from the first available round, it will be, in each case, the location in round 1 in your example data because nobody has a missing location in round 1. Clearly, you anticipate that in your full data set, there will be some people with no location specified in round 1. The following code handles all cases:
    Code:
    gen byte missing_location = missing(location)
    by id (missing_location round), sort: gen first_known_location = location[1]
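
    To see this on the example data from #1, here is a minimal sketch that re-enters those 16 observations and applies the two lines above. It may also explain the id = 4 puzzle: the nested cond() in #1 starts at value3 and never consults value1 or value2, and since id 4 has no usable location in rounds 3 or 4, it falls through to value5, which is the round 5 location. For ids 1 to 3, the value it lands on happens to equal the round 1 location. The final sort and list commands are only there for display.

    ```stata
    * Example data copied from post #1
    clear
    input float(id round location)
    1 1 1
    1 2 1
    1 3 1
    1 4 .
    1 5 2
    2 1 2
    2 2 2
    2 4 2
    2 5 3
    3 1 4
    3 2 3
    3 3 .
    3 5 4
    4 1 2
    4 2 2
    4 5 5
    end

    * Within each id, put records with a known location first, ordered by round,
    * then copy the location of the first such record to every record of that id
    gen byte missing_location = missing(location)
    by id (missing_location round), sort: gen first_known_location = location[1]

    * Restore the original order and inspect the result
    sort id round
    list id round location first_known_location, sepby(id)
    ```

    On these data, first_known_location comes out as 1, 2, 4, and 2 for ids 1 through 4.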



    • #3
      Originally posted by Clyde Schechter
      I do not understand what you are doing. If you are trying to find the location from the first available round, it will be, in each case, the location in round 1 in your example data because nobody has a missing location in round 1. Clearly, you anticipate that in your full data set, there will be some people with no location specified in round 1. The following code handles all cases:
      Code:
      gen byte missing_location = missing(location)
      by id (missing_location round), sort: gen first_known_location = location[1]
      Thanks, Clyde. My concern is also that some people may not have been interviewed in round 1 and hence have no round 1 record at all. Can I use this code to address that case as well?



      • #4
        Wanted to confirm if I can use this code to address that issue as well?
        Yes. This code will work for that as well. It will always give you the location of the earliest round for the id in the data set which has a non-missing location specified.
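
        A small check of both cases, using made-up data (ids 5 and 6 and their locations are hypothetical, not taken from the thread): id 5 has no round 1 record at all, and id 6 has records for rounds 1 and 2 but with missing locations.

        ```stata
        * Hypothetical data, for illustration only
        clear
        input float(id round location)
        5 2 7
        5 3 7
        5 5 8
        6 1 .
        6 2 .
        6 3 9
        6 4 9
        end

        * Same two lines as in #2
        gen byte missing_location = missing(location)
        by id (missing_location round), sort: gen first_known_location = location[1]

        * id 5 picks up its round 2 location, id 6 its round 3 location
        sort id round
        list id round location first_known_location, sepby(id)
        ```

        One caveat: an id whose location is missing in every round gets a missing first_known_location, since there is no known location to pick up.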



        • #5
          Originally posted by Clyde Schechter
          Yes. This code will work for that as well. It will always give you the location of the earliest round for the id in the data set which has a non-missing location specified.
          Thanks a lot, Clyde, it worked perfectly.
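
          For anyone landing here for the first part of the question in #1 (flagging a change of location between rounds), a possible sketch follows. It is not from the replies above. It assumes that a change should be measured against the last known location, so that a missing location or a skipped round is not itself counted as a move; last_known, moved, and ever_moved are names made up for illustration. The data are ids 1 and 3 from the example in #1. By contrast, the flag in #1 compares with location[_n+1], which counts a missing location as a change and always flags the last record of each id, because a subscript past the end of the by-group returns missing.

          ```stata
          * ids 1 and 3 from the example data in #1
          clear
          input float(id round location)
          1 1 1
          1 2 1
          1 3 1
          1 4 .
          1 5 2
          3 1 4
          3 2 3
          3 3 .
          3 5 4
          end

          * Carry the last known location forward within each id
          bysort id (round): gen last_known = location
          by id: replace last_known = last_known[_n-1] if missing(last_known) & _n > 1

          * moved = 1 if the current location differs from the last known one,
          * 0 if it is the same, and missing when no comparison is possible
          by id: gen byte moved = location != last_known[_n-1] ///
              if _n > 1 & !missing(location, last_known[_n-1])

          * Household-level indicator: 1 if the id ever changed location
          by id: egen byte ever_moved = max(moved)

          list id round location last_known moved ever_moved, sepby(id)
          ```

          Here id 1 is flagged in round 5 (location 1 to 2 across the missing round 4), and id 3 is flagged in rounds 2 and 5.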
