  • Identifying individuals who are not in the dataset

    Dear all,

    my panel dataset looks like this:
    pid year married partid
    100 1990 1 101
    100 1991 1 101
    100 1992 1 101
    2459 1990 0 .
    2459 1991 1 3456
    2459 1992 1 3456
    59345 1990 1 79856
    59345 1991 1 79856
    59345 1992 0 .

    If individuals are married, they often have a partner ID (partid) assigned. Due to some restrictions, some individuals were dropped from the dataset.
    What I want to do now is set the partner ID to missing if the partner is no longer in the dataset, that is, if no pid equals the individual's partid.
    For example, if the individual with pid 3456 is no longer in the dataset, I would like to set the partid of individual 2459 to missing.

    Any help would be appreciated. Thank you very much in advance.

  • #2
    In the future, please use dataex to present data examples. You can do this with a couple of merges. Your example data set does not contain any matches, so I modified some partid values.

    Code:
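    clear
    input float(pid year married partid)
    100 1990 1 59345
    100 1991 1 59345
    100 1992 1 59345
    2459 1990 0 .
    2459 1991 1 3456
    2459 1992 1 3456
    59345 1990 1 100
    59345 1991 1 100
    59345 1992 0 .
    end

    * Save the full data, then build the list of distinct pids,
    * renamed to partid so it can serve as a merge key
    tempfile data
    save `data'
    keep pid
    contract pid, nomiss
    drop _freq
    sort pid
    rename pid partid
    tempfile partid
    save `partid'

    * Build the list of distinct partids and keep only those
    * that do not appear as a pid (_merge==1: master only)
    use `data', clear
    keep partid
    contract partid, nomiss
    drop _freq
    sort partid
    merge 1:1 partid using `partid'
    keep if _merge==1
    drop _merge
    tempfile todelete
    save `todelete'

    * Set partid to missing wherever the partner is not in the data
    use `data', clear
    merge m:1 partid using `todelete'
    replace partid=. if _merge==3
    drop _merge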

    • #3
      You could also use levelsof to get the two sets of identifiers and then look for the difference. That could get messy if the number of identifiers is large.
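
      A minimal sketch of that idea, assuming the variables are named pid and partid as in #1 and that the identifier lists are small enough to fit in local macros. The macro names pids, partners and absent are only illustrative.

      Code:
      * distinct values of each identifier (levelsof skips missing values by default)
      levelsof pid, local(pids)
      levelsof partid, local(partners)

      * partner IDs that never occur as a pid
      local absent : list partners - pids

      * set those partner IDs to missing
      foreach p of local absent {
          replace partid = . if partid == `p'
      }

      With the example data in #2, the only partner ID that never occurs as a pid is 3456, so only the 1991 and 1992 rows of pid 2459 would change.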

      • #4
        -rangestat- provides a convenient solution.
        Code:
        *ssc install rangestat
        
        mvencode pid, mv(-9999)
        rangestat (first) wanted=pid, interval(pid partid partid)
        mvdecode pid wanted, mv(-9999=.)
        Note that the mvencode and mvdecode lines could be omitted if you are sure that there are no missing pids.
