  • How to identify duplicates in the PSID sibling file

    Dear all,

    I work with the sibling data of the PSID (documentation: https://simba.isr.umich.edu/FIMS/FIMS_UG.pdf). I want to drop duplicate observations.

    The documentation says:

    "The resulting output file will be a customized data set fit to your specifications. The sibling pairs will be in duplicate form, where ‘Sibling A’ and ‘Sibling B’ will be listed as two observations, once as AB and again as B-A. This allows each researcher to make analytical decisions as to which individual is the focal individual, and which is the sibling of the focal individual."

    So, let's suppose my data are structured as follows:

    Code:
    clear
    
    input long pid long SIBNUM long pids long family
    1 1 2 1
    2 1 1 1
    33 1 34 44
    34 1 33 44
    33 2 35 44
    34 2 35 44
    35 1 33 44
    35 2 34 44
    end
    Here, pid is the personal identifier, pids is the personal identifier of the respective sibling, and SIBNUM is the number of the sibling. family is a family identifier, based on pid and pids.
    In the end, I want each sibling pair (e.g., 33-34) to appear only once, whereas it currently appears twice in the sibling data.

    Any suggestions on how to implement this are highly appreciated. Thank you very much.

    Best

    Daniel

  • #2
    Code:
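    * Order each pair so that A-B and B-A share the same key (long avoids float rounding of IDs)
    gen long sib1 = min(pid, pids)
    gen long sib2 = max(pid, pids)
    * Keep one record per unordered pair within each family
    duplicates drop family sib1 sib2, force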
    WARNING: You did not specify which of the duplicate records for a given sibling pair you want to retain. The above code makes an arbitrary selection, which is not reproducible from one run to the next. This is fine if the variables that differ within the original pair of records, such as SIBNUM, are no longer needed for your purposes. Otherwise, it poses a significant problem that requires you to identify a rule for selecting which record to retain, or to rethink the entire process.
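
    If a reproducible choice is needed, one possible rule (an assumption for illustration only, not something the question specifies) is to keep, within each pair, the record whose focal person has the smaller pid. The sketch below would be run instead of the duplicates drop above, starting from the original data. The names pair_lo and pair_hi are hypothetical, and the code assumes each pair appears as exactly two records (A-B and B-A).

    ```stata
    * Hypothetical helper variables: pair_lo and pair_hi are not in the original data
    gen long pair_lo = min(pid, pids)
    gen long pair_hi = max(pid, pids)

    * Assumed rule: within each pair, keep the record whose focal person (pid) has the smaller ID
    bysort family pair_lo pair_hi (pid): keep if _n == 1

    * Check that each pair now identifies exactly one record
    isid family pair_lo pair_hi
    ```

    With the example data from #1, this leaves four records, one for each of the pairs 1-2, 33-34, 33-35, and 34-35. Because the records are sorted by pid within each pair before one is kept, the result is the same on every run. Any other rule would work the same way, as long as the variable in parentheses orders the two records unambiguously.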
