  • Detect duplicate information by variable

    Dear All,

    I am using an administrative dataset in which persid is the individual id, B_sign1 is the officer id, and B_kontor1 is the office (branch) id. B_sign1 naturally has duplicate values within each office branch (B_kontor1), because many individuals can be assigned to the same officer. However, I want to detect whether B_sign1 has duplicates across values of B_kontor1; that is, I want to identify whether any officer worked at more than one branch of the office. Here is a data example.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str8 persid int(B_sign1 B_kontor1)
    "02217983" 22628  610
    "01484126" 19912  610
    "00363649"  5000  610
    "00381005"  6458  610
    "04688359"     1  610
    "04086732"  4170  610
    "02336145" 22628  610
    "04050467"  9843  610
    "00846512"     1  610
    "04555715"     1  610
    "02839099" 19659  610
    "02507193"  5000  610
    "03197469"     1  610
    "00720620" 17028  610
    "02601685" 21258  610
    "02267734" 20878  610
    "04214318"  5000  610
    "02016939" 15944  610
    "04961441" 20459  610
    "00874480"  4170  610
    "02438671" 20459  610
    "02436507"  6458  610
    "01892795"  6470  610
    "02777097" 20459  610
    "03952906" 20459  610
    "05016489" 20878  610
    "01680877" 20459  610
    "00098926" 22628  610
    "05017958"     1  610
    "03872658" 22628  610
    "03979683" 12913  610
    "00056433"  2138  610
    "00551636"  5000  610
    "04818213"  6458  610
    "01130114"  9188  610
    "01242009" 20459  610
    "00731697" 20878  610
    "00074030" 22628  610
    "00363360" 22628  610
    "01827538"     1  610
    "00308936"  6470  610
    "01663033"  5000  610
    "03789569"  6458  610
    "04070787"     1  610
    "00294486" 20878  610
    "00126843"  5000  610
    "03217922" 22628  610
    "03417322"  6470  610
    "00273920" 20459  610
    "00032504"  7072  610
    "04923474"  4170  610
    "04972122" 20459  610
    "03114186"  5000  610
    "03795695" 22628  610
    "01789538"  7072  610
    "03469842" 20459  610
    "02892250"  8628 1910
    "02344817"  9244 1910
    "02338538"  6384 1910
    "01655745" 11551 1910
    "00761203"  1166 1910
    "04401685"  2078 1910
    "04536305"  8628 1910
    "01121400"     6 1910
    "01843313"  8628 1910
    "04533696"     6 1910
    "01471623" 20668 1910
    "00838069"  6384 1910
    "03506156" 23902 1910
    "04571614"  2078 1910
    "03930319" 12735 1910
    "02002964"  8628 1910
    "04552109" 11551 1910
    "04147211"  2008 1910
    "01819275"  6384 1910
    "01659850"  2078 1910
    "00515319" 20668 1910
    "04718246"  2078 1910
    "02155997"     6 1910
    "04691050"     6 1910
    "01748206"  8628 1910
    "04335775" 11551 1910
    "01163778" 11551 1910
    "03376821"  5856 1910
    "03543581"     6 1910
    "00658928" 15289 1910
    "03554215"     6 1910
    "01213669"     6 1910
    "00665992"     6 1910
    "04783558"  4552 1910
    "01084246"  6384 1910
    "00813381"     6 1910
    "05078102"  5430 1910
    "00938946"  8628 1910
    "03045535"  8628 1910
    "02946079"     6 1910
    "03713820" 15289 1910
    "01545040"  5856 1910
    "02635304" 15289 1910
    "00831328"  6384 1910
    end

    Thanks,
    Mohammad Zariab Hossain

  • #2
    These may help:

    https://www.stata.com/support/faqs/d...ions-in-group/

    Code:
    help duplicates
    Code:
    search distinct, sj
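
    As a rough sketch of the distinct-values approach those references describe, the number of distinct offices per officer can be counted with official -egen- functions. The variable names pairtag and n_offices are made up for illustration; only B_sign1 and B_kontor1 come from the original post.

    Code:
    * tag exactly one observation per officer-office pair
    egen byte pairtag = tag(B_sign1 B_kontor1)
    * number of distinct offices in which each officer appears
    egen n_offices = total(pairtag), by(B_sign1)
    * show officers who appear in more than one office, one row per pair
    sort B_sign1 B_kontor1
    list B_sign1 B_kontor1 if n_offices > 1 & pairtag, sepby(B_sign1)

    If the -list- shows nothing, no officer id occurs under more than one office id in the data in memory.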


    • #3
      I read the question somewhat differently. If the number of offices and officers is small enough, then a cross-tab:
      Code:
      ta B_sign1 B_kontor1
      will give you the answer directly; if there are too many, then the community-contributed -groups- may do it for you; use -search- to find and download it.
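
      When the cross-tab is too large to read, one alternative that needs only official commands is to collapse the data to one row per officer-office pair and count the rows per officer. This is a sketch, not the method suggested above; n_offices is a made-up name, and _freq is the frequency variable that -contract- creates by default.

      Code:
      preserve
      * one observation per officer-office pair; _freq holds the case count
      contract B_sign1 B_kontor1
      * rows per officer = number of distinct offices for that officer
      bysort B_sign1: generate n_offices = _N
      list B_sign1 B_kontor1 _freq if n_offices > 1, sepby(B_sign1)
      restore

      The -preserve- and -restore- pair puts the original individual-level data back in memory afterwards.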
