Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • replace variable value with numbers to anonymise dataset

    The dataset i am using has a number of l variables that are locality identifiers or organisational identifiers. Some of the locality variables have 7000 or so codes within them and some of the organisational variables have around 5000 codes within them. I want replace these codes with numbers. This is because I need to anonymise the dataset as much as possible, rather than encode string variables. I can write the following.

    gen LSOA_n = ,
    replace LSOA_n = 1 if LSOA == 'OPX'
    replace LSOA_n = 2 if LSOA == 'OPZ'
    ETC...


    However, as LSOA may have over 7000 values this would be very time consuming, and LSOA is just one of the locality variables.

    Is there a way I can replace the code with a number (its doesn't matter which) for each of the locality and organisation variables, so the first instance of a code within a variable, e.g. OPX it gives the value of 1, and then all observations with OPX gets that value, then the next value within that variable e.g. OPZ gets 2, and for all subsequent observations with that value etc.

    I should note that my dataset is very large N >50,000,000

    Example data below:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str64 ID str6 orgcode str3(orgcode2 locality_code) str9(locality_code2 locality_code3 locality_code4) str3 org_code4
    "0303938D20EF9791B976C94CBA255803858DFF363BCB840F0EBEB267A296ABD7" "F84013" "07T" "704" "E01001822" "E02000349" "E02006921" "07T"
    "1E369A8EA8673BFEF2A391373678D2F58AE0B934879D070485B31BE4079AE730" "D81028" "06H" "623" "E01018234" "E02003776" "E02003776" "06H"
    "77E485CF66E93B9A615A74B85ED9E6943D7FA276410AD7E741C1B3DF9401EA62" "E83018" "07M" "717" "E01000127" "E02000047" "E02000047" "07M"
    "0CDEE9E8A5DE965DEB619B061C1F27511E050F08FF3A333ADD63E5EE76690B8F" "M81015" "06D" "416" "E01032439" "E02006767" "E02006767" "06D"
    "701148B3B123F4AADA76C55F620FFA5A6B5A1E5093277EAFD4736D07AE80DBD6" "D81020" "06H" "624" "E01015649" "E02003256" "E02006877" "06H"
    "25CAB910756FF891CA630967C4659C14CEE10C7369FF84DBA725EBE98DF42EF4" "Y02826" "09J" "820" "E01024279" "E02005055" "E02005055" "09J"
    "20036B14D48CB151A7C5674084B5CAB1A808B74C08F5D95D89EFB4CE4224EB19" "E82094" "06N" "606" "E01023372" "E02004867" "E02004867" "06N"
    "E6B43EE586D862EC1F561D2AC1443F987767B6634A140C2B514936077F78B83A" "M81044" "05F" "415" "E01014065" "E02002926" "E02002926" "05F"
    "C81941D405F821BD701612020B8BBA2424FA3CDF8AC01B54373F37E859D5A8B9" "P81152" "01E" "323" "E01025302" "E02005255" "E02005255" "01E"
    "97993C17CD10238B50A2CB019593F33A57F8C1C8E70BCC6E4DE74E7EFA182295" "K82022" "14Y" "612" "E01017852" "E02003705" "E02003705" "14Y"
    "BB7541EB8E70BC4980337D11F2E5F52ED758B4B1A8F1EC8D7D2CEE2B292CF240" "G81053" "99K" "815" "E01021060" "E02004388" "E02004388" "99K"
    "CF67F4F97D574BC6E3C9C3ABAA2DB303780A7701E56F5EC16C38B6B14568F472" "K81047" "15A" "619" "E01016650" "E02003446" "E02003446" "15A"
    "B73EB60B68976391823A188A46D3DF2AA8AD550FA8773391014C6AAAB6547706" "E81014" "06F" "626" "E01017593" "E02003648" "E02003648" "06F"
    "4569F058E6DA56CE1183696953049E42B9685E9AAE43EC3DA2184A6006754D21" "K84049" "10Q" "608" "E01028522" "E02005947" "E02005947" "10Q"
    "A9F2392C828FD120D041A7CBBDBBDC50848BC7A04B483AA27BF71D0C2FCAE1F0" "C82038" "03W" "508" "E01025907" "E02005395" "E02005395" "03W"
    "982EFB5DA10616E6F80C435B940082921F52FFEFF794CC11D796FE4B9BAE36A1" "L85008" "11X" "905" "E01029053" "E02006047" "E02006047" "11X"
    "E60A0AC1B6E7A57A7A23F650A5CF46BE02F4F4820C40A7DB9C673C617D0B7FF1" "C82071" "03W" "508" "E01026004" "E02005413" "E02006818" "03W"
    "F176919E0A9EFA6C17F6BBA51E5B2CB0E220F33D182651961CE70CAA067F7D6B" "M81038" "05T" "416" "E01032205" "E02006714" "E02006714" "05T"
    "FBF44F56935C015EB6876E8E3D4130002179AA734C5FEA5529E419AA9085D7F6" "F82018" "07L" "716" "E01000070" "E02000012" "E02000012" "07L"
    "E444089BCDE5F2DB31A56704E2E4A537ABD500637176A045EA7B2A252AC58B26" "B85042" "03A" "211" "E01011237" "E02002309" "E02002309" "03A"
    "4F54FBEB6326DB2418E5C33F36D904D055392B55C59F163567F271D42E482336" "C84142" "04E" "511" "E01027963" "E02005829" "E02005829" "04E"
    "7AFB98AFE821C5B2141E1911EEB566FBF2335162B0BA56233136F8933B5C0CF8" "J82207" "10X" "814" "E01017193" "E02003569" "E02003569" "10X"
    "42D2FC0DA37EAC0D55D1519630FFD455F79D02822F7D5CF501EB1911847BF5D9" "Y02794" "15E" "406" "E01009197" "E02001884" "E02001884" "15E"
    "5223630194E3D1CC3311F8F2E056F997CF7ED1428073782043CF4E99DBEA52C9" "H81641" "09Y" "805" "E01030984" "E02006459" "E02006459" "09Y"
    "0A03273F85699EF10E4A9519DB4595F70B4570AF9E95936B1C38D7B411250F02" "Y01066" "08H" "706" "E01002737" "E02000558" "E02000558" "08H"
    "2BCA5EB09C8B9F765ACBD3AAEC734ACB73A6C97A46455DA7CC83FF70ECB5A36F" "E83638" "07M" "717" "E01000296" "E02000046" "E02000046" "07M"
    "8E53DD47C8B710FA9984F7B923B9E97D42995BF372E24794AF2BD10354131F43" "F84670" "08M" "731" "E01003602" "E02000739" "E02000739" "08M"
    "BB84F7B623796842AC5EBE6F0723AA3847526FF8D5F98A50BA2F88DE8D9EBC54" "H81058" "09L" "805" "E01030613" "E02006385" "E02006385" "09L"
    "AE4CD552EAB96F36D39CCB7B83194C4ED6576A364F952A1E6ABA70D41D88F1DC" "F85027" "07X" "723" "E01001538" "E02000285" "E02000285" "07X"
    "D91350423CC4C8A8B25EC1284697C6FFCFC182BEA86D41EE3E1CAB5C8FEE5AB6" "E82083" "08G" "727" "E01002412" "E02000510" "E02000510" "06N"
    "D17A61404D3F074070C4FE1555E6975B08809F3D810EE8636355B04887CA387D" "P85020" "00Y" "307" "E01005388" "E02001115" "E02001115" "00Y"
    "ED46E5C923F5F010D60DCFEF5364818C857D31DD6C78A2A91B66D34F2A6F5F2F" "B83614" "02W" "209" "E01010823" "E02002221" "E02002221" "02W"
    "8BE9788CE35C01B3B6BA7B4BBF267E59564D4156472C920A25D5BC2A8D8D020E" "K82631" "04F" "613" "E01016785" "E02003475" "E02003475" "04F"
    "49B576E30FF36A0B232A464B3B8BAC8A0C9A9078EB1B07DBB1DC3F9EC536235C" "Y02893" "15E" "406" "E01009515" "E02001893" "E02001893" "15E"
    "E4786F7A54B76F7C86C7FD818E3105ACDF77980094C74419702194B77EE319C4" "B83614" "02W" "209" "E01010823" "E02002221" "E02002221" "02W"
    "C79D410D0DF5422402319480A0FBF38F48D1D73FA6814D10081CB297364C3A38" "L83066" "99P" "912" "E01020039" "E02004155" "E02004155" "99P"
    "6CF3E0E18EBD538EAB65F002210692A4484B856FE0226DD720D4C5185BD8ADA2" "G82087" "09C" "820" "E01023982" "E02004996" "E02004996" "09C"
    "13BE697BCE644BE7B78816BEC137E70C1B6319C33100360DF7121568567420B7" "D81630" "06H" "624" "E01015649" "E02003256" "E02006878" "06H"
    "B63251199F79538DC0DB8ED074986FBB3C345BC66D4F8184D74009AAE51DD0F2" "A82044" "01H" "102" "E01019286" "E02004002" "E02004002" "01H"
    "68FB07E7E803B6822990ADBE3283E702F694D227EAF7B1377E32A40121A1A2E0" "F83060" "08H" "706" "E01002786" "E02000563" "E02000563" "08H"
    "D2348DE173FBDAF9B779C192EFFC890E122E49371F6A1A03A41CB4E35DEBA943" "A89032" "00P" "110" "E01008770" "E02001821" "E02001821" "00P"
    "2718747115295A8D08911C5AD4D3653EF153E0FBAED91E91789427B243259AEA" "K82073" "14Y" "612" "E01017628" "E02003672" "E02003672" "14Y"
    "1478617FB4B0B8FFC996CFBCE3E794753B8EC65E7DA2729F973382016BECEF57" "L83065" "99P" "912" "E01020083" "E02004173" "E02004173" "99P"
    "626FAD73508B3498111E01B50D670B7D8401EFD037A3AFAD311A71B213E0767B" "P81017" "01A" "323" "E01025318" "E02005276" "E02005276" "01A"
    "243D50460CF3ED2C5989FF239D380AF211411167B3CD86C49919FFADCB0D5B44" "P88606" "01W" "310" "E01005831" "E02001213" "E02001213" "01W"
    "90763F90D850CADA8F4422F3867ACD3C3A5D6682EC4287EFC5987E490A445211" "P82002" "00T" "304" "E01004834" "E02001001" "E02001001" "00T"
    "6312907E570DD742B290B86327D8E17A87F98EF21A95BB1D73BC94F854253BC4" "L84617" "11M" "904" "E01022348" "E02004639" "E02004639" "11M"
    "CBD6A6CEC161E5CCB282A8C72DDFD0A9FDAA4CE03D061FBA78D1CA8E9E5FEA2D" "F82679" "07L" "716" "E01000015" "E02000010" "E02000010" "07L"
    "84D0458BF9E2FDD6772AD230809752A3B1A895FB4B0031B7ED740DA86394875A" "M85069" "05L" "406" "E01009052" "E02001861" "E02001861" "05L"
    "E2192BD3938BF1DD15DCBEF4887CDA96C16A89311E3FDBE25A420BBD37F23B46" "N81100" "02F" "327" "E01018375" "E02003797" "E02003797" "02F"
    "72A832A9EAE55FC38363A8AE90718DB0CB12A0BBBBED7BF3D2EB4118D862F63B" "Y02506" "10D" "820" "E01024617" "E02005118" "E02005118" "10D"
    "837DB592EE7E42902F2883253C9EDF467E0E2D05C35F914124F12D2B90F65445" "K84613" "10Q" "608" "E01028463" "E02005935" "E02005935" "10Q"
    "9F8F2777A729C4D9BB1A7CF5ADA9F87882AE87E19D3B1609E44FC0F48ED4FE3A" "B86081" "15F" "212" "E01011675" "E02002393" "E02002393" "15F"
    "1E5B7F781E572F20E76A2ABC6F372A52A0E986D5AEF9FC8B734D81E2DE5395B9" "L81124" "15C" "910" "E01014774" "E02003078" "E02003078" "15C"
    "8CD312C9FBAE6C528B8E0D13A719ED60214B08DEEE66EA95F0D42BEB665C545A" "M85176" "05L" "406" "E01009275" "E02001865" "E02001865" "05L"
    "D4DD7071794229FAD9A8B0866DF4B63B727D8FF5C6C89055673EEE1A872EBB93" "N81044" "01R" "326" "E01018463" "E02003826" "E02003826" "01R"
    "3E6F1A62553E888D164C5DE2145CD69C717174080672ED4A7A17E8A3C5E7B742" "M91613" "05Y" "411" "E01010270" "E02002135" "E02002135" "05Y"
    "59DC3F41060912B55F571CF3D24BBFB281202CE53544C0E1EE09C37568F915F4" "A83037" "00D" "116" "E01020846" "E02004344" "E02004344" "00D"
    "4584871BDA540CF1776D83A8616D7117775A992A790F21C9B93032CB44E0BB46" "A87015" "99C" "108" "E01008530" "E02001757" "E02001757" "99C"
    "C84E8839B286BE2FE2936FA24F6631BD40F8975C8C6CE690384B95797898BA9C" "M81091" "05T" "416" "E01032364" "E02006752" "E02006752" "05T"
    "3D1E046015CC790CF0E67A86A97F9BAFB054767FB01676200A936C3D8D99A804" "E81030" "06F" "625" "E01017522" "E02003629" "E02003629" "06F"
    "873DC597366C9E5044460FCE393EBED00A059391544747EFB19E843FE437112F" "B83638" "02W" "209" "E01010823" "E02002221" "E02002221" "02W"
    "41F79D2CB4FD818934617B90E2C280B6E127BB96B55846099B557317EC84809D" "A88003" "00N" "109" "E01008651" "E02001771" "E02001771" "00N"
    "968107D019EED2E22B6C10293A1FF35A41FB2576003F57298E08AE330846E810" "N83020" "01X" "318" "E01006840" "E02001410" "E02001410" "01X"
    "0EFC86C8AFAABD134F1B489927A44E39FD68FE2E152E4358BB743883C8529EF6" "G82105" "10E" "820" "E01024660" "E02005132" "E02005132" "10E"
    "3391DA0A6629D3A149FC5671AE97E9C6E18CAF267D091B32FECB9709190FD919" "E84009" "08E" "725" "E01002231" "E02000445" "E02000445" "08E"
    "7FC8CE2637EA230EDE5C7291E8B785BAE3AAE9078CF77DD464BC7350846596BA" "M82045" "05N" "417" "E01028896" "E02006015" "E02006015" "05N"
    "7142AA5756F988C98C6746FBBA3B6C9307DD07658AD74A9C6E503E3B49618D10" "Y00260" "06K" "606" "E01023636" "E02004911" "E02004911" "06F"
    "DF274582E0747A9A4D9F6A289E7B3A7846FB6CD115BE1DAEE77479F5A505CDFE" "A87013" "99C" "108" "E01008468" "E02001758" "E02001758" "99C"
    "6ACABEB4928E0F26F2A25C6852FA3058BB39AE66AC606E09A7656CE35FC50A97" "F84016" "08V" "711" "E01004266" "E02000870" "E02000870" "08V"
    "A5E8132FEDFD2665D0CB36753050C7CC0097031A5C6B676BC59F48C8F20D421C" "K82026" "04F" "613" "E01016833" "E02003487" "E02003487" "04F"
    "36834398AD2683078150939AA4176C45F4938B60A9205A6E9DA951F19A8349CA" "H81051" "99H" "805" "E01030384" "E02006339" "E02006339" "99H"
    "6CA0E7102C8A8C4D04323E43748CFFF04F3E5D9E6978C57A1CF50D79B95750FB" "B82104" "03D" "218" "E01027755" "E02005785" "E02005785" "03D"
    "987DDB209906B9DF219971ACB140AF16065737D43B6DF6817D48FD13552554D3" "E84015" "07P" "719" "E01000519" "E02000097" "E02000097" "07P"
    "505C53EAB7134CC3929B96BD2AFE533E6F7431C623D6EB641AF2AB45C3CBE6FF" "G84010" "07Q" "720" "E01000694" "E02000133" "E02000133" "07Q"
    "5592CBF0A3C13E48FEDBEF9AC08B0A41CFD2EA9C69D02085C40617935A3CF97B" "G85044" "08K" "708" "E01003149" "E02000647" "E02000647" "08K"
    "CB169CC8745DC31324C418734BE11ED8C0266E3F0CD74610C533DB286E652312" "L81013" "15C" "909" "E01014572" "E02003030" "E02003030" "15C"
    "7486E34EA8B5868085528346B64CCD54B1443999576A2411F06F5854D62694D9" "G82224" "99J" "820" "E01024806" "E02005172" "E02005172" "99J"
    "961B5B961BC854B2F5AF6C23C43AAB31CADD2A4A57F01FA7FF94D524E7C31767" "B86077" "15F" "212" "E01011632" "E02002427" "E02002427" "15F"
    "62DBE2FF0D0F04603DE33D2A6C3BFDC4BC1855730B768BC5958DD59500DBD575" "H83053" "07V" "721" "E01001150" "E02000199" "E02000199" "07V"
    "19C2C1D574B52D9ED886D374F6B86C2FD45CCCD7327037E67701DAFF0A3733CF" "Y02987" "08N" "732" "E01003735" "E02000783" "E02000783" "08N"
    "51A20FD110E25D53D32D9D66EB8B2B52D94118D2FA9252A724CAF7556C614130" "H81088" "99M" "805" "E01030896" "E02006444" "E02006444" "99M"
    "3D2F43F71713EE0D746449B21F92FEE0A645F2F6E09BDB35A8A9CC316DD055F0" "Y02572" "02T" "210" "E01010928" "E02002252" "E02002252" "02T"
    "60F3B6EABFBC4B69CA72340F8A0C9CF88A57B6BB22C928B744D1BE61A46884C5" "F83058" "07R" "702" "E01000916" "E02000192" "E02000192" "07R"
    "D65F5E9C02A1E9104E9412C2D72985E59D8173A47A2D56640837DA7572536C02" "M89003" "15E" "410" "E01010210" "E02002103" "E02002103" "15E"
    "6BABD040ED84A7BE11B6AD78B6C9FFCE702EC63609990AB6CDE3D3DC3BBB20DB" "P81643" "00Q" "324" "E01012577" "E02002617" "E02002617" "00Q"
    "BDAF4FF0B4894FE33A624BC149187F6BE83F53899125B8471051516CA80428E1" "D81630" "06H" "624" "E01015649" "E02003256" "E02006878" "06H"
    "CAAF5124EFEE83F5F9FA5DF53767970C9DB2003D5D14162EABF9983008E8BE04" "M81082" "05J" "416" "E01032159" "E02006705" "E02006705" "05J"
    "CADB90D5DE20AC16C22D3623BA082DAE0258B2C777A413E535A0A3FC9D8EC339" "F81030" "06Q" "620" "E01021421" "E02004462" "E02004462" "06Q"
    "343984C0A102BB7650E75E541E15D4ED32BA2AC47E6C597359A23930A16AA77C" "D81020" "06H" "624" "E01015649" "E02003256" "E02006878" "06H"
    "1E48AF89354ADA9464A0879C58D28E89D6EE8799FA7426355FC462F3C8EFDBC4" "D81633" "06H" "623" "E01018207" "E02003758" "E02003758" "06H"
    "C0796FC9BE8BFE87C70F81FBDBCDF9A72E1E5C552DFEE1D6387F1BB3A554D06A" "B83660" "02W" "209" "E01010844" "E02002221" "E02002221" "02W"
    "AD0D64532B64D8E65EA9649644AF14BA9EB2E3E9C7E775C77A9AE5056F14D526" "J82646" "10V" "812" "E01022915" "E02004770" "E02004770" "10V"
    "478AA7C0C858DB8E79D51A37FA43AED7D19F2BF4090A3E3055C2D4A25D904336" "L84072" "11M" "904" "E01022212" "E02004616" "E02004616" "11M"
    "EBCE73BB18B7100D9256D0CDDA347829B055CB0FCEE8DAE6D8355423188701EC" "F84716" "07T" "704" "E01001812" "E02000347" "E02000347" "07T"
    "EE4089A97C8BBD40CB3CBF59F200CEA859E9580A49F787FB5C28FA584F2836A8" "P91006" "02A" "312" "E01006201" "E02001267" "E02001267" "02A"
    "B9E66A2BA8743A4A32E2891F433037692A02C81B4A991D63C6C88C21CBC98945" "G82087" "09C" "820" "E01024000" "E02004998" "E02004998" "09C"
    "215FC39B3E27EB824F6A8D80F6FC243EBA0216E9DBD7C6501D12E4312AFEB759" "Y04949" "05A" "404" "E01031142" "E02006493" "E02006493" "05A"
    "937E39F5B97018A9734082442C3AF0E9AAF47221C1FB042EFD88A49547630192" "F81097" "99G" "621" "E01015840" "E02003292" "E02003292" "99G"
    "CA7CD7E78971C5CFC69198CFCB05F6C389E7EB470921F66B347E87BCC179A2B5" "N81022" "01C" "326" "E01018577" "E02003865" "E02003865" "01C"
    end

  • #2
    You can do something like:
    Code:
    egen int LSOA_code = group(LSOA)
    to create the codes. But fair warning: the conversion from one to the other is quite deterministic (the order of the groups will be the sort order of LSOA), which may somewhat compromise the anonymisation.

    Comment


    • #3
      If the number really does not matter, use encode and throw away the value label.

      Comment

      Working...
      X