Hello Statalist,
Using Stata 17, I need to take a string variable (chid) and clean it up so I can merge it with another dataset. I see two problems in the data below.
1. chid is sometimes 7 characters and sometimes 8, but in the other dataset the equivalent variable pads values that are only 7 characters long with a leading 0. In other words, row 1 (William Juckes) should become 09216678, but row 4 (Oliver Adkins) should not be altered.
2. chid sometimes lists two codes, e.g. in the last row of the toy data: 11296834, 09235524. In this case I want to duplicate the row so that each copy contains only a single code in chid. The maximum number of codes in chid is 2.
I'm not sure how to accomplish these tasks. Any advice very much welcome!
Thanks.
Code:
input str52 name str128 chid
"William Juckes" "9216678"
"Nicholas Charalambous" "8490515"
"Kirsty Senior" "8452449"
"Oliver Adkins" "10966959"
"Alec Ramsey" "10966959"
"Ryan Burton" "SC565616"
"Flemming Andersen" "10670173"
"Tim Barclay" "9417237"
"John Matthew Martin" "11220400"
"Fotios Talantzis" "10313940"
"Iraklis Bourantas" "10313940"
"Amanda Harrington" "11249637"
"Amaury De Closset" "10653083"
"Daniel Morton" "10653083"
"Joy Foster" "10418960"
"Natasha Elizabeth Helena Thomas" "5729837"
"Steven Tyson" "9241863"
"Chao Liu" "9250295"
"Alexey Chudnovsky" "5269210"
"Andrew Orrock" "11296834, 09235524"
end