Good morning!
I have a dataset of cancer patients with 100,000 observations. The Master dataset has an ICD-O3 code for the type of cancer. The Using dataset (from SEER) has the Site codes and the Site Description. I want to merge the datasets so that the Site Description appears in the Master dataset. The challenge is that the Site Codes in the SEER data take the following form for several observations: “C000-C006,C008-C009”. I did split and reshape to end up with “C000-C006” and “C008-C009”, but can't figure out the next step.
I have looked at using joinby, cross, inrange but can’t find a solution yet. I'm using Stata/SE16.1 Examples of Master, Using and Results are below. Any suggestions will be appreciated.
MASTER dataset
USING dataset
RESULT
I have a dataset of cancer patients with 100,000 observations. The Master dataset has an ICD-O3 code for the type of cancer. The Using dataset (from SEER) has the Site codes and the Site Description. I want to merge the datasets so that the Site Description appears in the Master dataset. The challenge is that the Site Codes in the SEER data take the following form for several observations: “C000-C006,C008-C009”. I did split and reshape to end up with “C000-C006” and “C008-C009”, but can't figure out the next step.
I have looked at using joinby, cross, inrange but can’t find a solution yet. I'm using Stata/SE16.1 Examples of Master, Using and Results are below. Any suggestions will be appreciated.
MASTER dataset
Code:
* Example generated by -dataex-. For more info, type help dataex clear input byte PatientID str4 Cancer 1 "C001" 2 "C099" 3 "C001" 4 "C154" 5 "C220" 6 "C171" 7 "C059" 8 "C081" 9 "C130" 10 "C001" end
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str49 Siterecode str34 SiteDescription "C000-C006,C008-C009" "LIP" "C019" "BASE OF TONGUE" "C020-C024,C028-C029" "TONGUE (EXCL BASE OF TONGUE)" "C030-C031,C039-C041,C048-C052,C058-C062,C068-C069" "GUM, FLOOR OF MOUTH, & OTHER MOUTH" "C079-C081,C088-C089" "SALIVARY GLAND" "C090-C091,C098-C104,C108-C109" "OROPHARNYX" "C110,C112-C113,C118-C119" "NASOPHARYNX (EXCL POSTERIOR WALL)" "C111" "POSTERIOR WALL OF NASOPHARYNX" "C129-C132,C138-C139" "HYPOPHARYNX" "C140,C142,C148" "PHARYNX" "C150-C155,C158-C159" "ESOPHAGUS" "C160-C166,C168-C169" "STOMACH" "C170-C173,C178-C179" "SMALL INTESTINE" "C180, C182-C189, C199" "LARGE INTESTINE, (EXCL. APPENDIX)" "C181" "APPENDIX" "C209" "RECTUM" "C210-C212,C218" "ANAL CANAL & ANUS" "C220" "LIVER" "C221" "INTRAHEPATIC BILE DUCTS" end
RESULT
Code:
* Example generated by -dataex-. For more info, type help dataex clear input byte PatientID str4 Site str34 SiteDescription 1 "C001" "LIP" 2 "C099" "OROPHARNYX" 3 "C001" "LIP" 4 "C154" "ESOPHAGUS" 5 "C220" "LIVER" 6 "C171" "SMALL INTESTINE" 7 "C059" "GUM, FLOOR OF MOUTH, & OTHER MOUTH" 8 "C081" "SALIVARY GLAND" 9 "C130" "HYPOPHARYNX" 10 "C001" "LIP" end
Comment