Dear Statalisters All,
Thanks for looking into this post. I am surprised that I have not found a solution to this problem, given how common data collection systems report variables in this way.
I hope my explanation is as concise as possible while giving all relevant information.
The data is from a TB surveillance system in an African Country. Blood tests are done at multiple time points but blood tests are at the discretion of the clinician and therefore are not standard. Furthermore, they are all grouped into one cell called bloodresults1, bloodresults2. Although they show some sort of order, the Sodium (Na) blood results may appear 1st, 2nd or 3rd in the cell ceparated by a ";". This is repeated for as many blood tests each patient has ie. bloodresult1 bloodresult2 bloodresult3 and so on.
The idea is to first separate all the blood results within the bloodresult1 variables so they are separate variables, ie. bloodresult1pars1 bloodresult1pars2 bloodresult1pars3. Now, Sodium may appear in the 1st, second or 3rd variable (bloodresult1pars1-3) depending on what other blood results came before it. But essentially, with the code listed below, I can get stata to comb through these bloodresult1pars* variables and extract the numeric variable of Na if Na appears in the cell (or other bllod resutls such as wbc or viralLoad etc. this means that for bloodresults1pars*
I can get any Na blood results, but that is already many lines of code that need to be individually recoded. It needs to then be repeated for any blood result you want to pull form the visit, ie Soduim, Pottassium, creatinine, Heamaglobin, Viralload etc etc. And that then may been repeated for however many vists each patient has.
The variable from dataex is as follows, please not that this just shows for one visits and would be repeated for however many monthly visits a patient has over the course of a 6-24month treatment.
The manual code I have made works as I want it to, but I am graciously asking if any statalisters can look over it and suggest "shortcuts" or more elegant coding or loops. I have tried a loop but the var* does not seem to be working.
The list then gives
Which corresponds to the na and wbc values well from the initial bloodresults1 variables already . As stated that large and very repetitive coding needs to be done for each blood results variable you might want and again for each blood result (ie. visit) you might want.
If statalisters can find a way to shorten this code or loop it that would be amazing and I am sure many would find it useful.
Kind regards
Brian Brummer
Thanks for looking into this post. I am surprised that I have not found a solution to this problem, given how common data collection systems report variables in this way.
I hope my explanation is as concise as possible while giving all relevant information.
The data is from a TB surveillance system in an African Country. Blood tests are done at multiple time points but blood tests are at the discretion of the clinician and therefore are not standard. Furthermore, they are all grouped into one cell called bloodresults1, bloodresults2. Although they show some sort of order, the Sodium (Na) blood results may appear 1st, 2nd or 3rd in the cell ceparated by a ";". This is repeated for as many blood tests each patient has ie. bloodresult1 bloodresult2 bloodresult3 and so on.
The idea is to first separate all the blood results within the bloodresult1 variables so they are separate variables, ie. bloodresult1pars1 bloodresult1pars2 bloodresult1pars3. Now, Sodium may appear in the 1st, second or 3rd variable (bloodresult1pars1-3) depending on what other blood results came before it. But essentially, with the code listed below, I can get stata to comb through these bloodresult1pars* variables and extract the numeric variable of Na if Na appears in the cell (or other bllod resutls such as wbc or viralLoad etc. this means that for bloodresults1pars*
I can get any Na blood results, but that is already many lines of code that need to be individually recoded. It needs to then be repeated for any blood result you want to pull form the visit, ie Soduim, Pottassium, creatinine, Heamaglobin, Viralload etc etc. And that then may been repeated for however many vists each patient has.
The variable from dataex is as follows, please not that this just shows for one visits and would be repeated for however many monthly visits a patient has over the course of a 6-24month treatment.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str245 bloodresults1 "Na=137;K=5.8;Urea=6.5;Creat=77;Egfr=>60;Wbc=4.54;Hgb=18.8;Plt=284;Tbili=21;Alt=15;Alp=83;Ggt=49;Ca=2.48;Mg=0.78;Tsh=1.13;Cd4=83;CragReflex=NEG" "K=5.9;Creat=88;Egfr=>60;Alt=15" "Na=123;Urea=17.6;Creat=91;Egfr=59;Wbc=1.37;Hgb=10.6;Plt=10.6;Tbili=92;Alt=37;Alp=167;Ggt=32;Ca=1.98;Mg=0.93;Po4=1.34;Tsh=3.20;Cd4=29;ViralLoad=58" "Na=138;K=5.0;Urea=4.2;Creat=69;Egfr=>60;Wbc=3.53;Hgb=12;Plt=274;Tbili=5;Alt=12;Ca=2.25;Mg=0.93;Po4=1.03;Tsh=1.72;HEPATITIS B =NEGATIVE" "Creat=119;Egfr=52;Wbc=10.03;Hgb=11.2;Plt=471" end
Code:
/////////BloodResults1/////// capture drop bloodresult1spars* split bloodresults1, parse(;) generate(bloodresult1spars) /// Na, K, Creat, ViralLoad, Wbc, Hgb, Plt, ///NA capture drop na1 generate na1 = ustrregexs(0) if ustrregexm(bloodresult1spars1,"[0-9,]+") & strpos(bloodresult1spars1, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars2,"[0-9,]+") & strpos(bloodresult1spars2, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars3,"[0-9,]+") & strpos(bloodresult1spars3, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars4,"[0-9,]+") & strpos(bloodresult1spars4, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars5,"[0-9,]+") & strpos(bloodresult1spars5, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars6,"[0-9,]+") & strpos(bloodresult1spars6, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars7,"[0-9,]+") & strpos(bloodresult1spars7, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars8,"[0-9,]+") & strpos(bloodresult1spars8, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars9,"[0-9,]+") & strpos(bloodresult1spars9, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars10,"[0-9,]+") & strpos(bloodresult1spars10, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars11,"[0-9,]+") & strpos(bloodresult1spars11, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars12,"[0-9,]+") & strpos(bloodresult1spars12, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars13,"[0-9,]+") & strpos(bloodresult1spars13, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars14,"[0-9,]+") & strpos(bloodresult1spars14, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars15,"[0-9,]+") & strpos(bloodresult1spars15, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars16,"[0-9,]+") & strpos(bloodresult1spars16, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars17,"[0-9,]+") & strpos(bloodresult1spars17, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars18,"[0-9,]+") & strpos(bloodresult1spars18, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars19,"[0-9,]+") & strpos(bloodresult1spars19, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars20,"[0-9,]+") & strpos(bloodresult1spars20, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars21,"[0-9,]+") & strpos(bloodresult1spars21, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars22,"[0-9,]+") & strpos(bloodresult1spars22, "Na") >0 replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars23,"[0-9,]+") & strpos(bloodresult1spars23, "Na") >0 //// Wbc capture drop wbc1 generate wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars1,"[0-9,]+") & strpos(bloodresult1spars1, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars2,"[0-9,]+") & strpos(bloodresult1spars2, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars3,"[0-9,]+") & strpos(bloodresult1spars3, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars4,"[0-9,]+") & strpos(bloodresult1spars4, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars5,"[0-9,]+") & strpos(bloodresult1spars5, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars6,"[0-9,]+") & strpos(bloodresult1spars6, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars7,"[0-9,]+") & strpos(bloodresult1spars7, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars8,"[0-9,]+") & strpos(bloodresult1spars8, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars9,"[0-9,]+") & strpos(bloodresult1spars9, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars10,"[0-9,]+") & strpos(bloodresult1spars10, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars11,"[0-9,]+") & strpos(bloodresult1spars11, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars12,"[0-9,]+") & strpos(bloodresult1spars12, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars13,"[0-9,]+") & strpos(bloodresult1spars13, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars14,"[0-9,]+") & strpos(bloodresult1spars14, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars15,"[0-9,]+") & strpos(bloodresult1spars15, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars16,"[0-9,]+") & strpos(bloodresult1spars16, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars17,"[0-9,]+") & strpos(bloodresult1spars17, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars18,"[0-9,]+") & strpos(bloodresult1spars18, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars19,"[0-9,]+") & strpos(bloodresult1spars19, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars20,"[0-9,]+") & strpos(bloodresult1spars20, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars21,"[0-9,]+") & strpos(bloodresult1spars21, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars22,"[0-9,]+") & strpos(bloodresult1spars22, "Wbc") >0 replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars23,"[0-9,]+") & strpos(bloodresult1spars23, "Wbc") >0 destring wbc1, replace
Code:
list na1 in 1/5
list na1 wbc1 in 1/5
+------------+
| na1 wbc1 |
|------------|
1. | 137 4 |
2. | . |
3. | 123 1 |
4. | 138 3 |
5. | 10 |
+------------+
If statalisters can find a way to shorten this code or loop it that would be amazing and I am sure many would find it useful.
Kind regards
Brian Brummer

Comment