Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to create a loop to parse blood results from multiple time points, manual code works but long and laborious.

    Dear Statalisters All,

    Thanks for looking into this post. I am surprised that I have not found a solution to this problem, given how common data collection systems report variables in this way.

    I hope my explanation is as concise as possible while giving all relevant information.

    The data is from a TB surveillance system in an African Country. Blood tests are done at multiple time points but blood tests are at the discretion of the clinician and therefore are not standard. Furthermore, they are all grouped into one cell called bloodresults1, bloodresults2. Although they show some sort of order, the Sodium (Na) blood results may appear 1st, 2nd or 3rd in the cell ceparated by a ";". This is repeated for as many blood tests each patient has ie. bloodresult1 bloodresult2 bloodresult3 and so on.

    The idea is to first separate all the blood results within the bloodresult1 variables so they are separate variables, ie. bloodresult1pars1 bloodresult1pars2 bloodresult1pars3. Now, Sodium may appear in the 1st, second or 3rd variable (bloodresult1pars1-3) depending on what other blood results came before it. But essentially, with the code listed below, I can get stata to comb through these bloodresult1pars* variables and extract the numeric variable of Na if Na appears in the cell (or other bllod resutls such as wbc or viralLoad etc. this means that for bloodresults1pars*

    I can get any Na blood results, but that is already many lines of code that need to be individually recoded. It needs to then be repeated for any blood result you want to pull form the visit, ie Soduim, Pottassium, creatinine, Heamaglobin, Viralload etc etc. And that then may been repeated for however many vists each patient has.

    The variable from dataex is as follows, please not that this just shows for one visits and would be repeated for however many monthly visits a patient has over the course of a 6-24month treatment.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str245 bloodresults1
    "Na=137;K=5.8;Urea=6.5;Creat=77;Egfr=>60;Wbc=4.54;Hgb=18.8;Plt=284;Tbili=21;Alt=15;Alp=83;Ggt=49;Ca=2.48;Mg=0.78;Tsh=1.13;Cd4=83;CragReflex=NEG"   
    "K=5.9;Creat=88;Egfr=>60;Alt=15"                                                                                                                   
    "Na=123;Urea=17.6;Creat=91;Egfr=59;Wbc=1.37;Hgb=10.6;Plt=10.6;Tbili=92;Alt=37;Alp=167;Ggt=32;Ca=1.98;Mg=0.93;Po4=1.34;Tsh=3.20;Cd4=29;ViralLoad=58"
    "Na=138;K=5.0;Urea=4.2;Creat=69;Egfr=>60;Wbc=3.53;Hgb=12;Plt=274;Tbili=5;Alt=12;Ca=2.25;Mg=0.93;Po4=1.03;Tsh=1.72;HEPATITIS B =NEGATIVE"           
    "Creat=119;Egfr=52;Wbc=10.03;Hgb=11.2;Plt=471"                                                                                                     
    end
    The manual code I have made works as I want it to, but I am graciously asking if any statalisters can look over it and suggest "shortcuts" or more elegant coding or loops. I have tried a loop but the var* does not seem to be working.

    Code:
    /////////BloodResults1///////
    capture drop bloodresult1spars*
    split bloodresults1, parse(;) generate(bloodresult1spars)
    
    
    /// Na, K, Creat, ViralLoad, Wbc, Hgb, Plt, 
    
    ///NA
    capture drop na1
    generate  na1 = ustrregexs(0) if ustrregexm(bloodresult1spars1,"[0-9,]+") & strpos(bloodresult1spars1, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars2,"[0-9,]+") & strpos(bloodresult1spars2, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars3,"[0-9,]+") & strpos(bloodresult1spars3, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars4,"[0-9,]+") & strpos(bloodresult1spars4, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars5,"[0-9,]+") & strpos(bloodresult1spars5, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars6,"[0-9,]+") & strpos(bloodresult1spars6, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars7,"[0-9,]+") & strpos(bloodresult1spars7, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars8,"[0-9,]+") & strpos(bloodresult1spars8, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars9,"[0-9,]+") & strpos(bloodresult1spars9, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars10,"[0-9,]+") & strpos(bloodresult1spars10, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars11,"[0-9,]+") & strpos(bloodresult1spars11, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars12,"[0-9,]+") & strpos(bloodresult1spars12, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars13,"[0-9,]+") & strpos(bloodresult1spars13, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars14,"[0-9,]+") & strpos(bloodresult1spars14, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars15,"[0-9,]+") & strpos(bloodresult1spars15, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars16,"[0-9,]+") & strpos(bloodresult1spars16, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars17,"[0-9,]+") & strpos(bloodresult1spars17, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars18,"[0-9,]+") & strpos(bloodresult1spars18, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars19,"[0-9,]+") & strpos(bloodresult1spars19, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars20,"[0-9,]+") & strpos(bloodresult1spars20, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars21,"[0-9,]+") & strpos(bloodresult1spars21, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars22,"[0-9,]+") & strpos(bloodresult1spars22, "Na") >0
    
    replace na1 = ustrregexs(0) if ustrregexm(bloodresult1spars23,"[0-9,]+") & strpos(bloodresult1spars23, "Na") >0
    
    //// Wbc
    capture drop wbc1
    generate  wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars1,"[0-9,]+") & strpos(bloodresult1spars1, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars2,"[0-9,]+") & strpos(bloodresult1spars2, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars3,"[0-9,]+") & strpos(bloodresult1spars3, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars4,"[0-9,]+") & strpos(bloodresult1spars4, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars5,"[0-9,]+") & strpos(bloodresult1spars5, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars6,"[0-9,]+") & strpos(bloodresult1spars6, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars7,"[0-9,]+") & strpos(bloodresult1spars7, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars8,"[0-9,]+") & strpos(bloodresult1spars8, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars9,"[0-9,]+") & strpos(bloodresult1spars9, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars10,"[0-9,]+") & strpos(bloodresult1spars10, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars11,"[0-9,]+") & strpos(bloodresult1spars11, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars12,"[0-9,]+") & strpos(bloodresult1spars12, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars13,"[0-9,]+") & strpos(bloodresult1spars13, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars14,"[0-9,]+") & strpos(bloodresult1spars14, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars15,"[0-9,]+") & strpos(bloodresult1spars15, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars16,"[0-9,]+") & strpos(bloodresult1spars16, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars17,"[0-9,]+") & strpos(bloodresult1spars17, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars18,"[0-9,]+") & strpos(bloodresult1spars18, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars19,"[0-9,]+") & strpos(bloodresult1spars19, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars20,"[0-9,]+") & strpos(bloodresult1spars20, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars21,"[0-9,]+") & strpos(bloodresult1spars21, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars22,"[0-9,]+") & strpos(bloodresult1spars22, "Wbc") >0
    
    replace wbc1 = ustrregexs(0) if ustrregexm(bloodresult1spars23,"[0-9,]+") & strpos(bloodresult1spars23, "Wbc") >0
    
    destring wbc1, replace
    The list then gives
    Code:
     list na1 in 1/5
     list na1 wbc1 in 1/5
    
         +------------+
         | na1   wbc1 |
         |------------|
      1. | 137      4 |
      2. |          . |
      3. | 123      1 |
      4. | 138      3 |
      5. |         10 |
         +------------+
    Which corresponds to the na and wbc values well from the initial bloodresults1 variables already . As stated that large and very repetitive coding needs to be done for each blood results variable you might want and again for each blood result (ie. visit) you might want.

    If statalisters can find a way to shorten this code or loop it that would be amazing and I am sure many would find it useful.

    Kind regards
    Brian Brummer


  • #2
    Would this do?

    (I created a simple fake idvar; presumably you have something else to identify people in your data.)

    Code:
    split bloodresults1, parse(;) generate(bloodresult1spars)
    
    gen idvar = _n
    reshape long bloodresult1spars, i(idvar) j(num)
    drop num bloodresults1
    drop if missing(bloodresult1spars)
    
    split bloodresult1spars , parse("=" "=>")
    drop bloodresult1spars
    rename bloodresult1spars1 test
    rename bloodresult1spars2 val_
    replace test = trim(test)
    replace test = "HepB" if test == "HEPATITIS B"
    replace test = lower(test)
    
    reshape wide val_, i(idvar) j(test) string
    destring val_*, replace
    rename val_* *
    This produces the following for sodium and WBCs:
    Code:
    . li na wbc
    
         +-------------+
         |  na     wbc |
         |-------------|
      1. | 137    4.54 |
      2. |   .       . |
      3. | 123    1.37 |
      4. | 138    3.53 |
      5. |   .   10.03 |
         +-------------+
    Last edited by Hemanshu Kumar; 30 Aug 2022, 05:18.

    Comment


    • #3

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str245 bloodresults1
      "Na=137;K=5.8;Urea=6.5;Creat=77;Egfr=>60;Wbc=4.54;Hgb=18.8;Plt=284;Tbili=21;Alt=15;Alp=83;Ggt=49;Ca=2.48;Mg=0.78;Tsh=1.13;Cd4=83;CragReflex=NEG"   
      "K=5.9;Creat=88;Egfr=>60;Alt=15"                                                                                                                   
      "Na=123;Urea=17.6;Creat=91;Egfr=59;Wbc=1.37;Hgb=10.6;Plt=10.6;Tbili=92;Alt=37;Alp=167;Ggt=32;Ca=1.98;Mg=0.93;Po4=1.34;Tsh=3.20;Cd4=29;ViralLoad=58"
      "Na=138;K=5.0;Urea=4.2;Creat=69;Egfr=>60;Wbc=3.53;Hgb=12;Plt=274;Tbili=5;Alt=12;Ca=2.25;Mg=0.93;Po4=1.03;Tsh=1.72;HEPATITIS B =NEGATIVE"           
      "Creat=119;Egfr=52;Wbc=10.03;Hgb=11.2;Plt=471"                                                                                                     
      end
      
      gen wanted1= real(ustrregexra(" " +lower(bloodresults1)+ " ", "[; ].*na=(\d+|\d+\.\d+)[; ].*", "$1"))
      gen wanted2= real(ustrregexra(" " +lower(bloodresults1)+ " ", "[; ].*wbc=(\d+|\d+\.\d+)[; ].*", "$1"))
      Res.:

      Code:
      . l wanted*
      
           +-------------------+
           | wanted1   wanted2 |
           |-------------------|
        1. |     137      4.54 |
        2. |       .         . |
        3. |     123      1.37 |
        4. |     138      3.53 |
        5. |       .     10.03 |
           +-------------------+

      Comment


      • #4
        Hi Andrew,

        That is beautiful, thank you.

        So for each Na value from each bloodresult I would rewrite:
        Code:
         
         gen na1= real(ustrregexra(" " +lower(bloodresults1)+ " ", "[; ].*na=(\d+|\d+\.\d+)[; ].*", "$1"))  gen na2= real(ustrregexra(" " +lower(bloodresults2)+ " ", "[; ].*na=(\d+|\d+\.\d+)[; ].*", "$1"))  
         gen na3= real(ustrregexra(" " +lower(bloodresults3)+ " ", "[; ].*na=(\d+|\d+\.\d+)[; ].*", "$1"))
        I don't mind rewriting that for each, it is alot less than what I was going to do

        However, for the completeness of this thread....What would you suggest to get na1 - na20 for bloodresults1- bloodresult20? Would you loop or is rewriting, as above the best?

        Comment


        • #5
          Code:
          forval i=1/20{
              gen na`i'= real(ustrregexra(" " +lower(bloodresults`i')+ " ", "[; ].*na=(\d+|\d+\.\d+)[; ].*", "$1"))
          }
          should do it. See

          Code:
          help forval

          Comment


          • #6
            Thank you!

            I think that concludes the matter.

            Comment

            Working...
            X