Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Flagging IDs with specific Criteria and then using them for a merge

    Hello, Stata members again,

    Kindly I used the below code for lab data for two scenarios as you can see (step1 & step2), the problem is in the last line when I merged them there was no matching even though I used the same codes except for reoccurence coding.

    Code:
    // Step 1: Clean Data
    clear
    cd "/Users/meshalalqhtani/Dropbox/Annual Epi report_Updated/Dengue/new data extraction_2023"
    import excel "lab_2023_DHF.xlsx", sheet("Sheet1") firstrow case(lower)
    
    // Manage duplicates and text
    duplicates report candidatenid
    rename candidatenid id
    duplicates tag id, gen(dup)
    tab dup
    replace testresult = lower(testresult)
    replace teststatus = lower(teststatus)
    replace testname = lower(testname)
    
    // Parse and filter dates
    split resultdate, parse(,) limit(3)
    gen r_date=date(resultdate1,"YMD")
    format %td r_date
    keep if r_date > dmy(01,01,2023) & r_date < dmy(31,12,2023)
    keep if dup == 0
    
    // Identify "dengue"
    split testname, p(,)
    split testresult, p(,)
    rename (testname testresult) orig_=
    reshape long testname testresult, i(id) j(testnum)
    drop if testname == ""
    replace testname = lower(trim(testname))
    replace testresult = lower(trim(testresult))
    gen byte is_dengue = inlist(testname, "dengue igm", "dengue ns1", "dengue pcr")
    gen byte is_positive = inlist(testresult, "positive", "detected")
    gen byte is_dengue_positive = is_dengue * is_positive
    egen wanted2 = max(is_dengue_positive), by(id)
    drop testname testresult is_* testnum
    duplicates drop
    rename orig_* *
    
    // Keep dengue records and save
    keep if wanted2 == 1
    save "dengue_test.dta", replace
    
    // Step 2: Flagging Reoccurring Dengue Cases
    clear
    import excel "lab_2023_DHF.xlsx", sheet("Sheet1") firstrow case(lower)
    
    // Manage text and dates
    replace testresult = lower(testresult)
    replace teststatus = lower(teststatus)
    replace testname = lower(testname)
    split resultdate, parse(,) limit(3)
    gen r_date=date(resultdate1,"YMD")
    format %td r_date
    keep if r_date > dmy(01,01,2023) & r_date < dmy(31,12,2023)
    rename r_date date_stata
    ren candidatenid id
    sort id date_stata
    
    // Calculate date differences
    by id: gen lag_date = date_stata[_n-1]
    gen date_diff = date_stata - lag_date
    gen flag_reoccurrence = (date_diff >= 14 & !missing(date_diff))
    by id: egen id_flag_reoccurrence = max(flag_reoccurrence)
    
    // Keep flagged reoccurrences
    keep if flag_reoccurrence == 1
    bys id: gen int seq = _n
    tab seq
    keep if seq == 1
    
    // Identify "dengue" again
    split testname, p(,)
    split testresult, p(,)
    rename (testname testresult) orig_=
    reshape long testname testresult, i(id) j(testnum)
    drop if testname == ""
    replace testname = lower(trim(testname))
    replace testresult = lower(trim(testresult))
    gen byte is_dengue = inlist(testname, "dengue igm", "dengue ns1", "dengue pcr")
    gen byte is_positive = inlist(testresult, "positive", "detected")
    gen byte is_dengue_positive = is_dengue * is_positive
    egen wanted2 = max(is_dengue_positive), by(id)
    drop testname testresult is_* testnum
    duplicates drop
    rename orig_* *
    
    // Keep dengue records and merge with cleaned data
    keep if wanted2 == 1
    merge m:1 id using "dengue_test.dta"
    
    // Check merge results
    tab _merge
    For your assistance please

  • #2
    Dear Meshal,
    you could make it a lot easier for us to help you if you always post sample data using the dataex command whenever possible. Please take a look at the FAQs, especially point 12.2 https://www.statalist.org/forums/help#stata

    Comment


    • #3
      here is an example
      Code:
      [/input str15 id str198 testname int seq
      "       3144555996"    "al khurma pcr,chikungunya pcr,dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype,rift valley fever pcr"                                                                                        1
      "  288052601348" "chikungunya pcr,rapid diagnostic test rdt,dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype,al khurma pcr,lassa pcr,rift valley fever pcr,rift valley igg,rift valley igm,zika pcr,ebola pcr" 1
      " HANBEIQ"       "rapid diagnostic test rdt,al khurma pcr,chikungunya pcr,dengue igg,dengue igm,dengue ns1,dengue serotype,ebola pcr,lassa pcr,rift valley fever pcr,rift valley igg,rift valley igm,zika pcr,dengue pcr" 1
      "'-7458"         "dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype"                                                                                                                                            1
      "00032778"       "al khurma pcr,dengue igg,dengue ns1,dengue igm,dengue pcr,rift valley fever pcr,chikungunya pcr"                                                                                                        1
      "000893154 "     "dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype"                                                                                                                                            1
      "000893154 "     "dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype"                                                                                                                                            2
      "001315217"      "dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype"                                                                                                                                            1
      "002003"         "al khurma pcr,chikungunya pcr,dengue igg,dengue igm,dengue ns1,dengue pcr,rift valley fever pcr"                                                                                                        1
      "00804467"       "dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype"                                                                                                                                            1
      "009954216"      "dengue igm,dengue igg,dengue ns1,dengue pcr,dengue serotype"                                                                                                                                            1
      "009954216"      "dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype"                                                                                                                                            2
      "0100610513"     "al khurma pcr,chikungunya pcr,dengue igg,dengue igm,dengue ns1,dengue pcr,rift valley fever pcr"                                                                                                        1
      "011811259"      "dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype"                                                                                                                                            1
      "015225105"      "dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype"                                                                                                                                            1
      "02424229"       "dengue pcr,dengue igg,dengue igm,dengue ns1"                                                                                                                                                            1
      "05295530"       "dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype"                                                                                                                                            1
      "05360519"       "dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype"                                                                                                                                            1
      "05385564"       "al khurma pcr,chikungunya pcr,dengue igg,dengue igm,dengue ns1,dengue pcr,rift valley fever pcr"                                                                                                        1
      "054777"         "dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype"                                                                                                                                            1
      "05513981"       "rapid diagnostic test rdt,al khurma pcr,chikungunya pcr,dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype,ebola pcr,lassa pcr,rift valley fever pcr,rift valley igg,rift valley igm,zika pcr" 1
      "0598674803"     "al khurma pcr,chikungunya pcr,dengue igg,dengue igm,dengue ns1,dengue pcr,rift valley fever pcr"                                                                                                        1
      "06616589"       "dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype"                                                                                                                                            1
      "07307722"       "dengue igg,dengue igm,dengue ns1,dengue pcr"                                                                                                                                                            1
      "08004590"       "al khurma pcr,chikungunya pcr,dengue igg,dengue igm,dengue ns1,dengue pcr,rift valley fever pcr"                                                                                                        1
      "0809029"        "dengue igg,dengue igm,dengue ns1,dengue pcr,dengue serotype"                                                                                                                                            1
      "08473953"       "dengue igg,dengue igm,dengue ns1,dengue pcr"                                                                                                                                                            1
      
      end
      I have another issue which is every time I re-ran "bys id: gen int seq=_n" with the same data, there was different removal obseravtions

      Comment


      • #4
        I meant
        Code:
        keep if seq==1
        I re-ran "bys id: gen int seq=_n" with the same data, there was different removal obseravtions

        Comment


        • #5
          I'm really sorry but with this example i cannot run your initial syntax because for example variables resultdate or testresult are missing. If you want to find someone who helps you, then please provide useful sample data. Try to use dataex directly after your import excel command.

          Concerning your second issue:
          As there are duplicate observations with respect to your id variable in your data and you did only specify id for your sorting, the resulting variable seq will always be different and therefore different observations will be deleted if you keep only the first one.
          Code:
          bys id (ANOTHERVARIABLE): gen int seq = n
          Using the above would always additionally sort for variable ANOTHERVARIABLE which can for example be your resultdate or any other variable that is suitable for distinguishing duplicate observations. By this you could make sure to always have the same sorting. Maybe have a look into the helpfile for bysort.

          Comment


          • #6
            Many thanks, Benoo, It helped a lot!

            Comment

            Working...
            X