Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Looping across multiple tabs and groups of values in an Excel file

    Was not sure how to title this but the issue/question I have is this:

    I have data on hospitalizations and ED visits where each row contains 25 variables, each of which is an ICD-10 diagnostic code. The goal of a preliminary step in the analyses is to assign each specific ICD-10 diagnostic code to one of 25 major diagnostic categories (MDCs). Each MDC subsumes many (50 or more) ICD-10 codes. An example of one MDC would be 'Alcohol Drug Use and Induced Mental Disorders. An ICD-10 diagnosis of alcohol use disorder or cannabis use disorder would both be flagged as being in this MDC. The MDC codes are now stored in an Excel file with each of the 25 tabs containing the ICD-10 codes that are in each MDC. I have attached the Excel file with the MDC codes to show its structure.

    I want to write a program that loops through each of the 25 ICD-10 codes in each record, compares that code to the ICD-10 codes within each MDC and when a match is found, create a flag variable to indicate which MDC group that diagnosis is in. So one of a person's ICD-10 diagnoses was for an alcohol use disorder, a variable for MDC20 would be set to 1 for that record and the program would move on to the next ICD-10 code.

    What is puzzling to me is how to incorporate the MDC codes, which are in the Excel workbook tabs into the looping structure.

    If anyone has done something like this or has suggestions how to create the code, I would be most obliged.

    Thanks.
    Attached Files

  • #2
    At first glance, I'd create a single file of the stacked mcd files. Then you can joinby on icd_code. There are some duplicates, so you'll get multiple links in some cases.

    Code:
    foreach set in 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 {
    import excel using icd10_mdc_codes.xlsx, sheet(mdc`set') firstrow clear 
    ren MDC* icd_code
    ren B disease
    g set = "MCD`set'"
    keep icd_code disease set
    save mdc_`set', replace
    }
    
    clear 
    set obs 0
    foreach set in 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 {
        append using mdc_`set'
    }
    save mcdcodes, replace
    
    use mainfile, clear
    joinby icd_code using mcdcodes
    The "set" variable tells you which MCD file it came from.

    You could delete "disease" variable if it's already in the main file.

    Make sure icd_code is correctly named to match your main file.

    Comment


    • #3
      Hi George,

      Thanks for this solution. I actually arrived at something that I think also works but it takes a lot of time. Might not be anyway around that because the way the code I wrote works is that it checks each of the 25 diagnoses against each of the 25 MDC diagnosis lists. I made use of levelsof to read the diagnoses for each MDC into a local macro. Then I loop over the ICD-10 diagnoses and each local macro list to find matches. I might experiment with your code to see if it is speedier than mine. Again, thanks for the help.

      Code:
      foreach i of numlist 1/25 {
              import excel "/Users/jamesswartz/Desktop/icd10_mdc_codes.xlsx", sheet("mdc`i'") firstrow clear
              rename *, lower
              replace diag`i' = strtrim(diag`i')  //remove trailing and leading blank spaces
              quietly: levelsof diag`i', local(mdc`i') clean //remove double quotes, store in macro
      }
      
      import delimited "DD-21-009_DPH_DD_for_DHS-SUPR-UIC Inpatient 2018-2022.csv", clear
      
      **** Initialize 25 variables, one for each mdc, and set to 0 ****
      foreach i of numlist 1/25 {
          g mdc`i' = 0
      }
      
      
      foreach i of numlist 1/25 {
              foreach diag`i' in `mdc`i'' {
                  foreach var of varlist diag_code_1-diag_code_25 {
                      quietly: replace mdc`i' = 1 if `var' == "`diag`i''"
                  }
              }
      }
      .

      Comment


      • #4
        Hi George,

        Thanks for this solution. I actually arrived at something that I think also works but it takes a lot of time. Might not be anyway around that because the way the code I wrote works is that it checks each of the 25 diagnoses against each of the 25 MDC diagnosis lists. I made use of levelsof to read the diagnoses for each MDC into a local macro. Then I loop over the ICD-10 diagnoses and each local macro list to find matches. I might experiment with your code to see if it is speedier than mine. Again, thanks for the help.

        Code:
        foreach i of numlist 1/25 {
                import excel "/Users/jamesswartz/Desktop/icd10_mdc_codes.xlsx", sheet("mdc`i'") firstrow clear
                rename *, lower
                replace diag`i' = strtrim(diag`i')  //remove trailing and leading blank spaces
                quietly: levelsof diag`i', local(mdc`i') clean //remove double quotes, store in macro
        }
        
        import delimited "DD-21-009_DPH_DD_for_DHS-SUPR-UIC Inpatient 2018-2022.csv", clear
        
        **** Initialize 25 variables, one for each mdc, and set to 0 ****
        foreach i of numlist 1/25 {
            g mdc`i' = 0
        }
        
        
        foreach i of numlist 1/25 {
                foreach diag`i' in `mdc`i'' {
                    foreach var of varlist diag_code_1-diag_code_25 {
                        quietly: replace mdc`i' = 1 if `var' == "`diag`i''"
                    }
                }
        }
        .

        Comment


        • #5
          I suspect a joinby would work faster than a loop. But if it works, all is well.

          Comment

          Working...
          X