Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Importing multiple worksheets from multiple excel files into one dta file

    Hi. I have four excel files (OCEmr_Jan to Dec2022_GMRV, OCEmr_Jan to Dec2022_KAR, OCEmr_Jan to Dec2022_KVC, and OCEmr_Jan to Dec2022_MTC) in a folder with other excel files with different names. Each of these files has 12 sheets labelled by the month of the year (january, february, etc.). I want to convert and append each of these worksheets from each excel file to build a single dta file.

    When I run the code within the foreach loop on an individual excel file, it works to extract each worksheet of excel to build a single dta file for each excel file. However, when I run the entire code (the master loop), it fails to do so for all the four excel files together

    The code below is built to create four dta files, one per excel file. However, I also need to append these four dta files together and would like to do so in the loop.

    Any guidance will be much appreciated



    Code:
     
    clear 
    tempfile building 
    save `building', emptyok  
    
    local file "KAR" "GMRV" "MTC" "KVC"
    
    cd "C:\Users\scottr\Dropbox\LVPEI\Data\cleaning\raw data"
    
     foreach f of local file {
    
    import excel using "${db}\raw data\OCEmr_Jan to Dec2022_`file'.xlsx", describe 
    local n_sheets `r(N_worksheet)'
    *return list local n_sheets `r(N_worksheet)' 
    forvalues j = 1/`n_sheets' { 
      local sheet`j' `r(worksheet_`j')' 
    }    
    
    forvalues j = 1/`n_sheets' { 
      import excel using "${db}\raw data\OCEmr_Jan to Dec2022_`file'.xlsx", sheet(`"`sheet`j''"') firstrow case(lower) clear 
      gen sheet = `j' 
      append using `building' 
      save `"`building'"', replace 
    } 
    
    sort sheet
    
    *generating an age and sex var from the combined var
    rename age Age
    split Age, gen(age) parse("/")
    replace age1 = "1" if ustrregexm(age1, "[0-9]{1,2} Mo.")
    destring age1, replace
    rename age1 age
    rename age2 sex
    drop Age
    
    
    *splitting combined date-time var
    gen double appt_dttm = clock(appttime, "DMYhm")
    *assert missing(appt_dttm) == missing(appttime)
    format appt_dttm %tc
    gen appt_date = dofc(appt_dttm)
    format appt_date %td
    gen double appt_time_of_day = appt_dttm - cofd(appt_date)
    format appt_time_of_day %tcHH:MM
    
    *month and year from date
    gen year = year(appt_date)
    gen month = month(appt_date)
    
    
    *replace missing values with .
    rename district District
    rename state State 
    encode District, gen(district)
    encode State, gen(state)
    drop District State appt_dttm
    
    save "${db}\clean data\emr_`file'", replace
    clear
     }
    Last edited by Scott Rick; 06 Jan 2024, 05:51.

  • #2
    I think the start of your loop should be not

    Code:
    foreach f of local file { 
     import excel using "${db}\raw data\OCEmr_Jan to Dec2022_`file'.xlsx", describe
    but
    Code:
     
      foreach f of local file {  
     import excel using "${db}\raw data\OCEmr_Jan to Dec2022_`f'.xlsx", describe
    More generally, it might be better to read in and append all the worksheet pages first, and then clean the data all at once after the loop.
    Devra Golbe
    Professor Emerita, Dept. of Economics
    Hunter College, CUNY

    Comment


    • #3
      Thanks, Devra. That worked well and I cleaned all the data after appending as you suggested.

      Comment

      Working...
      X