Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging in multiple tabs within one excel spreadsheet to create one dataset

    Hi,

    I'm seeking to create one large dataset from one excel spreadsheet with about 46 tabs. Oppose to importing them individually and creating a bunch of dta files to append, is there a smoother way to do this all at once.

  • #2
    Well, there is no way that doesn't involve creating at least some intermediate files along the way. Any approach to this will at least require creating intermediate temporary files for the current tab being imported and the cumulative built-up data set so far. Those can be tempfiles, of course, if you prefer that.

    So you can do something along these lines:

    Code:
    //    BUILD A LIST OF THE TABS
    import excel using my_excel_file.xlsx, describe
    local n_sheets `r(N_worksheet)
    forvalues i = 1/`n_sheets' {
        local sheet`i' `r(worksheet_`i')'
    }
    
    //    NOW IMPORT THEM ONE BY ONE
    clear
    tempfile building
    save `building', emptyok
    
    forvalues i = 1/`n_sheets' {
        import excel using my_excel_file.xlsx, clear sheet(`sheet`i'') // PERHAPS OTHER OPTIONS AS NEEDED
        gen source_tab = `"`sheet`i''"'
        append using `building'
        save `"`building'"', replace
    }
    
    use `building', clear
    save my_combined_data_file, replace
    Note: Not tested. I have done kind of thing often, and the logic is correct. But this may have typos, unbalanced quotes or braces, etc.

    That said, you may come to regret doing this. Even when data like this come from well-trusted sources, there are frequently name clashes, or data storage type or labeling clashes from one worksheet to the next. By building the combined data set in one fell swoop, you create a combined data set that is difficult to clean and enforce consistency on. In my view, it is easier to import each tab separately, clean them separately, and enforce consistency of names, storage types, value labels, etc., at the individual data set level. Then, after each of them is ready for use, append them all in a quick simple loop.

    You pays your money. You makes your choice.

    Comment


    • #3
      See xls2dta (SSC).

      Best
      Daniel

      Comment

      Working...
      X