Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Import Excel with dynamic cell range

    Hi all,
    this is my first post in Statalist.

    I need help about Excel file importing.

    I need to import data from an Excel file containing multiple sheets. All sheets are automatically produced with an header, so real data start from cell, say, "B7" in each sheet and finish at at certain cell, which differs from sheet to sheet. I want to import only real data without header, so cellrange "B7:AT1200" for Sheet1, "B7:C899" for Sheet2, and so on.


    To the best that I know, there is not any automatic import procedure that meet my need.

    Probably, the solution is to use a macro, but I am not proficient in that.


    I would like to turn into code these logical steps:

    1. use - import excel, describe - to collect all different sheets names and cells range
    2. save cell range of each sheet in a local e.g. local range_1="B1:AT1200"
    3. extract (through a - substr - function, perharps?) the bottom right bound of each sheet's cell range (e. g. AT1200) and save it to another local
    4. loop through each sheet of the original Excel file, defining option - cellrange - of the - import excel - command using the previuosly defined local

    this should sounds similar to:

    Code:
    import excel excelfile.xlsx, describe
    
    local no_sheets=r(N_worksheet)
    
    forvalues i=1/`no_sheets'{
    local range_`i'=r(range_`i')
    local range_`i'_bottomright=trim(substr(range_`i',4,10))  /* I need trim because string length differs from time to time (AT1200 <> G99) */
    }
    
    forvalues i=1/`no_sheets'{
    import excel excelfile.xls, firstrow sheet("Sheet`i'") cellrange("B7:local_`i'_bottomright")  clear
    save sheet_`1'.dta, replace
    }
    of course, it does not work..

    can you help me?
    thanks

    Paolo
    Last edited by Paolo Donatelli; 19 Oct 2016, 03:42.

  • #2
    I will not go into details of why your code does not work. The basic layout is good. This should work

    Code:
    import excel my_excelfile.xlsx , describe
    
    local Nws = r(N_worksheet)
    
    forvalues j = 1/`Nws' {
        local sheet_`j' = r(worksheet_`j')
        local range_`j' r(range_`j')
        tokenize `range_`j'' , parse(":")
        local end_`j' `3'
    }
    
    forvalues j = 1/`Nws' {
        import excel my_excelfile.xlsx ///
            , firstrow sheet(`"`sheet_`j'"') ///
            cellrange(B7:`end_`j'') clear
        save sheet_`j'.dta
    }
    For related problems see xls2dta (SSC).

    Edit:

    Actually, I think import excel might be clever enough to determine what you want when the second argument of cellrange() is not specified. Try

    Code:
    import excel my_excelfile.xlsx , describe
    
    local Nws = r(N_worksheet)
    
    forvalues j = 1/`Nws' {
        local sheet_`j' = r(worksheet_`j')
        import excel my_excelfile.xlsx ///
            , firstrow sheet(`"`sheet_`j'"') ///
            cellrange(B7) clear
        save sheet_`j'.dta
    }
    If this works, then xls2dta will work as a one-liner

    Code:
    xls2dta , allsheets : import excel my_excelfile , firstrow cellrange(B7)
    Best
    Daniel
    Last edited by daniel klein; 19 Oct 2016, 04:25.

    Comment

    Working...
    X