Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Importing Several Excel Ranges from different Excel files onto Stata (for Appending)

    Hello Stata gurus,

    This is my first post on Statlist and I am currently attempting to import excel ranges from different excel files onto Stata (for appending). Anyone have any suggestions on the best way to go about this? Decently new to using Stata. There are about 40 excel files in total, each structured uniquely so there will be differing ranges I would want to import (if that makes sense?)

    Best,
    Brady

  • #2
    First, before you write any code, make sure you review -help import excel- and the associated sections of the PDF manual that are linked there. Familiarize yourself with all of the options you may need. In particular -cellrange()- sounds like it will be critical for you, and -firstrow- is usually important in general. Once you have solidified your understanding of that command, it sounds like you will have to write 40 -import excel- commands: one for each file since the cellranges are all different.

    Alternatively, you can write a loop that first uses -import excel, describe- and then accesses the cellrange that Stata recommends and leaves behind in `r(range_1)' and rely on that. My experience is that these cellranges often include lots of useless blank rows at the end that become empty observations that later need to be -drop-ped from the Stata file. Also you can't rely on this at all if part of what you are trying to exclude by using the cellrange is something that can only be recognized by a person, such as rows that contain summary statistics rather than data, or surplus titling in the first one or more rows, etc. In that case you will have to use your eyes to identify the right cellrange and you will have to write a separate importation command for each spreadsheet.

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      First, before you write any code, make sure you review -help import excel- and the associated sections of the PDF manual that are linked there. Familiarize yourself with all of the options you may need. In particular -cellrange()- sounds like it will be critical for you, and -firstrow- is usually important in general. Once you have solidified your understanding of that command, it sounds like you will have to write 40 -import excel- commands: one for each file since the cellranges are all different.

      Alternatively, you can write a loop that first uses -import excel, describe- and then accesses the cellrange that Stata recommends and leaves behind in `r(range_1)' and rely on that. My experience is that these cellranges often include lots of useless blank rows at the end that become empty observations that later need to be -drop-ped from the Stata file. Also you can't rely on this at all if part of what you are trying to exclude by using the cellrange is something that can only be recognized by a person, such as rows that contain summary statistics rather than data, or surplus titling in the first one or more rows, etc. In that case you will have to use your eyes to identify the right cellrange and you will have to write a separate importation command for each spreadsheet.
      Thank you for your prompt response Clyde. I think I am more inclined to write a loop. Is there a guide I can reference with the codes to properly create the loop on Stata?

      Comment


      • #4
        As you are "decently new" to Stata, have you read -help foreach- and -help forvalues- yet? Those are the principal loop commands in Stata.

        The code to import and do stuff with a bunch of Excel file, where each has a different range, looks something like this:

        Code:
        local filenames: dir "." files "*.xlsx"
        
        foreach f of local filenames {
            import excel using `"`f'"', describe
            local range `r(range_1)'
            import excel using `"`f'"', cellrange(`range') firstrow case(lower) clear
            // whatever
        }
        The first -local filenames- command is just to build a list of the Excel files. In this case, it makes a list consisting of all the .xlsx files in the current directory. If you have some other set of Excel file in mind, you need to construct the list of filenames accordingly. The loop then iterates over the filenames list, first getting the cellrange for the file, and then reading in that cellrange into memory. The -firstrow case(lower)- options tell Stata to treat the first row of the cell range as variable names, not data. (Your spreadsheets may not be structured that way, and if they aren't then don't use these options. Most spreadsheets, however, do use the first row for variable names.) Note also that this code assumes that each spreadsheet contains only a single worksheet ("tab"). If there is more than one worksheet in an Excel file, this code only reads in the first one //whatever gets replaced by whatever it is you want to do to each spreadsheet once you have read it in. You might just want to save them as Stata data sets with a -save- command. Or maybe you want to do some data management first. Anyway, whatever it is you want to do to all those files, that's the place to do it.

        Comment

        Working...
        X