Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extracting data from multiple excel files

    Hi guys,

    Thank you in advance for any advice any of you have. The problem I have encountered is running a loop to extract data from multiple excel files into 1 Stata file. The task I wish to complete is described below:

    I have approximately 800 excel files in one folder in my computer. Each .xls file is a quarterly report for a specific bank in my dataset. The name of each file corresponds to the specific bank, year and quarter as follows: 'bank-name-year-quarter.xls'. From each file I need to extract 2 columns (the variable names in 1, and the values in the other). I also need to get stata to recognise the year and quarter from the file names in order to create a panel data set.

    Ideally the panel data set would have the following structure in long form:

    Bankname Year Quarter {variables 1-n}.

    I am new to writing a loop for this purpose, and have not had much success in writing the appropriate code.

    Hopefully someone has dealt with a similar issue before!

    Regards,

    Sascha

  • #2
    Sascha,

    You'll be more likely to get a helpful response to your questions on Statalist if you formulate your question according to the advice given in the FAQ. Your question is slim on specifics about what you did and how it went wrong.

    Lance

    Comment


    • #3
      Something like this:
      Code:
      *point to correct directory with xls files and convert
      cd "D:\"
      xls2dta, firstrow
      
      *use filename to create new bank and time vars
      local myfilelist : dir . files"*.dta"
      foreach file of local myfilelist{
      use `file', clear
      gen id = "`file'"
      replace id = substr(id,1,length(id)-4)
      split id, p("-")
      ren id1 bank
      ren id2 year
      ren id3 quarter
      drop id
      save, replace
      }
      *Append all your dta files with bank and time vars
      local myfilelist : dir . files"*.dta"
      foreach file of local myfilelist{
      append using `file'
      save appended, replace
      }

      Comment


      • #4
        In the original post it sounds like the Excel files contain two columns, holding the variable names and the corresponding values. Once imported in Stata, this should be

        Code:
        clear
        
        input str7 A B
        "myname1" 42
        "myname2" 73
        "myname3" .
        end
        
        list
        To get such file into shape, here one strategy

        Code:
        levelsof A , local(quoted)
        foreach q of local quoted {
            local names `names' `= strtoname(`"`q'"')'
        }
        drop A
        xpose , clear
        rename (*)(`names')
        
        list
        describe
        Since Jorrit mentioned xls2dta (fomr SSC, I suppose) let me show how you do the hole thing (which will not work, as I have to make too many assumption about the details here).

        Step 1: Enhance the above code

        Code:
        levelsof filename , local(filename)
        local filename : subinstr local filename ".xls" " " , all
        local filename : subinstr local filename "-" " " , all
        gettoken bank filename : filename
        gettoken year filename : filename
        gettoken quarter filename : filename
        drop filename
        
        insert_above_code_here
        
        generate bank = "`bank'"
        generate year = `year'
        generate quarter = `quarter'
        and save the hole thing as do_for_all_files.do.

        Step 2: Convert all Excel files to Stata datasets, creating variable filename, holding the filename

        Code:
        cd wherever_my_excel_files_are
        xls2dta , generate(filename) : import excel .
        (Note that Jerrit is using an old syntax of xls2dta, that still works).

        Step 3: Run the do-file on all previously converted Excel files

        Code:
        xls2dta : do do_for_all_files.do
        Step 4: If desired, append the files

        Code:
        xls2dta , clear : append
        Best
        Daniel

        Comment


        • #5
          Thank you all so much for your advice. I am currently endeavouring to utilise your suggestions in my code. I will let you know how I get on!

          Comment


          • #6
            Cross-reference to http://www.statalist.org/forums/foru...le-excel-files ??

            Comment

            Working...
            X