Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Importing and converting multiple excel files

    Hi everyone,

    I am facing quite a significant issue with cleaning some data. As such, I would appreciate any advice or guidance that anyone can offer. I thank you in advance for your time.

    I have attached an example file for reference. All of the files (over 1000) that I am dealing with have an identical format. What I would like to achieve is the following:

    The files in question are bank financial asset reports. Where the filename gives the name of the bank, the year, and the quarter (name-of-bank-20XX-Quarter).

    The important part of each excel file from my perspective starts at row 16 of each sheet. The most important title row is row 17, where each letter represents the type of data in that column. For example row 17, Column E is denoted by the letter L. In this data set, L=lending. Column 1 then gives me my variable names. For example then, the cell (Row 35, Column E) gives me the lending credit provision for the bank in question.

    With this data I need to achieve a few things. Firstly cleaning and reshaping:
    • I want to convert each excel file in my directory into a stata file.
    • I would then like to find a way to convert my data so that the variables column (column A in the attached file) and the row 17 are combined.
      • For example to have a set of variables (given in column A) with a prefix of the letter in row 17. So I would have a final set of variables that looked like this: L(1.Penempatan pada bank lain) L(a. Rupiah)........DPK(1.Penempatan pada bank lain) DPK(a. Rupiah)........ and so on.
    • Following this I would like to add two variables to each data set: a name of the bank variable, and a time variable. These variables need to be pulled from the filename. So in the example file, I would like to generate a new variable called 'bank name'==bank-of-china-limited. I would also like to generate a time variable that indicated both the year and quarter in the xlsx file name (2015 Quarter 1). Note (the quarters are denoted as follows in my directory (3=Quarter 1, 6=Quarter 2, 9=Quarter 3, 12=Quarter 4)).
    • I would then like to reshape the data so it is in long-panel form.
    • Then I need to combine each solitary stata file into a single large panel file that includes each bank.
    I have never written a loop of this complexity before, and given the specific requirements and format I need my data to be in, I am unsure how to proceed.

    Any help/guidance on this matter would be greatly appreciated.

    Kind Regards,

    Sascha

    (PS. as an example, here is what my final ideal data set may look like)
    Bank name Date L(1.Penempatan pada bank lain) L(a. Rupiah) DPK(1.Penempatan pada bank lain) DPK(a. Rupiah)
    bank of china limited 2015Q1 (value) (value) (value) (value)
    bank of china limited 2015Q2 (value) (value) (value) (value)
    However, where the variables in the first row include every variable from column A in my data files.

    Attached Files

  • #2
    Not clear why you started a new thread here. Cross-reference http://www.statalist.org/forums/foru...le-excel-files

    Comment


    • #3
      It was not allowing me to update the previous one, and I was unable to figure out why. Any suggestions?

      To be more specific, since my last update on that post. It is not allowing me to 'post reply' anymore, and I was unsure what to do. For clarification, this post is certainly intended to be an update on the previous one you linked.
      Last edited by Sascha Kerbert; 27 Nov 2016, 16:10.

      Comment


      • #4
        You can only edit posts within a short time (1 hour). You can, and should, add extra posts within a thread if the subject is the same.

        Comment

        Working...
        X