Hi everyone,
I am facing quite a significant issue with cleaning some data. As such, I would appreciate any advice or guidance that anyone can offer. I thank you in advance for your time.
I have attached an example file for reference. All of the files (over 1000) that I am dealing with have an identical format. What I would like to achieve is the following:
The files in question are bank financial asset reports. Where the filename gives the name of the bank, the year, and the quarter (name-of-bank-20XX-Quarter).
The important part of each excel file from my perspective starts at row 16 of each sheet. The most important title row is row 17, where each letter represents the type of data in that column. For example row 17, Column E is denoted by the letter L. In this data set, L=lending. Column 1 then gives me my variable names. For example then, the cell (Row 35, Column E) gives me the lending credit provision for the bank in question.
With this data I need to achieve a few things. Firstly cleaning and reshaping:
Any help/guidance on this matter would be greatly appreciated.
Kind Regards,
Sascha
(PS. as an example, here is what my final ideal data set may look like)
However, where the variables in the first row include every variable from column A in my data files.
I am facing quite a significant issue with cleaning some data. As such, I would appreciate any advice or guidance that anyone can offer. I thank you in advance for your time.
I have attached an example file for reference. All of the files (over 1000) that I am dealing with have an identical format. What I would like to achieve is the following:
The files in question are bank financial asset reports. Where the filename gives the name of the bank, the year, and the quarter (name-of-bank-20XX-Quarter).
The important part of each excel file from my perspective starts at row 16 of each sheet. The most important title row is row 17, where each letter represents the type of data in that column. For example row 17, Column E is denoted by the letter L. In this data set, L=lending. Column 1 then gives me my variable names. For example then, the cell (Row 35, Column E) gives me the lending credit provision for the bank in question.
With this data I need to achieve a few things. Firstly cleaning and reshaping:
- I want to convert each excel file in my directory into a stata file.
- I would then like to find a way to convert my data so that the variables column (column A in the attached file) and the row 17 are combined.
- For example to have a set of variables (given in column A) with a prefix of the letter in row 17. So I would have a final set of variables that looked like this: L(1.Penempatan pada bank lain) L(a. Rupiah)........DPK(1.Penempatan pada bank lain) DPK(a. Rupiah)........ and so on.
- Following this I would like to add two variables to each data set: a name of the bank variable, and a time variable. These variables need to be pulled from the filename. So in the example file, I would like to generate a new variable called 'bank name'==bank-of-china-limited. I would also like to generate a time variable that indicated both the year and quarter in the xlsx file name (2015 Quarter 1). Note (the quarters are denoted as follows in my directory (3=Quarter 1, 6=Quarter 2, 9=Quarter 3, 12=Quarter 4)).
- I would then like to reshape the data so it is in long-panel form.
- Then I need to combine each solitary stata file into a single large panel file that includes each bank.
Any help/guidance on this matter would be greatly appreciated.
Kind Regards,
Sascha
(PS. as an example, here is what my final ideal data set may look like)
Bank name | Date | L(1.Penempatan pada bank lain) | L(a. Rupiah) | DPK(1.Penempatan pada bank lain) | DPK(a. Rupiah) |
bank of china limited | 2015Q1 | (value) | (value) | (value) | (value) |
bank of china limited | 2015Q2 | (value) | (value) | (value) | (value) |
Comment