Dear Statalist,
We have collected annual data on minorities in several countries for 25 years. Now I would like to import all data into Stata and create a master dataset: 160 variables with data on 25 years for almost 800 minorities. The variables are in numeric and string format.
The tricky part:
The data for each minority is stored in a separate Excel file (i.e. 800 Excel files).
Each Excel file consists of five worksheets.
Each worksheet contains different variables (and a different number of variables, hence columns).
In the first row of each worksheet the years are listed.
Each worksheet contains two fields indicating country and group identifier.
The design of the worksheets is unfavourable, because country and group identifier and year are in the same row. Similarly, the group identifier and the variables are in the same row as well.
I would like to
Unfortunately, I was not able to find help for my particular problem. Is there a code that I could use for a loop over all my 800 Excel files?
This is how the worksheets look like:
Worksheet 1 in Excel file A
Worksheet 2 in Excel file A
This is how the worksheets should look like after they were merged:
In the above format I could append the various Excel files. I hope my description is specific enough. I very much appreciate your support.
Thank you. Tom.
We have collected annual data on minorities in several countries for 25 years. Now I would like to import all data into Stata and create a master dataset: 160 variables with data on 25 years for almost 800 minorities. The variables are in numeric and string format.
The tricky part:
The data for each minority is stored in a separate Excel file (i.e. 800 Excel files).
Each Excel file consists of five worksheets.
Each worksheet contains different variables (and a different number of variables, hence columns).
In the first row of each worksheet the years are listed.
Each worksheet contains two fields indicating country and group identifier.
The design of the worksheets is unfavourable, because country and group identifier and year are in the same row. Similarly, the group identifier and the variables are in the same row as well.
I would like to
- Merge all worksheets within each Excel file by year.
- Create variable “year”.
- Create variable “ID” which informs about country-group per year.
- Append all Excel files in order to have one master dataset.
Unfortunately, I was not able to find help for my particular problem. Is there a code that I could use for a loop over all my 800 Excel files?
This is how the worksheets look like:
Worksheet 1 in Excel file A
Country: FRA | |||
Group: B2 | Var_1 | Var_2 | Var_3 |
1990 | |||
1991 | |||
1992 | |||
1993 | |||
1994 |
Country: FRA | |||
Group: B2 | Var_4 | Var_5 | Var_6 |
1990 | |||
1991 | |||
1992 | |||
1993 | |||
1994 |
This is how the worksheets should look like after they were merged:
ID | year | Var_1 | Var_2 | Var_3 | Var_4 | Var_5 | Var_6 |
FRA-B2 | 1990 | ||||||
FRA-B2 | 1991 | ||||||
FRA-B2 | 1992 | ||||||
FRA-B2 | 1993 | ||||||
FRA-B2 | 1994 |
Thank you. Tom.
Comment