Hi statalists,
I'm currently dealing with a survey panel dataset. This dataset only simply lists the indices for each questions, sort by questions and I find it is tricky to directly analyze it. The first column is the description of questions, the 2nd the 3rd and 4th columns are the numeric answers for 2014/2015. My aim is to generate more variables (columns) for each question so that all answers for each country will be shown only in one row.
Firstly, I tried "encode Question, gen(id_question) " to converting first column into numeric values so that I can quickly find questions. But there is one more thing need to be considered that this dataset could include more questions in the future. The id of each question might be disrupted. So I wonder if there is any command that I can locate observations according to the characters in their labels and then put the values to according columns. And the description of some questions look very similar e.g. Number of ATMs per 100,00 adults / Number of ATMs per 1,000 km2. So I hope I can locate the label accurately.
Here is the dataset example:
The output I expect (the columns after country column stand for each questions )
Thanks in advance!
I'm currently dealing with a survey panel dataset. This dataset only simply lists the indices for each questions, sort by questions and I find it is tricky to directly analyze it. The first column is the description of questions, the 2nd the 3rd and 4th columns are the numeric answers for 2014/2015. My aim is to generate more variables (columns) for each question so that all answers for each country will be shown only in one row.
Firstly, I tried "encode Question, gen(id_question) " to converting first column into numeric values so that I can quickly find questions. But there is one more thing need to be considered that this dataset could include more questions in the future. The id of each question might be disrupted. So I wonder if there is any command that I can locate observations according to the characters in their labels and then put the values to according columns. And the description of some questions look very similar e.g. Number of ATMs per 100,00 adults / Number of ATMs per 1,000 km2. So I hope I can locate the label accurately.
Here is the dataset example:
Question | Country | Index_2014 | Index_2015 |
Number of deposit accounts per 1,000 adults-Q1 | A | 1 | 7 |
Number of bank branches per 100,000 adults-Q2 | A | 2 | 8 |
Number of ATMs per 100,000 adults-Q3 | A | 3 | 9 |
Number of deposit accounts per 1,000 adults-Q1 | B | 4 | 10 |
Number of bank branches per 100,000 adults-Q2 | B | 5 | 11 |
Number of ATM2 per 1,000 km2-Q4 | C | 6 | 12 |
The output I expect (the columns after country column stand for each questions )
Country | deposit_2014 | branch_2014 | ATMadult_2014 | ATMkm_2014 | deposit_2015 | branch_2015 | ATMadult_2015 | ATMkm_2015 |
A | 1 | 2 | 3 | . | 7 | 8 | 9 | . |
B | 4 | 5 | . | . | 10 | 11 | . | . |
C | . | . | . | 6 | . | . | . | 12 |
Comment