Hello everyone! This is my first post, if I don’t respect some blog rules: sorry!
I have a simple question about data management.
I'm working with a dataset in which I have an ID variable that identifies the code of a project and a lot of other descriptive variables. The one that interested me is about the localization of the project at the regional level (nuts code) but a project can refer to more than one nuts code. So in my dataset, I have the duplicates of the ID variables for how many nuts codes are involved in the project. An example:
My purpose is to have only one row for each ID, so the strategy that I'm trying is to "collapse" by ID and create the variables about the nuts code of the project. The result that I'm imaging to get is something like that:
I've tried to bypass the problem by running a table command and manipulating the output in Excel to create a new dataset, but I bet it's not the fastest solution (also for the time problems related to the command "collect export")
Any advice?? Thanks a lot!
I have a simple question about data management.
I'm working with a dataset in which I have an ID variable that identifies the code of a project and a lot of other descriptive variables. The one that interested me is about the localization of the project at the regional level (nuts code) but a project can refer to more than one nuts code. So in my dataset, I have the duplicates of the ID variables for how many nuts codes are involved in the project. An example:
ID | nuts2 code |
2014BE16RFOP003 | BE31 |
2014BE16RFOP003 | BE32 |
2014BE16RFOP003 | BE33 |
2014BE16RFOP003 | BE34 |
2014BE16RFOP003 | BE35 |
2014DE16RFOP012 | DED2 |
2014DE16RFOP012 | DED4 |
2014DE16RFOP012 | DED5 |
My purpose is to have only one row for each ID, so the strategy that I'm trying is to "collapse" by ID and create the variables about the nuts code of the project. The result that I'm imaging to get is something like that:
ID | nuts_code_1 | nuts_code_2 | nuts_code_3 | nuts_code_4 | nuts_code_5 |
2014BE16RFOP003 | BE31 | BE32 | BE33 | BE34 | BE35 |
2014DE16RFOP012 | DED2 | DED4 | DED5 |
I've tried to bypass the problem by running a table command and manipulating the output in Excel to create a new dataset, but I bet it's not the fastest solution (also for the time problems related to the command "collect export")
Any advice?? Thanks a lot!
Comment