Create multiple variables from a string variable

Giovanni Cuttica

Join Date: Dec 2022

Posts: 2
#1

Create multiple variables from a string variable

28 Dec 2022, 08:03

Hello everyone! This is my first post, if I don’t respect some blog rules: sorry!
I have a simple question about data management.

I'm working with a dataset in which I have an ID variable that identifies the code of a project and a lot of other descriptive variables. The one that interested me is about the localization of the project at the regional level (nuts code) but a project can refer to more than one nuts code. So in my dataset, I have the duplicates of the ID variables for how many nuts codes are involved in the project. An example:

ID nuts2 code

2014BE16RFOP003 BE31

2014BE16RFOP003 BE32

2014BE16RFOP003 BE33

2014BE16RFOP003 BE34

2014BE16RFOP003 BE35

2014DE16RFOP012 DED2

2014DE16RFOP012 DED4

2014DE16RFOP012 DED5

My purpose is to have only one row for each ID, so the strategy that I'm trying is to "collapse" by ID and create the variables about the nuts code of the project. The result that I'm imaging to get is something like that:

ID nuts_code_1 nuts_code_2 nuts_code_3 nuts_code_4 nuts_code_5

2014BE16RFOP003 BE31 BE32 BE33 BE34 BE35

2014DE16RFOP012 DED2 DED4 DED5

I've tried to bypass the problem by running a table command and manipulating the output in Excel to create a new dataset, but I bet it's not the fastest solution (also for the time problems related to the command "collect export")

Any advice?? Thanks a lot!
Tags: data, string

Nick Cox

Join Date: Mar 2014
Posts: 35664

28 Dec 2022, 08:14

Statalist isn't a blog -- it is a web forum -- and the only rules are implicit (don't do anything that would make StataCorp want to ban you): everything else is a request!

In terms of your question you can get from stated start to desired destination, but the real question is what it implies for your other variables.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str15 id str4 nuts2code
"2014BE16RFOP003" "BE31"
"2014BE16RFOP003" "BE32"
"2014BE16RFOP003" "BE33"
"2014BE16RFOP003" "BE34"
"2014BE16RFOP003" "BE35"
"2014DE16RFOP012" "DED2"
"2014DE16RFOP012" "DED4"
"2014DE16RFOP012" "DED5"
end

. bysort id (nuts2code) : gen id2 = _n

. reshape wide nuts2code, i(id) j(id2)
(j = 1 2 3 4 5)

Data                               Long   ->   Wide
-----------------------------------------------------------------------------
Number of observations                8   ->   2           
Number of variables                   3   ->   6           
j variable (5 values)               id2   ->   (dropped)
xij variables:
                              nuts2code   ->   nuts2code1 nuts2code2 ... nuts2code5
-----------------------------------------------------------------------------

. l

     +------------------------------------------------------------------------+
     |              id   nuts2c~1   nuts2c~2   nuts2c~3   nuts2c~4   nuts2c~5 |
     |------------------------------------------------------------------------|
  1. | 2014BE16RFOP003       BE31       BE32       BE33       BE34       BE35 |
  2. | 2014DE16RFOP012       DED2       DED4       DED5                       |
     +------------------------------------------------------------------------+

Comment

Giovanni Cuttica

Join Date: Dec 2022

Posts: 2
#3

28 Dec 2022, 08:51

Thank you so much, I hadn’t thought about the reshape command and your suggestion worked great!
To clarify about the "real question" that you suggest: I have no problems with the other variables because this is just one of the steps of a merging process and I will recall those variables later.
Sorry for the bad description of Statlist, now that I've received my "welcome" I know the rules...
Comment

Announcement

Create multiple variables from a string variable

Comment

Comment