create new variables names based on the value of another variable

Constantin Alba

Join Date: Sep 2014

Posts: 80
#1


10 Oct 2016, 20:51

Hi all, I could not find an answer to this (or probably could not formulate the search correctly).

Here is my problem. I have the following dataset:

ID  NAME
1   ABC
2   ABC
3   DEF
4   GIH
5   XYZ

Neither ID nor NAME is unique.

I need to create the following structure:

ID  NAME  ABC  DEF  GIH  XYZ
1   ABC
2   ABC
3   DEF
4   GIH
5   XYZ

In other words, for each name I need to create a variable with that same name.

Any suggestions?

Thank you in advance,

Constantin
Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#2

10 Oct 2016, 22:32

This is only possible if each value of NAME is a legal Stata variable name. That is, it contains only letters, numerals, and the underscore character, does not begin with a numeral, and does not exceed 32 characters in length. If all of that is true, then you can do this:

Code:

levelsof NAME, local(names)
foreach n of local names {
    gen `n' = .
}

That will create a new variable named after each distinct value of NAME. The variable created will be numeric and all its values will be missing. If you need the new variables to be strings, just replace -gen `n' = .- with -gen str `n' = ""-.
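
Before running the loop, it may be worth checking which values of NAME would fail as variable names. The following is a minimal sketch, assuming NAME is a string variable; the toy data and the helper variable bad are made up for illustration. It relies on strtoname(), which converts a string to a legal name, so any value that the function changes cannot be used as is. It may not catch reserved words or names that already exist in the dataset.

```stata
* Toy data for illustration only (not the original poster's data)
clear
input byte ID str12 NAME
1 "ABC"
2 "ABC"
3 "DEF"
4 "Amber Valley"
5 "XYZ"
end

* A value is usable as a variable name only if strtoname() leaves it unchanged
gen byte bad = NAME != strtoname(NAME)
list ID NAME if bad

* Create variables only for the values that passed the check
levelsof NAME if !bad, local(names)
foreach n of local names {
    gen `n' = .
}
```

Any observation listed with bad equal to 1 holds a value that has to be cleaned up (or handled some other way) before it can become a variable name.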

In the future, please post example data using the -dataex- command. The HTML tables are not easy to import into Stata, and had I needed to experiment with your data to solve your problem, it would have taken too much effort and time. With -dataex-, a simple copy-paste-do operation immediately creates a perfectly faithful replica of your example. If you do not have the -dataex- command, you can get it by running -ssc install dataex-. The simple instructions are at -help dataex-. Thank you.

Last edited by Clyde Schechter; 10 Oct 2016, 22:34.
Constantin Alba

Join Date: Sep 2014

Posts: 80
#3

10 Oct 2016, 23:51

Thanks Clyde,
That worked. I had been reluctant to take this approach because there are so many different names that I was afraid my PC would not cope (it was close, but it worked with a 2GB data file).
Unfortunately, it did not help me do the calculations I wanted. I need to think of another way, or come back for help from the Stata community.

P.S. Apologies for not using -dataex-.

Giovanna Porta

Join Date: Feb 2020
Posts: 2
#4

17 Feb 2020, 06:37

Good morning. How can I add to Clyde's code so that each newly created variable is assigned a value when its name matches the value of another variable? I'm starting from this dataset:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 subjectid long idate str12 qname byte response
"101" 21628 "q1"  1
"101" 21628 "q15" 2
"102" 21629 "q87" 1
"102" 21629 "q32" 4
"103" 21629 "q14" 7
"103" 21629 "q14" 5
"103" 21629 "q32" 4
end
format %tdD_m_Y idate

and I would like to obtain:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 subjectid long idate str12 qname byte response float(q1 q14 q15 q32 q87)
"101" 21628 "q1"  1 1 . . . .
"101" 21628 "q15" 2 . . 2 . .
"102" 21629 "q87" 1 . . . . 1
"102" 21629 "q32" 4 . . . 4 .
"103" 21629 "q14" 7 . 7 . . .
"103" 21629 "q14" 5 . 5 . . .
"103" 21629 "q32" 4 . . . 4 .
end
format %tdD_m_Y idate

That is, each new variable (q1, q14, q15, q32, q87) should take the value of response in the observations where qname matches that variable's name.

I apologize if this is covered elsewhere and I missed it, and I appreciate any help.
Best,

Giovanna


Wouter Wakker

Join Date: Nov 2018
Posts: 621
#5

17 Feb 2020, 06:55

Code:

levelsof qname, local(names)
foreach name in `names' {
    gen `name' = response if qname == "`name'"
}
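
As a quick check, the loop can be run on the example data from the question and the result verified. This is only a sketch: the helper variables nfilled and check are hypothetical names, and the approach assumes every value of qname is already a legal variable name.

```stata
clear
input str5 subjectid long idate str12 qname byte response
"101" 21628 "q1"  1
"101" 21628 "q15" 2
"102" 21629 "q87" 1
"102" 21629 "q32" 4
"103" 21629 "q14" 7
"103" 21629 "q14" 5
"103" 21629 "q32" 4
end

levelsof qname, local(names)
foreach name of local names {
    gen `name' = response if qname == "`name'"
}

* Each observation should have exactly one new variable filled in,
* and that variable should equal response
egen byte nfilled = rownonmiss(q1 q14 q15 q32 q87)
egen check = rowtotal(q1 q14 q15 q32 q87)
assert nfilled == 1
assert check == response
```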


Giovanna Porta

Join Date: Feb 2020

Posts: 2
#6

17 Feb 2020, 07:02

Thank you!
Konstantina Maragkou

Join Date: Dec 2017

Posts: 12
#7

31 Aug 2020, 03:06

Hello!

I want to do exactly what Clyde's code does but I get the error "r(103) too many variables specified" after creating the first three variables. Any ideas why that might be happening?

Best wishes
Konstantina
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#8

31 Aug 2020, 03:23

When I tried to create too many variables on purpose (under Stata IC the maximum number of variables is 2,048; I tried to create 2,050), I got the following message:

Code:

no room to add more variables
    Up to 2,048 variables are allowed with this version of Stata. Versions are available that allow up to 32,767 variables.
r(900);

Therefore it is not clear where your error comes from.

Type

Code:

set trace on

just before the commands you are trying to run. Then scroll back through what Stata returns until you see the message in red, copy, say, 7 lines before and 3 lines after the red error message, and paste them here. Use code delimiters when you paste.
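
A minimal sketch of that workflow, using made-up toy data with one name that contains a space. The choice of tracedepth 1 and the capture noisily wrapper are assumptions added here so that tracing is switched off again even when the loop fails. In the trace output, lines starting with "-" show each command as written and lines starting with "=" show it after macro expansion.

```stata
* Toy data for illustration only
clear
input byte ID str12 NAME
1 "Adur"
2 "Amber Valley"
end

levelsof NAME, local(names)

set tracedepth 1     // assumption: do not descend into called programs
set trace on
capture noisily {
    foreach n of local names {
        gen `n' = .
    }
}
local rc = _rc       // keep the return code of the failed block
set trace off

display "return code: `rc'"
```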


Konstantina Maragkou

Join Date: Dec 2017
Posts: 12
#9

31 Aug 2020, 06:05

Joro, thanks a lot for your guidance. This is what the code returns:

Code:

. foreach n of local names {
  2.     gen `n' = .
  3. }
- foreach n of local names {
- gen `n' = .
= gen Adur = .
(354 missing values generated)
- }
- gen `n' = .
= gen Allerdale = .
(354 missing values generated)
- }
- gen `n' = .
= gen Alnwick = .
(354 missing values generated)
- }
- gen `n' = .
= gen Amber Valley = .
too many variables specified
  }
r(103);

end of do-file

r(103);

Is it because there is a space between the words in the fourth name, where the error occurs? If that is the problem, is there any way to make the code run without changing the city names?


Nick Cox

Join Date: Mar 2014

Posts: 35693
#10

31 Aug 2020, 06:15

Yes. Stata variable names can't include spaces.

I know the context of British place names. (Alnwick is even one hour's drive from me and a place that in normal times I would often visit, partly for its bookshop. Many, many readers will have seen Alnwick unknowingly, as its castle is often used as a television or movie set.)

There's a bigger deal lurking behind the question. It's highly unlikely to be a good idea to have a separate variable name for each place in a Stata dataset. Very likely you need a different data layout altogether. To get better advice, back up and tell us more about your dataset and what you want to do.
Konstantina Maragkou

Join Date: Dec 2017

Posts: 12
#11

01 Sep 2020, 03:34

Nick thank you so much for your response (and the interesting information regarding Alnwick's castle - I should visit it some time!).

What I want to do is a bit complicated, but I will try my best to explain. I want to have a separate variable name for each Local Authority District (LAD) in order to add information on area-to-area migration flows. After managing that, I would reshape the data to long format so that all migration flows are in a single column. The aim is to construct weights based on the area-to-area migration flows as a proportion of the total migration from each LAD. Let me know if that makes sense!
Nick Cox

Join Date: Mar 2014

Posts: 35693
#12

01 Sep 2020, 04:07

Thanks for the extra detail. Unfortunately it's not enough to add helpfully to my earlier post. That's just a fact that I don't understand exactly what you want to do, and in no sense a criticism.
Konstantina Maragkou

Join Date: Dec 2017

Posts: 12
#13

01 Sep 2020, 07:18

I really apologise for not being clear enough. I have an area-level dataset where each observation represents an individual LAD. The reason I want to have a separate variable for each LAD is that I want to add information on migration flows to each LAD from each other LAD (the outcome will look like a 343 x 343 matrix, as there are 343 LADs in my data). For example, I will have a variable named "Sheffield", and each value of "Sheffield" will indicate the number of individuals migrating from Sheffield to each other LAD.

Next, I will reshape the data to long format so that migration flows from all LADs are in a single column, while each value of the variable LAD (which is a LAD name, for example "Sheffield") will appear in 343 rows. I will then be able to construct migration weights for each LAD based on the migration flows from each other LAD. For example, if 100 people migrate from York to Sheffield while 1000 people migrate to Sheffield in total, York's weight would be 0.1. The final aim is to construct a weighted average unemployment measure across all other LADs (for example, the weighted average unemployment in all LADs except Sheffield).
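
For what it is worth, the weights described here can be computed directly in a long layout, where place names stay as string values and never need to become variable names. This is only a sketch under assumptions: the variable names origin, destination, flow, total_in and weight are hypothetical, and the toy flows are invented to reproduce the York to Sheffield example (100 out of 1000).

```stata
* Toy long-format data: one observation per origin-destination pair
clear
input str12 origin str12 destination long flow
"York"      "Sheffield" 100
"Leeds"     "Sheffield" 900
"York"      "Leeds"     300
"Sheffield" "Leeds"     100
end

* Total inflow to each destination
bysort destination: egen double total_in = total(flow)

* Share of each origin in the destination's total inflow
gen double weight = flow / total_in

list, sepby(destination)
```

With data in this shape, a name such as "Amber Valley" causes no trouble, because it is only ever a value of origin or destination.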

I appreciate that what I want to do is quite complicated, especially to explain it to someone else since I am still working on it. Being able to have a separate variable name for each LAD will be a start though.

Joro Kolev

Join Date: Aug 2018
Posts: 3050

#14

01 Sep 2020, 08:17

I do not know whether what you want to do makes sense, but to overcome the immediate problem (your locations made of two words separated by a space are not legal variable names), you can use the function -strtoname(s[,p])-. First generate a variable that contains legal Stata names, and then go ahead with the code you are trying to apply.

Code:

. sysuse auto, clear
(1978 Automobile Data)

. keep make

. gen name = strtoname(make)

. list in 1/10

     +-------------------------------+
     | make                     name |
     |-------------------------------|
  1. | AMC Concord       AMC_Concord |
  2. | AMC Pacer           AMC_Pacer |
  3. | AMC Spirit         AMC_Spirit |
  4. | Buick Century   Buick_Century |
  5. | Buick Electra   Buick_Electra |
     |-------------------------------|
  6. | Buick LeSabre   Buick_LeSabre |
  7. | Buick Opel         Buick_Opel |
  8. | Buick Regal       Buick_Regal |
  9. | Buick Riviera   Buick_Riviera |
 10. | Buick Skylark   Buick_Skylark |
     +-------------------------------+
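
If the original spelling of each place should stay visible, one option is to keep it as the variable label while the variable name itself comes from strtoname(). A minimal sketch, assuming one observation per place (so every name is created only once); LAD and vname are hypothetical variable names and the four places are toy data. Two different places could in principle map to the same legal name, in which case the second gen would fail.

```stata
* Toy data: one observation per place
clear
input str20 LAD
"Adur"
"Allerdale"
"Alnwick"
"Amber Valley"
end

gen vname = strtoname(LAD)

forvalues i = 1/`=_N' {
    local v = vname[`i']
    gen `v' = .
    * Keep the unchanged place name as the variable label
    label variable `v' "`=LAD[`i']'"
}

describe
```

Commands that display variable labels will then show "Amber Valley", even though the variable is called Amber_Valley.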


Konstantina Maragkou

Join Date: Dec 2017

Posts: 12
#15

01 Sep 2020, 10:03

I was mostly looking for a way to do this without changing the city names, but thanks a lot for the suggestion!
