Create new variables with values taken from another variable

Johny Gul

Join Date: Feb 2017
Posts: 3

Create new variables with values taken from another variable

06 Feb 2017, 16:33

Hi,

Here is a simplified version of my data:

Row	Year	_id	_nn
1		39011	51805
2	2006	51805	77823
3		11921	33111
4	2008	77823	69173
5		14283	77823

where _id and _nn respectively have no duplicates.

I am looking for a way in which Stata could do the followings:
1.Create a new variable (for example "NewYear")
2. Then replace the variable "NewYear" with the value from "Year" if Stata could find overlapping/common values from variable "_nn" and "_id".
For example, in row 1, the variable "NewYear" would be 2006 since the value of variable _nn contains the same value of 51805 with row 2 _id.
3. For observations with no common values between _nn and _id, variable "NewYear" would be missing.
4. For observations with value in "Year" variable, variable "NewYear" will take the value from "Year" variable

The expected output is as follows:

Row	Year	_id	_nn	NewYear	Comments
1		39011	51805	2006	Taken from Row 2
2	2006	51805	77823	2006	No change since it got data in "Year" variable
3		11921	33111	.	Missing value since no common "_nn" with "_id" throughout the sample
4	2008	77823	69173	2008	No change since it got data in "Year" variable
5		14283	77823	2008	Taken from Row 4

Where "Comments" is just for explanation purpose only.

I have also refereed to http://www.stata.com/statalist/archive/2013-06/msg00057.html and http://www.stata.com/statalist/archi.../msg00508.html . But they seem not to be what I am looking for.

Would appreciate if someone could assist me on this and I would improve my question if it is not clear enough.
Thanks in advance.

Regards

Tags: None

Chris Larkin

Join Date: Apr 2016

Posts: 296
#2

06 Feb 2017, 16:55

Hi Johnny,

Welcome to Statalist! When sharing data on the forum it is strongly preferred that you use dataex (ssc install dataex) to produce a short snippet of your data that other members can easily copy in to a do-file and work with in Stata to help solve your problem. Do note this for future posts.

For the first part of your problem I put together this code

Code:

//dataex output clear input int year float newyear long(_id _nn) . . 39011 51805 2006 2006 51805 77823 . . 11921 33111 2008 2008 77823 69173 . . 14283 77823 end //your code starts from here local cmd generate qui summ, meanonly forvalues a = 1/`r(N)'{ forvalues e = 1/`r(N)'{ `cmd' NewYear = year if _id[`a'] == _nn[`e'], a(year) local cmd replace } }

I wasn't entirely sure what you meant by point 4 above, but if this snippet does not do everything you need do please try and clarify how it deviates from what you expect and I will put something else together.

There is also no doubt a much better way of doing this. It works though, and unless you have a really large dataset and / or a really old computer it should be done in a jiffy!

Last edited by Chris Larkin; 06 Feb 2017, 17:36.
Comment
Johny Gul

Join Date: Feb 2017

Posts: 3
#3

06 Feb 2017, 17:38

Hi Larkin,

Thank you for the prompt reply.
Will take note of the dataex in future.
Might be a silly question, but may I know why the code is not working? No new variable of "NewYear" is generated although no error is shown.

I have also tried this by myself , but it is not working (no observation in the end)

Code:

quietly su _id global minid `r(min)' global maxid `r(max)' quietly su _nn global minn1 `r(min)' global maxn1 `r(max)' gen NewYear=. forvalues i = $minid / $maxid { forvalues j = $minn1 / $maxn1 { replace NewYear= Year if `i'==`j' di `i' } }

Thank you.

Last edited by Johny Gul; 06 Feb 2017, 17:46.
Comment
Chris Larkin

Join Date: Apr 2016

Posts: 296
#4

06 Feb 2017, 18:09

My code may not work because of the , a(year) option at the end of the line that generates the NewYear variable. Try removing that and it should work fine. It's a bit odd that when generate is stored within a local the options for the command disappear, but this is something for another day

Your code should work in the end, but it'll take ages to get there. You only need to loop for as many rows as you have in your dataset squared.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#5

06 Feb 2017, 18:10

I was going to show you another way to do this, but it relied on

where _id and _nn respectively have no duplicates

being true. Unfortunately, the value 77823 appears twice in _nn, in the second and the last observations.
Comment

Johny Gul

Join Date: Feb 2017
Posts: 3

06 Feb 2017, 19:31

Hi Chris,
Thank you for the prompt reply. It seems like the variable "NewYear" is not generated despite me removing the option, a(year). Maybe I have done something wrong?
May I know what do you mean by "You only need to loop for as many rows as you have in your dataset squared."? How can I incorporate your suggestion to my coding?

Hi Clyde,
My apologies for the mistake. The data should indeed be unique (no duplicates). Below is the expected outcome:

Row	Year	_id	_nn	NewYear	Comments
1		39011	51805	2006	Taken from Row 2
2	2006	51805	77823	2006	No change since it got data in "Year" variable
3		11921	33111	.	Missing value since no common "_nn" with "_id" throughout the sample
4	2008	77823	69173	2008	No change since it got data in "Year" variable
5		14283	16927	2010	Taken from Row 6
6	2010	16927	42421	2010	No change since it got data in "Year" variable

I am not sure this is the best way to reply each of you. Hope this reply is acceptable.

Thank you.

Regards

Comment

Chris Larkin

Join Date: Apr 2016

Posts: 296
#7

06 Feb 2017, 19:46

Hi Johny,

I'm not sure what's going wrong. I've copied and pasted my code verbatim (removing the , a(year) option) and it runs fine.

Try doing it without the local cmd, so:

Code:

gen NewYear=.,a(year) qui summ, meanonly forvalues a = 1/`r(N)'{ forvalues e = 1/`r(N)'{ replace NewYear = year if _id[`a'] == _nn[`e'] } }
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#8

06 Feb 2017, 22:33

I believe this does what you want.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte row int year long(_id _nn) 1 . 39011 51805 2 2006 51805 77823 3 . 11921 33111 4 2008 77823 69173 5 . 14283 16927 6 2010 16927 42421 end preserve keep year _id rename _id _nn rename year newyear keep if !missing(newyear) tempfile holding save `holding' list restore gen newyear = year count merge 1:1 _nn using `holding', update nogenerate sort row

Please do note the use of -dataex- to create the example data. For future posting, please install it (-ssc install dataex-) and read the simple instructions in -help dataex-; use it whenever you post sample data going forward. That will make it as simple as possible for those who want to help you to create a completely faithful replica of your example with a simple copy paste operation.
Comment

Announcement

Create new variables with values taken from another variable

Comment

Comment

Comment

Comment

Comment

Comment

Comment