Assign value from different variable depending on value of other variable, without loop

Garret Christensen

Join Date: Jul 2014

Posts: 38
#1

17 May 2018, 17:06

I'd like to create a variable with a value from a certain other variable, where the other variable the value comes from depends on the value of yet another variable. An example might help: the observations are journal articles, and I have the year they were published, as well as the number of citations each article got in each calendar year. I want a new variable that is the number of citations an article received 5 years after publication.

I did this with a loop, but is there a nicer, loop-free way to write this code? Here's a working example:

*Make the original data set
clear all
set seed 1492
set obs 100
*Values for each year (ex: citation count for an article in a given year)
forvalues X=2000/2015 {
    gen _`X'value=round(runiform(0,10),1)
}
*Assign the year (ex: publication year of an article)
gen year=2000+floor(runiform(0,8))

*Create new var with value from other variable, which var depends on value of 'year'
*Create var with value from 5 years after publication
gen year5value=.
forvalues X=2000/2007 {
    replace year5value=_`=`X'+5'value if year==`X'
}
Jean-Claude Arbaut

Join Date: Jul 2017

Posts: 209
#2

17 May 2018, 17:20

What about this?

Code:

gen id=_n
reshape long _@value, i(id) j(yr)
keep if yr==year+5

If you want to sum all citations within 6 years (the publication year through year+5), you can do this instead:

Code:

gen id=_n
reshape long _@value, i(id) j(yr)
keep if yr>=year & yr<=year+5
collapse (sum) _value, by(id year)

Last edited by Jean-Claude Arbaut; 17 May 2018, 17:36. Reason: year to year+5, it's 6
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

17 May 2018, 18:41

Jean-Claude's two solutions both use the reshape command, and they illustrate an important Stata principle worth mentioning.

The first thing we notice is that your data is in what Stata would call a "wide" layout with the values of the citation count for different years in different variables. Jean-Claude transforms it to a "long" layout, where each observation has just one value of the citation count for one given year.

The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. It certainly does make things much easier here, eliminating the need for looping over the variable names.
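To make the two layouts concrete, here is a minimal sketch on a made-up two-article data set. The variable names follow the thread's example, but the values are invented for illustration.

```stata
* Toy wide data: one row per article, one variable per calendar year
* (values are made up for illustration)
clear
input id year _2005value _2006value _2007value
1 2000 3 7 1
2 2001 4 0 9
end
list, noobs

* Long layout: one row per article-year, with the year stored in yr
reshape long _@value, i(id) j(yr)
list, noobs sepby(id)
```

In the long layout, "the citation count 5 years after publication" becomes an ordinary condition on observations (yr==year+5) rather than a choice among variable names.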

In particular, if you are a former SAS user comfortable with SAS "arrays" of variables, and realize this would not need a loop in SAS, then I'm sorry to tell you that there is no similar construct in Stata. Even after several years of SAS withdrawal, I still occasionally regret the lack of that capability in Stata. But not enough to make me miss SAS.
Jean-Claude Arbaut

Join Date: Jul 2017

Posts: 209
#4

18 May 2018, 01:51

If you don't want to lose the dataset shape, it's also possible to replace the solutions in my previous post by the following:

Code:

gen id=_n
reshape long _@value, i(id) j(yr)
by id: egen ncit1=max(cond(yr==year+5,_value,0))
by id: egen ncit2=sum(cond(yr>=year & yr<=year+5,_value,0))
reshape wide

The trick here is to make sure the values of ncit1 and ncit2 are constant within each id group, or reshape wide will fail.
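One way to check that condition before reshaping back is an assert within each id group. The sketch below uses a made-up two-article data set, and it uses missing rather than 0 as the fallback in cond(), which is a variation on the code above rather than the same code.

```stata
* Toy data (values are made up for illustration)
clear
input id year _2005value _2006value
1 2000 3 7
2 2001 4 0
end

reshape long _@value, i(id) j(yr)

* Value 5 years after publication; max() ignores the missing values
bysort id: egen ncit1 = max(cond(yr == year + 5, _value, .))

* ncit1 must not vary within id, or reshape wide stops with an error
bysort id (yr): assert ncit1 == ncit1[1]

reshape wide
list, noobs
```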

William Lisowski: It's possible to use Mata for SAS-style array access to variables. However, I'm not sure it's possible to do this task without a loop. My idea was something like:

Code:

mata
st_view(years=.,.,"_*")
st_view(ncit=.,.,"ncit")
st_view(pubyr=.,.,"year")
ncit[.]=years[.,pubyr:-1999]
end

But of course it's wrong: years[.,pubyr:-1999] does not select a different column for each row; it selects the same set of columns for all rows. It is, however, easy to write a function that does this using a loop.
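A looping helper of that kind might look like the sketch below. The names pick_col, vals and pubyr are made up for illustration. The sketch assumes the example data from post #1, with _2000value to _2015value created in that order so that _2000value is column 1.

```stata
* Rebuild the example data from post #1
clear all
set seed 1492
set obs 100
forvalues X = 2000/2015 {
    gen _`X'value = round(runiform(0,10),1)
}
gen year = 2000 + floor(runiform(0,8))

* Collect the yearly variables in dataset order (_2000value first)
unab yvars : _*value
gen ncit = .

mata:
// pick_col() is a hypothetical helper, not a built-in Mata function:
// for each row i it returns X[i, j[i]], or missing if j[i] is out of range
real colvector pick_col(real matrix X, real colvector j)
{
    real colvector r
    real scalar    i

    r = J(rows(X), 1, .)
    for (i = 1; i <= rows(X); i++) {
        if (j[i] >= 1 & j[i] <= cols(X)) r[i] = X[i, j[i]]
    }
    return(r)
}

vals  = st_data(., tokens(st_local("yvars")))
pubyr = st_data(., "year")
// _2000value is column 1, so year+5 sits in column year - 1994
st_store(., "ncit", pick_col(vals, pubyr :- 1994))
end
```

The loop has only moved from Stata into Mata, so this is not loop-free. If year5value from post #1 is also in memory, comparing it with ncit (for example with assert ncit == year5value) is one way to check the result.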

A last remark to the OP, as I see round(runiform(0,10)) and floor(runiform(0,8)): see runiformint in help random number functions.
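As a sketch of that suggestion, the data-generation step could be written with runiformint(a,b), which draws integers uniformly on [a,b]. Note that round(runiform(0,10),1) is not quite the same distribution, since 0 and 10 each come up half as often as the other integers, so the draws below will differ from the original ones even with the same seed.

```stata
clear all
set seed 1492
set obs 100

* Citation counts: integers 0..10, each equally likely
forvalues X = 2000/2015 {
    gen _`X'value = runiformint(0,10)
}

* Publication year: 2000..2007, each equally likely
gen year = 2000 + runiformint(0,7)
```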

Hope this helps

Jean-Claude Arbaut

Last edited by Jean-Claude Arbaut; 18 May 2018, 02:08.