Finding the first occurrence of a specific value or greater

Ida Viklund

Join Date: Jun 2015

Posts: 4
#1

01 Jul 2015, 01:41

Dear Stata users,
I need help finding the first time a specific value or greater appears in a variable.
I have panel data looking something like this:

ID NR VAR
1 1 15
1 2 3
1 3 22
1 4 17
1 5 25

I would like to mark the observation where VAR is 20 or greater for the first time, by each ID. In the case above it's the third observation. It might also be that for some IDs a value of 20 or greater doesn't exist. As I understand it, the egen function ifirst() finds a specific value only and cannot be combined with >=.

So I tried the command:
egen ig20= ifirst(var), v(>=20) after by(ID)
"option value () invalid" is the error message, which of course has to do with the >=. Running the code without >= works perfectly fine.

Would you know how I could proceed instead?

Best and thanks in advance,
Ida
Nick Cox

Join Date: Mar 2014

Posts: 35715
#2

01 Jul 2015, 01:53

This is an FAQ. Recall that Statalist FAQ Advice advises you to look at the Stata FAQs before posting. Note that in Stata

Code:

search first occurrence

finds the FAQ in question, so you had exactly the right keywords:

FAQ . . . . . . . . . . . . . . . First and last occurrences in panel data
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
3/07 How can I identify first and last occurrences
systematically in panel data?
http://www.stata.com/support/faqs/data/firstoccur.html

Anyway, it's just one line:

Code:

bysort ID (NR) : gen firstge20 = sum(VAR >= 20 & VAR < .) == 1

Deconstruction:

Code:

VAR >= 20 & VAR < .

returns 1 when true and 0 when false. Its running or cumulative sum tallies such occurrences. When the cumulative sum is 1, that marks the first occurrence. This is all done panelwise, by virtue of the bysort. Note the need to be cautious if VAR could be missing, as missing VAR counts as greater than 20 too.

The FAQ gives another solution, and there are yet others, but this one is worth understanding.

Last edited by Nick Cox; 01 Jul 2015, 01:57.
Ida Viklund

Join Date: Jun 2015

Posts: 4
#3

01 Jul 2015, 02:08

Great! Thanks for the advice on both "search" and the code. Very helpful!
Mohak Gupta

Join Date: Sep 2022

Posts: 1
#4

15 Sep 2022, 21:21

It's been 7 years since this post, but for anyone looking at this later: the approach suggested above is pretty good but erroneous for the poster's purpose (I had a similar issue to the poster).
I am trying to find the first occurrence of heart attack (variable amics) for every patient (NRD_VisitLink or patno).

HTML Code:

sort NRD_VisitLink NRD_DaysToEvent
egen patno = group(NRD_VisitLink)
label variable patno "Unique Patient Number"
by NRD_VisitLink: generate admno = _n
label variable admno "Patient's Admission Number"
// by the above approach
by NRD_VisitLink: gen firstamics = sum(amics==1)==1
label variable firstamics "First AMICS admission"

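The problem with the result is that it can label more than one observation as the first occurrence in certain cases: every observation from the first amics==1 up to, but not including, the second one has a cumulative sum of 1.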

Now I only wanted to drop the observations for a patient that occurred BEFORE the first amics admission. So I solved this by generating a new variable that labels such "pre-admits" that are to be dropped.

HTML Code:

by NRD_VisitLink: gen preadmit = sum(amics==1)==0

This is an appropriate solution to my problem: preadmit is 1 for every admission before a patient's first amics admission, so I can now drop if preadmit == 1.
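A minimal sketch of the preadmit idea on made-up data. The values below are invented for illustration, and NRD_VisitLink is assumed to be numeric here purely to keep the input short.

```stata
clear
input NRD_VisitLink NRD_DaysToEvent amics
1 10 0
1 20 1
1 30 0
1 40 1
2  5 0
2  9 0
end

* Sort within patient by time so the running sum follows admission order
sort NRD_VisitLink NRD_DaysToEvent

* The running count of amics admissions is still 0 before the first one
by NRD_VisitLink: gen preadmit = sum(amics==1)==0

list, sepby(NRD_VisitLink)
```

For patient 1 only the first admission is flagged. Patient 2 never has amics == 1, so every one of that patient's admissions is flagged, and drop if preadmit == 1 would remove the patient entirely. That may or may not be what is wanted.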

However, I still have not figured out how to specifically answer the poster's question. Maybe an argument using a combination of the cumulative sum of amics and the observation number can yield a solution. Hopefully the above can give some leads to future readers.

Last edited by Mohak Gupta; 15 Sep 2022, 21:42.
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1409
#5

16 Sep 2022, 00:49

Mohak Gupta, you're right that the code in #2 yields multiple observations with the value 1, and so does not uniquely identify the first observation. However, the solution is mentioned in the link provided in #2.

For the original poster, the correct code would be:

Code:

bysort ID (NR) : gen firstge20 = sum(VAR >= 20 & VAR < .) == 1 & sum(VAR[_n-1] >= 20 & VAR[_n-1] < .) == 0

For you similarly, the one-line code would be:

Code:

bysort NRD_VisitLink (NRD_DaysToEvent): gen firstamics = sum(amics==1)==1 & sum(amics[_n-1] == 1) == 0

which avoids the need to create the preadmit variable.
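As a check, here is a sketch that runs both versions on the example data from #1. The variable names running and naive are introduced here only for illustration.

```stata
clear
input ID NR VAR
1 1 15
1 2 3
1 3 22
1 4 17
1 5 25
end

* Running count of observations with nonmissing VAR >= 20, within ID
bysort ID (NR) : gen running = sum(VAR >= 20 & VAR < .)

* One-liner from #2: true for as long as the running count stays at 1
gen naive = running == 1

* Corrected version: running count is 1 now and was 0 up to the previous observation
bysort ID (NR) : gen firstge20 = sum(VAR >= 20 & VAR < .) == 1 & sum(VAR[_n-1] >= 20 & VAR[_n-1] < .) == 0

list, sepby(ID)
```

The running count is 0, 0, 1, 1, 2, so naive is 1 in both the third and fourth observations, while firstge20 is 1 only in the third. In the first observation of each ID, VAR[_n-1] is missing, and the VAR[_n-1] < . condition keeps it from being counted.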

Last edited by Hemanshu Kumar; 16 Sep 2022, 01:31.
Nick Cox

Join Date: Mar 2014

Posts: 35715
#6

16 Sep 2022, 01:34

I guess I will take the comment "pretty good" as a compliment on my code, but "erroneous" is a strong word. It's too late to edit a 2015 post, but Mohak Gupta is right in #4 so that

When the cumulative sum is first 1, that marks the first occurrence.

would have been better wording. Naturally I agree also with Hemanshu Kumar in #5 that the larger point of #2 was "This is an FAQ" (so please read it!).