Identifying first occurrence in unbalanced panel data set

Mads Moring

Join Date: Apr 2017

Posts: 44
#1

08 May 2018, 09:23

Hi everyone

I am having trouble identifying the first time a respondent occurs in my data. I want to keep only the first occurrence for each individual in my further analysis.

For this purpose I have a year variable, an identifier, and a happening variable.

An individual can appear multiple times in the same year. The identifier is just a unique number per individual. Lastly, the happening is a binary variable (0 or 1), but I am only interested in the respondents that have 1.

I haven't quite been able to find something that does the trick in past threads.

Here's an example of how my data looks.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year identifier occurence)
2011 111 1
2012 111 1
2013 111 1
2010 211 1
2015 211 1
2009 311 1
2011 311 1
2011 311 1
2014 311 1
2009 411 1
2010 411 1
2011 411 1
2012 411 1
2013 411 1
2015 511 1
2016 511 1
2001 611 1
2011 611 1
2000 711 1
2001 811 1
2001 811 1
2008 811 1
2011 911 1
2012 911 1
2013 911 1
end

I've tried

Code:

bysort identifier year: gen counter=_N

but that just gives me a variable containing only 1s.

What I'm looking for is a counting variable that tells me the nth time this identifier occurs. From there I could go on and only keep if counter==1.

The end result could look like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year identifier occurence counter)
2011 111 1 1
2012 111 1 2
2013 111 1 3
2010 211 1 1
2015 211 1 2
2009 311 1 1
2011 311 1 2
2011 311 1 3
2014 311 1 4
2009 411 1 1
2010 411 1 2
2011 411 1 3
2012 411 1 4
2013 411 1 5
2015 511 1 1
2016 511 1 2
2001 611 1 1
2011 611 1 2
2000 711 1 1
2001 811 1 1
2001 811 1 2
2008 811 1 3
2011 911 1 1
2012 911 1 2
2013 911 1 3
end

So, in short: how do I construct a counting variable that tells me which occurrence this is for the specific person/identifier? It is important to note that the "first" time is meant chronologically, i.e. the earliest observation. That's why I have tried sorting by year.

Kind regards
Mads
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17702
#2

08 May 2018, 09:54

Mads:
I would try:

Code:

bysort identifier (year) : gen flag=sum( occurence)
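
As a side note on why the attempt in #1 did not give a counter: under by:, _N is the number of observations in each by-group, while _n is the observation number within the group. A minimal sketch on a subset of the example data (the variable names group_size and counter are made up for illustration):

```stata
clear
input float(year identifier occurence)
2009 311 1
2011 311 1
2011 311 1
2014 311 1
end

* _N: number of observations in each identifier-year group (the attempt in #1)
bysort identifier year : gen group_size = _N

* _n: running observation number within identifier, in year order
bysort identifier (year) : gen counter = _n

* running sum of the indicator; matches counter here because every occurence is 1
bysort identifier (year) : gen flag = sum(occurence)

list, sep(0)
```

With by identifier year, each group is usually a single observation, which is why _N comes out as 1 almost everywhere.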

Kind regards,
Carlo
(Stata 19.0)
Nick Cox

Join Date: Mar 2014

Posts: 35642
#3

08 May 2018, 10:08

You don't say what you want to happen when occurrence is 0 not 1. If Carlo's code is fine, that's fine; otherwise it may be that you want

Code:

bysort identifier (year) : gen flag= cond(occurrence == 0, 0, sum(occurrence))

or

Code:

bysort identifier (year) : gen flag= (occurrence == 0) * sum(occurrence))

either of which maps 0s to 0 but counts the 1s.

NB This is an FAQ https://www.stata.com/support/faqs/d...t-occurrences/
Mads Moring

Join Date: Apr 2017

Posts: 44
#4

08 May 2018, 11:00

Hi Nick

Thank you for your answer.

I'm unsure what your syntax does, and what the individual parts do. In any case, the two versions don't give me the same answer.

You could say that my data are divided between people who had an occurrence happen to them and people who didn't. The people who did can appear more than once. I want a variable that tells me which of those appearances is the (chronologically) first.

When occurrence is 0 rather than 1, I don't really want anything to happen; I'm not interested in those observations. In the end I will compare the people who didn't have an occurrence with the people who did, but using only their first occurrence.

I have a feeling that the code you provided is close, but when inspecting the results, it seems that the count sometimes restarts for the same person.

Nick Cox

Join Date: Mar 2014

Posts: 35642
#5

08 May 2018, 11:15

Sorry, there are two typos in my code. This should make it clearer:

Code:

clear
input float(year identifier occurrence)
2010 211 1
2011 211 0 
2012 211 0 
2013 211 0 
2014 211 0 
2015 211 1
end 

bysort identifier (year) : gen flag1 = cond(occurrence == 0, 0, sum(occurrence))

bysort identifier (year) : gen flag2 = (occurrence == 1) * sum(occurrence)

list, sep(0) 

     +--------------------------------------------+
     | year   identi~r   occurr~e   flag1   flag2 |
     |--------------------------------------------|
  1. | 2010        211          1       1       1 |
  2. | 2011        211          0       0       0 |
  3. | 2012        211          0       0       0 |
  4. | 2013        211          0       0       0 |
  5. | 2014        211          0       0       0 |
  6. | 2015        211          1       2       2 |
     +--------------------------------------------+
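
If the goal is then to keep only the chronologically first occurrence per identifier, one possible next step is to keep the observations where the flag equals 1. A minimal sketch with made-up data, not the original poster's file; note that when an identifier has several observations in the same year, bysort identifier (year) does not determine which of them is counted first:

```stata
clear
input float(year identifier occurrence)
2010 211 1
2011 211 0
2015 211 1
2009 311 1
2011 311 1
end

* running count of the 1s within identifier, in year order; 0s map to 0
bysort identifier (year) : gen flag = cond(occurrence == 0, 0, sum(occurrence))

* keep only the first occurrence for each identifier
keep if flag == 1

list, sep(0)
```

This drops every observation with occurrence equal to 0 as well, so identifiers that never have an occurrence would need to be handled separately before any comparison.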


Mads Moring

Join Date: Apr 2017

Posts: 44
#6

08 May 2018, 11:37

Hi Nick

They now provide me with the same answers, and I feel confident that I have what I'm looking for.
Thank you!

Kind regards
Mads