Runs of consecutive observations in a panel

Tochi Rose

Join Date: Apr 2020

Posts: 8
#1

Runs of consecutive observations in a panel

22 Dec 2020, 16:05

Greetings,

I was wondering if someone could help me with this issue. I'm working with a compustat data set that goes from 1984Q1-2014Q4 .
I have a panel with firm id and date (year-quarter) and I need to identify spells of consecutive observations for each firm. I was thinking in doing the following: first create a variable (call it time) that is a one-to-one mapping with the date. For example, if year=1984 and quarter=1, then time=1...... if year=1985 and quarter=2, then time=6........ if year=2014 and quarter=4 then time=124.

then i could do

tsset id time
gen run = .
by id: replace run = cond(L.run == ., 1, L.run + 1)
by id: egen maxrun = max(run)

any suggestion on how to build this variable time? or maybe there's a much easier way to achieve this? My goal is to identify for each firm, the longest spell of consecutive observations and drop all the others.

thank you

Tomás
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#2

22 Dec 2020, 16:49

Your approach is unnecessarily complicated. All you need is:

Code:

by id, sort: egen maxrun = _N

to get the same result as the code you how would give you.

By the way, if you are going to work with panel data, you should get your date variable into proper Stata internal format. It seems you have two variables, year and quarter. I assume both are numeric. Then what you need is:

Code:

gen qdate = yq(year, quarter) format qdate %tq

and that date can now be used in -xtset- or -tsset-, and can be used to sort chronologically or calculate intervals between timepoints, or anything else you can do with dates. The separate year and quarter variables occasionally prove useful for special purposes. But in general they are not necessary and can be dropped. If it turns out later you actually need them separately, you can always recover them with -gen year = year(dofq(qdate))- and -gen quarter = quarter(dofq(qdate))-.

Do read -help datetime-, it's a lot to learn but well worth the effort.
Comment
Tochi Rose

Join Date: Apr 2020

Posts: 8
#3

22 Dec 2020, 17:11

thank you clyde for replying. I'm having some trouble with this approach though. I do have the date variable as you suggest and i set the panel using: tsset id qdate
however, when I run: by id, sort: egen maxrun = _N

it returns the following error: unknown egen function _N()

any clue why that's happening?

Thanks
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#4

22 Dec 2020, 17:25

Sorry, that should be -gen-, not -egen-.
Comment
Tochi Rose

Join Date: Apr 2020

Posts: 8
#5

22 Dec 2020, 17:47

So I tried two things:

by id, sort: egen maxrun = _N

and

gen run = .
by id: replace run = cond(L.run == ., 1, L.run + 1)
by id: egen maxrun_bis = max(run)

Suppose I have a firm with 58 observations. They are all consecutive in the panel, so under the first case, maxrun=58. But what I really care about is for the missing quarters (if any). This firm has the first 9 observations consecutive (from 1984q4 until 1986q4) but then it jumps directly to 1987q4. So now I start counting again and the next spell goes from 1987q4-1993q4. So here I have a spell of length=25. After 1993q4 it jumps again to 1994q2 so I start counting again. The last spell goes from 1994q2-2000q1 so that's a spell of length=24

So I have 58 observations for this firm split in three spells: one with 9 obs, one with 25 and one with 24.
how can I keep only the firm's observations corresponding to the second (largest) spell??
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#6

22 Dec 2020, 18:16

So this is quite different from what you asked for in #1. And neither type of code in #5 is going to get it for you. What you need to get the spells with no gaps of the quarterly date is:

Code:

by id (qdate), sort: gen spell = sum(qdate != qdate[_n-1]+1) by id spell (qdate), sort: gen spell_size = _N

Then to find the second largest:

Code:

by id (spell), sort: gen n_spells = spell[_N] by id (spell_size spell), sort: gen rank_order = n_spells + 1 - sum(spell != spell[_n-1])

And finally you can keep only the second largest with

Code:

by id: keep if rank_order == 2

Warning: the term second largest is inherently ill-defined. There could be two or more spells with the same size that are tied for second rank in size. The above code will pick the one that is chronologically latest among those.

Note: this code is getting more complicated as we go. If you have any difficulties with this and need further assistance, you must show example data to work with, and you must use the -dataex- command to do that. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
1 like
Comment
Tochi Rose

Join Date: Apr 2020

Posts: 8
#7

22 Dec 2020, 18:47

Thank you Clyde. This last code worked beautifuly. I see now i made it kinda confusing with my example. I actually want to keep THE largest spell for each firm, not the second largest. In my example above, the largest one is the one in the middle (the second one). Sorry about that. Anyway, in the code you wrote above, I only need to modify the rank_order from 2 to 1 and that should do the trick.

I hadn't considered the possibility of having a firm featuring two different spells with the same number of observations.For instance, If i have a firm with three spells of observations: the first two with 25 obs each and the last one with 10 obs. Then, which one should we pick? you're right, that could be problematic
(I'm trying to replicate a paper. Maybe i should contact the authors here to see what exactly did they do with missing quarters. Their appendix is not explicit about this).

I'll make sure to get dataex (my stata is 13) for future posts that are too complicated to show without example data

thanks again
Comment

Announcement

Runs of consecutive observations in a panel

Comment

Comment

Comment

Comment

Comment

Comment