Generating data to balance a panel dataset

Ali Uppal

Join Date: Nov 2018
Posts: 1

29 Nov 2018, 08:45

Hi,

I have a panel from Compustat like the one in Table 1 below (Table 1 is a subset of the data showing three example cases), where gvkey is the firm identifier, fyear is the fiscal year, emp is employment, and dlrsn is the reason the firm dropped out of the dataset.

Table 1

gvkey	fyear	emp	dlrsn
001	1996	2	02
001	1997	3	02
001	1998	2	02
001	1999	1	02
002	1996	4	06
002	1997	5	06
002	1998	4	06
002	1999	3	06
002	2000	3	06
002	2001	2	06
002	2002	3	06
002	2003	4	06
003	1996	7	.
003	1997	8	.
.	.	.	.
.	.	.	.
.	.	.	.
003	2016	14	.

I need employment data for each firm up to and including 2016 (as shown in Table 2). However, many firms drop out of the dataset (e.g., because of bankruptcy). For such firms, I want to generate employment numbers for every year from the year after their last report up to and including 2016, using the following methodology:

If dlrsn is 02 or 03, set employment to zero from the first year after the last reporting year up to and including 2016. For example, in Table 2, firm 001 reports up to 1999, so I would like to generate observations with fyear 2000-2016 and employment set to 0, because dlrsn is 02.
If dlrsn is 01, 04, 05, 06, 07, 09, 10, or 20, use the last reported employment number for all years after the last reported year. For example, in Table 2, firm 002 reports up to 2003, so I would like to generate observations with fyear 2004-2016 and employment equal to the last available employment number (i.e., 4), because dlrsn is 06.
If the firm does not drop out of the dataset, nothing should change.
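One way to encode these rules in Stata is to map each dlrsn code to a rule indicator first. This is a minimal sketch, assuming dlrsn is stored as a two-character string as in the tables; the variable name rule is hypothetical. Note that inlist() accepts at most 10 arguments when they are strings, so dlrsn plus the eight carry-forward codes just fits in one call. If dlrsn is numeric in your extract, drop the quotes (e.g., inlist(dlrsn, 2, 3)).

```stata
* Sketch: classify deletion reasons into the two rules.
* Assumes dlrsn is a string such as "02"; -rule- is a hypothetical name.
clear
input str2 dlrsn
"02"
"03"
"06"
"20"
""
end

* 1 = set emp to zero, 2 = carry last emp forward, missing = leave unchanged
generate byte rule = .
replace rule = 1 if inlist(dlrsn, "02", "03")
replace rule = 2 if inlist(dlrsn, "01", "04", "05", "06", "07", "09", "10", "20")
list, clean noobs
```

Firms with a blank dlrsn (still in the data) get a missing rule and would be left untouched.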

Table 2 (* marks generated values)

gvkey	fyear	emp	dlrsn
001	1996	2	02
001	1997	3	02
001	1998	2	02
001	1999	1	02
001*	2000*	0*	02*
001*	2001*	0*	02*
001*	2002*	0*	02*
.	.	.	.
.	.	.	.
.	.	.	.
001*	2016*	0*	02*
002	1996	4	06
002	1997	5	06
002	1998	4	06
002	1999	3	06
002	2000	3	06
002	2001	2	06
002	2002	3	06
002	2003	4	06
002*	2004*	4*	06*
002*	2005*	4*	06*
.	.	.	.
.	.	.	.
.	.	.	.
002*	2016*	4*	06*
003	1996	7	.
003	1997	8	.
.	.	.	.
.	.	.	.
.	.	.	.
003	2016	14	.

Essentially, I want to get from Table 1 to Table 2 and would very much appreciate any advice (note that there are thousands of firms with different dates; the above is just an example). I tried the following code but got a bit stuck:

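Code:

gen dldteyear = year(dldte)               // year of deletion (not used below)
bysort gvkey: egen lastdate = max(fyear)   // last reported year per firm
gen yeardiff = 2016 - lastdate + 1         // assumed: total copies so the last row spans lastdate..2016
expand yeardiff if fyear == lastdate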

The code duplicates the last reported year in Table 1 the correct number of times, but I am stuck on the next step. I was thinking of replacing the years so that they are consecutive and then replacing employment, but this started to get messy. I am sure there is a better approach than the one I am taking, which seems rather mechanical.
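A minimal sketch that finishes the -expand- approach, assuming gvkey and dlrsn are strings, gvkey and fyear uniquely identify observations, and 2016 is the final year. The data are the rows shown in the tables with the elided years omitted (firm 002 uses the Table 2 values, so its last reported employment is 4). The names zero, carry, ncopies, offset, and generated are hypothetical. The key step is that after -expand-, all copies of a row share gvkey and fyear, so _n - 1 within those groups gives each copy its year offset.

```stata
* Sketch: finish the -expand- approach. Assumes string gvkey/dlrsn,
* unique gvkey-fyear, and 2016 as the final year. zero, carry,
* ncopies, offset, and generated are hypothetical names.
clear
input str3 gvkey int fyear float emp str2 dlrsn
"001" 1996 2 "02"
"001" 1997 3 "02"
"001" 1998 2 "02"
"001" 1999 1 "02"
"002" 1996 4 "06"
"002" 1997 5 "06"
"002" 1998 4 "06"
"002" 1999 3 "06"
"002" 2000 3 "06"
"002" 2001 2 "06"
"002" 2002 3 "06"
"002" 2003 4 "06"
"003" 1996 7 ""
"003" 1997 8 ""
"003" 2016 14 ""
end

* Which rule applies to each firm's deletion reason
generate byte zero  = inlist(dlrsn, "02", "03")
generate byte carry = inlist(dlrsn, "01", "04", "05", "06", "07", "09", "10", "20")

* Total copies of each firm's last row needed to reach 2016 (1 = unchanged)
bysort gvkey (fyear): generate int ncopies = cond(_n == _N & (zero | carry) & fyear < 2016, 2016 - fyear + 1, 1)
expand ncopies

* Copies share gvkey and fyear, so _n - 1 gives each one its year offset
bysort gvkey fyear: generate int offset = _n - 1
replace fyear = fyear + offset
generate byte generated = offset > 0

* Copies already hold the last emp; only the zero rule needs a replace
replace emp = 0 if generated & zero
drop zero carry ncopies offset
sort gvkey fyear
list if gvkey == "002" & fyear >= 2002, clean noobs
```

Because -expand- copies every variable, the carry-forward rule needs no extra step: the copies already hold the last reported emp and dlrsn. The generated flag makes it easy to inspect the filled-in rows or exclude them later.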

Thanks in advance for all support.

Best,
Ali

Tags: compustat, generate data, panel, unbalanced panel

Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

29 Nov 2018, 09:40

Welcome to the Stata Forum / Statalist.

Please read the FAQ, particularly the topic about sharing command/data/output.

I didn't read the post in detail, and I don't see the reason to produce balanced data. But I gather you may use -fillin- for this:

Code:

help fillin
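-fillin- creates every combination of values already present in the data, so a year that appears for no firm is never created. A related route, swapped in here rather than -fillin- itself, is -tsset- followed by -tsfill, full-, which adds every year between the earliest and latest year in the data for every firm. This is a sketch under stated assumptions: string gvkey and dlrsn, data ending in 2016, and hypothetical variable names id, reported, and lastyear. It uses a few rows from the tables. Filled rows before a firm's last reported year are dropped, so only years after the firm leaves the data are kept, and only for firms whose dlrsn matches one of the two rules.

```stata
* Sketch of the -tsfill, full- route (a relative of -fillin-).
* Assumes string gvkey/dlrsn and data ending in 2016; id, reported,
* and lastyear are hypothetical names.
clear
input str3 gvkey int fyear float emp str2 dlrsn
"001" 1998 2 "02"
"001" 1999 1 "02"
"002" 2002 3 "06"
"002" 2003 4 "06"
"003" 1996 7 ""
"003" 1997 8 ""
"003" 2016 14 ""
end

egen long id = group(gvkey)           // -tsset- needs a numeric panel id
generate byte reported = 1            // marks rows that exist before filling
tsset id fyear
tsfill, full                          // every firm now spans 1996-2016
replace reported = 0 if missing(reported)

* Copy gvkey into new rows ("" sorts first, so the last value is the real one)
bysort id (gvkey): replace gvkey = gvkey[_N]

* Keep filled rows only after each firm's last reported year
bysort id: egen lastyear = max(cond(reported, fyear, .))
drop if !reported & fyear < lastyear

* Carry dlrsn and emp forward; the sequential replace cascades down the years
bysort id (fyear): replace dlrsn = dlrsn[_n-1] if !reported
bysort id (fyear): replace emp = emp[_n-1] if !reported

* Apply the zero rule; drop filled rows whose dlrsn matches neither rule
replace emp = 0 if !reported & inlist(dlrsn, "02", "03")
drop if !reported & !inlist(dlrsn, "02", "03") & !inlist(dlrsn, "01", "04", "05", "06", "07", "09", "10", "20")
```

If every year from the first to the last does appear for some firm (plausible with thousands of firms), -fillin gvkey fyear- would create the same rows, and !_fillin could stand in for reported.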

Best regards,

Marcos
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#3

30 Nov 2018, 10:11

Let me note that filling in observations for firms that no longer exist is not likely to be a good idea. It is one thing to have a missing observation, but when the value should not exist at all (e.g., the firm has disappeared), filling it in by any arbitrary rule is questionable. Most Stata panel estimators do not require balanced panels.