Keeping variables in panel

Federico Nutarelli

Join Date: Sep 2018
Posts: 430

Keeping variables in panel

22 Jan 2019, 10:54

Hi all,

I have an unbalanced panel. The complete panel spans 11 periods from 2004 to 2015. I would like to keep just the "balanced" firms having observations from 2004 to 2015. The unbalanced panel looks like:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double idfirm float Year
808 2004
808 2004
808 2004
808 2004
808 2005
808 2005
808 2005
808 2005
808 2006
808 2006
808 2006
808 2006
808 2007
808 2007
808 2007
808 2007
808 2008
808 2008
808 2008
808 2008
808 2009
808 2009
808 2009
808 2009
808 2010
808 2010
808 2010
808 2010
808 2011
808 2011
808 2011
808 2011
808 2012
808 2012
808 2012
808 2012
808 2013
808 2013
808 2013
808 2013
808 2014
808 2014
808 2014
808 2014
808 2015
808 2015
808 2015
808 2015
808 2004
808 2004
808 2004
808 2004
808 2005
808 2005
808 2005
808 2005
808 2006
808 2006
808 2006
808 2006
808 2007
808 2007
808 2007
808 2007
808 2008
808 2008
808 2008
808 2008
808 2009
808 2009
808 2009
808 2009
808 2010
808 2010
808 2010
808 2010
808 2011
808 2011
808 2011
808 2011
808 2012
808 2012
808 2012
808 2012
808 2013
808 2013
808 2013
808 2013
808 2014
808 2014
808 2014
808 2014
808 2015
808 2015
808 2015
808 2015
808 2004
808 2004
808 2004
808 2004
end

I have employed the following code:[CODE
]tempvar q
bysort idfirm Year: gen `q' = _n
gen byte balanced = 0
replace balanced = 1 if `q'==11

drop if balanced==0[/CODE]
do you think it is fine?

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#2

22 Jan 2019, 11:18

What you show is not panel data in the ordinary sense. You have multiple observations of the same firm in the same year. It is difficult to know what "balanced" even means in this context. Please clarify.
Comment
David Benson

Join Date: Oct 2018

Posts: 489
#3

22 Jan 2019, 12:03

Federico,

It looks like the same firm-year combination is in your data many times (perhaps because each observation is actually a firm-year-product combination?). To get firms that are present in the data each year from 2004 to 2015, I would create a new dataset that collapses down to 1 observation per firm-year.

So you could do:

Code:

duplicates drop idfirm Year

Code:

dataex firm year // data shared via -dataex-. To install: ssc install dataex clear input byte firm int year 1 2004 1 2005 1 2006 1 2007 1 2008 1 2009 2 2005 2 2006 2 2007 2 2008 2 2009 3 2004 3 2005 3 2006 3 2007 3 2008 4 2004 4 2005 4 2006 4 2007 4 2008 4 2009 end * Above data is only for 2004-2009 bysort firm (year): gen count = _N . list, sepby(firm) noobs +---------------------+ | firm year count | |---------------------| | 1 2004 6 | | 1 2005 6 | | 1 2006 6 | | 1 2007 6 | | 1 2008 6 | | 1 2009 6 | |---------------------| | 2 2005 5 | | 2 2006 5 | | 2 2007 5 | | 2 2008 5 | | 2 2009 5 | |---------------------| | 3 2004 5 | | 3 2005 5 | | 3 2006 5 | | 3 2007 5 | | 3 2008 5 | |---------------------| | 4 2004 6 | | 4 2005 6 | | 4 2006 6 | | 4 2007 6 | | 4 2008 6 | | 4 2009 6 | +---------------------+ * Then do keep if count==11 in your case gen to_keep = 1 // this will now be==1 for all firms you will be keeping

Once you've created to_keep==1 for all the firms that are continuously in your data, you could then collapse it down to a single obs per firm, or just merge it into your data in #1 based on firm year. The variable to_keep will equal 1 for all firms that are continuously in your data from 2004-2015 and will be missing for all others. You can then do "keep if to_keep==1."

As an aside, I don't think there is a need to delete "unbalanced" data, and in fact, I would be concerned about survival bias for including only the firms that are continuously in your data (do they differ in important ways from firms that entered and went bankrupt or otherwise exited the industry?) Also, Stata handles such "unbalanced" data just fine. If you are getting pressure from your advisor to do it that way that's one thing, but if the choice is up to you I would leave those observations in. (Run your regressions with and without them and see what changes).

Last edited by David Benson; 22 Jan 2019, 12:11.
1 like
Comment

Federico Nutarelli

Join Date: Sep 2018
Posts: 430

23 Jan 2019, 01:08

You are both right. Indeed I posted the wrong data. The right ones are the following, I am again sorry for the confusion:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(idfirm Year)
 1 2008
 1 2009
 1 2010
 1 2011
 1 2012
 1 2013
 1 2014
 1 2015
 2 2004
 2 2005
 2 2006
 2 2007
 2 2008
 2 2009
 2 2010
 2 2011
 2 2012
 2 2013
 2 2014
 2 2015
 3 2004
 3 2005
 3 2006
 3 2007
 3 2008
 3 2009
 3 2010
 3 2011
 3 2012
 3 2013
 3 2014
 3 2015
 4 2004
 4 2005
 4 2006
 4 2007
 4 2008
 4 2009
 4 2010
 4 2011
 4 2012
 4 2013
 4 2014
 4 2015
 5 2007
 6 2004
 6 2005
 6 2006
 6 2007
 6 2008
 6 2009
 7 2007
 7 2008
 7 2009
 7 2010
 7 2011
 7 2012
 7 2013
 7 2014
 7 2015
 8 2004
 8 2005
 8 2006
 8 2007
 8 2008
 8 2009
 8 2010
 8 2011
 8 2012
 8 2013
 8 2014
 8 2015
 9 2004
 9 2005
 9 2006
 9 2007
 9 2008
 9 2009
 9 2010
 9 2011
 9 2012
 9 2013
 9 2014
 9 2015
10 2004
11 2008
11 2009
11 2010
11 2011
11 2012
11 2013
11 2014
12 2012
12 2013
12 2014
12 2015
13 2004
13 2005
13 2006
13 2007
end

The complete panel represents a firm observed for 11 years from 2004 to 2015. However, being the panel unbalanced, some firms may be observed for less years (either they are observed from a Year later than 2004 or they "die" before 2015). My aim is to keep just those firms that have a complete panel, that is those firms that are observed from 2004 to 2015.

Thank you for both replies and sorry for the mistake!

Last edited by Federico Nutarelli; 23 Jan 2019, 01:11.

Comment

Leon Nguyen

Join Date: Dec 2018
Posts: 26

23 Jan 2019, 01:22

Hello Federico,

As I understand, you want to create a balance panel data. For that, I would recommend this code:

Code:

duplicates report idfirm // this command just for checking
duplicates tag idfirm, gen(dup) // create a variable "dup' to count how many time that firm repeat
drop if dup!=11 // drop all firm that not repeats 11 years
drop dup

Hope it help!

Comment

Leon Nguyen

Join Date: Dec 2018

Posts: 26
#6

23 Jan 2019, 03:44

Federico Nutarelli
Sorry, my bad. just change command

Code:

drop if dup!=11

by command

Code:

drop if dup!=10

because previous command count how many duplicates, and duplicates==10 mean that firm repeats 11 years
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#7

23 Jan 2019, 09:32

Let me just point out that if the complete data is supposed to run 2004 through 2015 (including both of those years), then the number of observations to target is not 11, it is 12. (corresponding to dup != 11).
1 like
Comment
Federico Nutarelli

Join Date: Sep 2018

Posts: 430
#8

25 Jan 2019, 03:03

Perfect. Sorry again for my mistake. Thank you very much for the appreciated help!
Comment

Announcement

Keeping variables in panel

Comment

Comment

Comment

Comment

Comment

Comment

Comment