unbalanced panel - missing

Mina Wu

Join Date: Jul 2015

Posts: 79
#1


16 Apr 2017, 08:48

Hi everyone!

I have an unbalanced panel for the 2006-2014 period, meaning that observations are missing for some years. As a hypothetical example, the panel unit with ID "15" below has no observations for some years (2006-2009):

year ID X
2010 15 .0029651
2011 15 .0021118
2012 15 .0011135
2013 15 .0022467
2014 15 .0022368

Is there any way to convert this dataset so that it shows missing values for 2006-2009 instead of the empty cells, as below?

Code:

year ID X
2006 15 .
2007 15 .
2009 15 .
2010 15 .0029651
2011 15 .0021118
2012 15 .0011135
2013 15 .0022467
2014 15 .0022368

Thanks for any tips!!

Mina
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#2

16 Apr 2017, 09:19

Mina:
you can recode the system missing value -.- as an extended missing value such as -.a-, or as an "implausible" numerical value chosen by convention, say, 9999 (see -help missing- for further details).

Code:

replace X=.a if X==.

Code:

replace X=9999 if X==.

In the latter case, you should remember to exclude those values from your analysis via an -if- qualifier:

Code:

tabstat X if X!=9999, stat(count mean sd p50 min max)
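As a minimal sketch of how the two recodings differ in practice (the four rows below are illustrative, reusing values quoted in this thread): Stata's summary commands skip -.- and -.a- on their own, whereas 9999 counts as real data until it is excluded or converted back, here with -mvdecode-. The variable name X2 is made up for the example.

```stata
clear
input year ID X
2006 15 .
2007 15 .
2010 15 .0029651
2011 15 .0021118
end

* Option 1: extended missing value; summary commands skip it automatically
replace X = .a if X == .
summarize X

* Option 2 (on a copy named X2, a hypothetical name): a numeric sentinel
* is treated as real data, so it distorts the mean unless excluded
generate X2 = X
replace X2 = 9999 if missing(X2)
summarize X2

* Convert the sentinel back to a missing code before any analysis
mvdecode X2, mv(9999=.a)
summarize X2
```

After -mvdecode-, the last -summarize- uses only the two non-missing values again, without any -if- qualifier.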

Kind regards,
Carlo
(Stata 19.0)
Mina Wu

Join Date: Jul 2015

Posts: 79
#3

17 Apr 2017, 09:50

Dear Carlo

Thank you for your prompt response!

I have tried both options and got the following message

(0 real changes made)

Basically, nothing happens.
In my example, the observations for 2006-2009 are absent from the data altogether (the rows do not exist). I would like them to appear as observations with X set to "." or even "0". Either is fine, as long as it gives me a "balanced panel".
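A small sketch of why -replace- reports no changes here, using only the five rows posted in #1: no row holds a missing X, so there is nothing for the -if X==.- condition to match. Declaring the panel with -xtset- and running -xtdescribe- is one way to see which years each ID actually has.

```stata
clear
input year ID X
2010 15 .0029651
2011 15 .0021118
2012 15 .0011135
2013 15 .0022467
2014 15 .0022368
end

* No observation has a missing X, so -replace ... if X==.- changes nothing
count if missing(X)

* Declare the panel structure, then display the pattern of years per ID
xtset ID year
xtdescribe
```

The absent years are rows that do not exist, not rows with missing values, so they have to be created before anything can be recoded.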

Thank you for any further help!

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17711

17 Apr 2017, 10:45

Mina:
I cannot replicate your problem:

Code:

. input year    ID     X

          year         ID          X
  1. 2006    15     .
  2. 2007    15     .
  3. 2009    15     .
  4. 2010    15    .0029651
  5. 2011    15    .0021118
  6. 2012    15    .0011135
  7. 2013    15    .0022467
  8. 2014    15    .0022368
  9. end

. replace X=9999 if X==.
(3 real changes made)

. list

     +----------------------+
     | year   ID          X |
     |----------------------|
  1. | 2006   15       9999 |
  2. | 2007   15       9999 |
  3. | 2009   15       9999 |
  4. | 2010   15   .0029651 |
  5. | 2011   15   .0021118 |
     |----------------------|
  6. | 2012   15   .0011135 |
  7. | 2013   15   .0022467 |
  8. | 2014   15   .0022368 |
     +----------------------+

That said, please note that Stata can handle both balanced and unbalanced panel datasets without any problem.

Kind regards,
Carlo
(Stata 19.0)


Mina Wu

Join Date: Jul 2015
Posts: 79

17 Apr 2017, 11:04

Dear Carlo

That is the crux of the issue. 2006-2009 are missing in the data. When I browse my data, there are no observations with ".".

My data looks like this:

Code:

  
     +----------------------+
     | year   ID          X |
     |----------------------|
  4. | 2010   15   .0029651 |
  5. | 2011   15   .0021118 |
     |----------------------|
  6. | 2012   15   .0011135 |
  7. | 2013   15   .0022467 |
  8. | 2014   15   .0022368 |
     +----------------------+

Nothing appears for the years 2006-2009. But I would like to have this format:

Code:

     +----------------------+
     | year   ID          X |
     |----------------------|
  1. | 2006   15       .    |
  2. | 2007   15       .    |
  3. | 2009   15       .    |
  4. | 2010   15   .0029651 |
  5. | 2011   15   .0021118 |
     |----------------------|
  6. | 2012   15   .0011135 |
  7. | 2013   15   .0022467 |
  8. | 2014   15   .0022368 |
     +----------------------+

Or, let me put it differently. In 2007, these firms have values of X:

Code:

input float ID int year float X
 1 2007          0
 9 2007          0
11 2007          0.1
12 2007          0
13 2007          0.2
14 2007          0
15 2007          0
end

What I want is to see ALL firms in 2007 in the data (including those with IDs 2-8 and ID 10), but with a missing value for variable X.

Code:

input float ID int year float X
 1 2007          0
 2 2007          .
 3 2007          .
 4 2007          .
 5 2007          .
 6 2007          .
 7 2007          .
 8 2007          .
 9 2007          0
10 2007          .
11 2007          0.1
12 2007          0
13 2007          0.2
14 2007          0
15 2007          0
end

Sorry if I caused confusion and thank you so much for any help!

Last edited by Mina Wu; 17 Apr 2017, 11:17.


Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#6

17 Apr 2017, 12:22

You can use -fillin- to balance the sample. However, only a few Stata routines demand balanced data, and I suspect most, if not all, of them need actual data on every observation before the panel counts as balanced.

Arbitrarily filling in with 0 will almost certainly mess up your estimation - you're arbitrarily adding a pile of data errors.
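A minimal sketch of -fillin- on a handful of rows taken from earlier posts in this thread. It adds one observation for every ID-year combination that is not yet present, leaves X missing in the new rows, and flags them in a new variable called _fillin.

```stata
clear
input ID year X
 1 2007 0
 9 2007 0
15 2010 .0029651
15 2011 .0021118
15 2012 .0011135
15 2013 .0022467
15 2014 .0022368
end

* Create every ID-year combination; added rows get X == . and _fillin == 1
fillin ID year

sort ID year
list, sepby(ID)
```

Note that -fillin- only combines values that already occur somewhere in the data. In this sketch no firm has a row for 2006, 2008 or 2009, so those years are not created.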
Mina Wu

Join Date: Jul 2015

Posts: 79
#7

17 Apr 2017, 12:47

I completely agree with you. I am not using this data for estimation; I only want to copy and paste it into Excel for other analysis.
Do you know how "filling in" a dataset works?

Thanks!!
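If some firms or years never appear in the data at all, one possible approach (a sketch, not the only way) is to build an empty grid of every ID and year and merge the observed rows onto it. The ID range 1-15 and the years 2006-2014 are taken from the posts above; the five data rows are a subset of the posted values, and the file name balanced_panel.xlsx is made up for the example.

```stata
clear
input ID year X
 1 2007 0
 9 2007 0
11 2007 .1
13 2007 .2
15 2010 .0029651
end
tempfile observed
save `observed'

* Skeleton grid: one row per ID (1-15) and year (2006-2014)
clear
set obs 15
generate ID = _n
expand 9
bysort ID: generate year = 2005 + _n

* Attach the observed X; grid rows without a match keep X == .
merge 1:1 ID year using `observed', nogenerate
sort ID year

* Write the balanced layout to a workbook (hypothetical file name)
export excel using "balanced_panel.xlsx", firstrow(variables) replace
```

The result has 15 x 9 = 135 rows, with X missing wherever there was no observed row, which matches the layout asked for in the post above.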