Unbalanced panel data, which approach?

Kristian Romano

Join Date: Feb 2017

Posts: 7
#1

12 Feb 2017, 11:58

Hello,

I am trying to understand what the determinants of life expectancy in OECD countries are. I have 35 countries and 10 years. The dependent variables are healthcare expenditure, the Gini coefficient and lifestyle variables such as tobacco consumption. I would like to use panel data with fixed effects for both countries and time. The problem is that my panel is unbalanced. Since I am an undergraduate student, I only knew about balanced panels, so I had to investigate.

I would like to use the approach described in sections 9.4 and 9.4.1 of Baltagi (2005), Econometric Analysis of Panel Data, third edition, Wiley, that is, the unbalanced two-way error component model.

Which Stata command allows me to do so? And do you think it is a correct approach?

Thank you.

Last edited by Kristian Romano; 12 Feb 2017, 12:00.
Tags: Baltagi, OECD, unbalanced panel data
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

12 Feb 2017, 12:06

Hello Kristian.

Welcome to the Stata Forum.

Unfortunately, you didn't present much information on your data, I mean, in terms of Stata's "xtdescribe" and "xtset", for example.

You said: "the problem is that my panel is unbalanced". However, this is not necessarily a problem.

You may wish to apply -tsfill- and -ipolate- to cope with that.

Here, you may wish to read a Forum discussion on a similar matter: http://www.statalist.org/forums/foru...panel-in-stata
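
A minimal sketch of what the -tsfill- and -ipolate- suggestion could look like. It assumes the panel is identified by ID and year and that hexp is the variable with gaps (names taken from the -describe- output posted later in this thread); hexp_ip is a hypothetical name for the interpolated copy.

```stata
* Sketch only: ID, year and hexp are the names shown in #7; hexp_ip is a made-up name
xtset ID year

* Add a row for every missing ID-year combination (new rows hold missing values)
tsfill, full

* Linearly interpolate hexp over year within each country, keeping the original variable
bysort ID (year): ipolate hexp year, generate(hexp_ip)

* Compare how many values are missing before and after interpolation
count if missing(hexp)
count if missing(hexp_ip)
```

Interpolated values are imputed rather than observed, so any estimates based on hexp_ip should be reported with that caveat.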

Hopefully that helps.

Best regards,

Marcos
Kristian Romano

Join Date: Feb 2017

Posts: 7
#3

12 Feb 2017, 12:15

Thank you, Professor Almeida. I will look at the link you gave me, and once I have finished building my data set I will post the result of xtdescribe.

Best regards,

Kristian
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17856
#4

13 Feb 2017, 00:17

Kristian:
as Marcos said, Stata can handle both balanced and unbalanced panel datasets without any problem. Hence, this is not the main issue here, whereas implementing a two-way error component model probably is (please, see https://www.stata.com/statalist/arch.../msg00829.html).
Besides, you seemingly meant that -healthcare expenditure- and so forth are your independent variables, whereas life expectancy is the dependent one.
Finally, echoing Marcos's wise advice, you should provide more details on your panel data (-dataex- and CODE delimiters are surely useful in this respect).

Kind regards,
Carlo
(Stata 19.0)
Kristian Romano

Join Date: Feb 2017

Posts: 7
#5

14 Feb 2017, 12:55

xtset B C
       panel variable:  B (strongly balanced)
        time variable:  C, 2001 to 2011
                delta:  1 unit

xtdescribe

       B:  1, 2, ..., 35                                     n =         35
       C:  2001, 2002, ..., 2011                             T =         11
           Delta(C) = 1 unit
           Span(C)  = 11 periods
           (B*C uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                        11      11      11        11        11      11      11

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+-------------
       35    100.00  100.00 |  11111111111
 ---------------------------+-------------
       35    100.00         |  XXXXXXXXXXX

.
Hope it helps.

Best Regards,
Kristian
Kristian Romano

Join Date: Feb 2017

Posts: 7
#6

14 Feb 2017, 13:59

However, since I only have 11 years, I can insert 10 dummies and then run a one-way error component model. How can I do it, given that my panel is unbalanced?
Kristian Romano

Join Date: Feb 2017

Posts: 7
#7

14 Feb 2017, 15:14

  obs:           385
 vars:            16                          14 Feb 2017 22:25
 size:        46,970

              storage   display    value
variable name   type    format     label      variable label

Country         str15   %15s                  Country
ID              byte    %10.0g                ID country
year            int     %10.0g                year
lifeexp         double  %10.0g                life expectancy
hexp            double  %10.0g                h. expenditure
alc             double  %10.0g                alcohol
fat             double  %10.0g                fat
fruit           double  %10.0g                fruit
gini            double  %10.0g                gini
obs             double  %10.0g                obesity
prot            double  %10.0g                prot
sug             double  %10.0g                sugar
veg             double  %10.0g                vegetables
ins             double  %10.0g                insurance
tob             double  %10.0g                tobacco
gdp             double  %10.0g                gdp

Sorted by: ID year

If needed, I can post the whole dataset as an attachment.

Best Regards,
Kristian

Last edited by Kristian Romano; 14 Feb 2017, 15:18.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17856
#8

14 Feb 2017, 23:52

Kristian:
as you're seemingly dealing with a large N, small T panel dataset, I think you should go with -xtreg- (with the -fe- or -re- specification).
Please, give it a try and then post what you typed and what Stata gave you back within CODE delimiters (attachments are usually deprecated; examples/excerpts via -dataex- are highly welcome).
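
As a rough illustration of the -dataex- request, a data excerpt could be produced as sketched below. The variable names come from the -describe- output in #7; the choice of variables and the 20-row range are arbitrary assumptions, and the install line is only needed where -dataex- is not already available.

```stata
* Sketch only: install -dataex- from SSC if this copy of Stata does not already provide it
ssc install dataex

* Print a copy-and-paste excerpt of selected variables for the first 20 observations
dataex ID year lifeexp hexp gini tob in 1/20
```

The output of -dataex- can then be pasted into a post between CODE delimiters.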

Kind regards,
Carlo
(Stata 19.0)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#9

15 Feb 2017, 03:16

According to #5, the data is "strongly balanced".

Best regards,

Marcos
Sebastian Kripfganz

Join Date: May 2014

Posts: 2624
#10

15 Feb 2017, 04:47

Marcos:
xtset calls a panel "strongly balanced" if there is a row in the data set for each panel-time combination in the range of these two identifiers. It does not check whether any of the other variables in the data set contain missing values, which constitutes an unbalanced panel in the econometric sense.

Kristian:
That said, you can still just use the xtreg command (or almost any other command of interest) in the usual way as already suggested by Carlo. Just add the 10 time dummies and Stata will take care of everything else. You only need to be aware that the underlying econometric assumption is that there is no systematic sample selection.
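
A minimal sketch of this suggestion, assuming the variable names from the -describe- output in #7. The three predictors are picked only for illustration and are not a recommended specification.

```stata
* Sketch only: variable names are taken from #7; the predictor list is illustrative
xtset ID year

* Count missing values in each model variable (these gaps are what unbalance the panel)
misstable summarize lifeexp hexp gini tob

* Country fixed effects via -fe-; i.year adds the year dummies (one base year is omitted)
xtreg lifeexp hexp gini tob i.year, fe

* Joint test of the year dummies
testparm i.year

* Participation pattern of the observations actually used in the estimation
xtdescribe if e(sample)
```

Observations with a missing value in any model variable are dropped by -xtreg- automatically, so the last command shows the unbalanced estimation sample even though -xtset- reports "strongly balanced".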

https://www.kripfganz.de/stata/
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#11

15 Feb 2017, 06:51

Thanks for clarifying it, Sebastian.

With regard to unbalanced panels, I gather all previous comments and advice, including #2 (where -tsfill- and -ipolate- are mentioned), are still valid.

Last edited by Marcos Almeida; 15 Feb 2017, 06:53.

Best regards,

Marcos
Kristian Romano

Join Date: Feb 2017

Posts: 7
#12

15 Feb 2017, 16:53

Attached you will find the results of -des-, -xtreg- and -codebook-. I was not able to post them directly here.

Best regards,

Kristian
Attached Files

lifeexp.pdf (336.5 KB, 1 view)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17856
#13

16 Feb 2017, 00:44

Kristian:
- as per the FAQ, attachments are deprecated, whereas excerpts/examples of your dataset via -dataex- are welcome (please, do not give in so quickly; type -search dataex- to install it before using it);
- the best way to post what you typed and what Stata gave you back is via CODE delimiters.
You have a sky-rocketing number of predictors relative to your sample size, so you have to go for a more parsimonious model: if you can't increase your sample size, you can plug in no more than 3-4 predictors.
As per your results, you probably have an underlying multicollinearity issue.
As a closing-out remark, please note that it is risky to post your Stata serial licence number and it is absolutely immaterial for getting helpful replies on this list.

Kind regards,
Carlo
(Stata 19.0)
Kristian Romano

Join Date: Feb 2017

Posts: 7
#14

16 Feb 2017, 10:03

Dear professor Lazzaro,

I am sorry, I will delete the previous post.

Best
Kristian
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17856
#15

16 Feb 2017, 11:18

Kristian:
you can delete your thread within 1 hour of posting it.
However, there's no need to delete it; it suffices to read the FAQ before your next post.
As a closing-out remark, please call me Carlo, just like all on the list (and even more off the list) do.

Kind regards,
Carlo
(Stata 19.0)