Unbalanced Data

Lana Rais

Join Date: Aug 2015

Posts: 17
#1

Unbalanced Data

10 Sep 2015, 03:40

Hi everybody,

I need to run POLS and FE analyses on highly unbalanced panel data: 60 countries over 35 years. No single country has the full range of observations and, moreover, each country has very few observations on the income inequality index.

My two questions are the following:

(1) Is it correct to use the standard panel data regression commands with my unbalanced data, such as
xtreg ... ... ... ... , robust
xtreg ... ... ... ... , FE robust

Are the xt commands enough for Stata to handle data with many missing observations?

(2) I need to test the standard errors for heteroskedasticity. For the FE model I have found a user-written test, "xttest3", to check for heteroskedasticity, but I haven't found a similar test for the POLS model. As for autocorrelation, I don't know how to check for it at all ... maybe I need to correct for autocorrelation too and use cluster (id) instead of "robust"? Would you please tell me how I can test both models in order to decide on clustering / correcting for heteroskedasticity?

If needed, the dataset is attached.

Attached Files

Panel data over 1980-2014.dta (136.3 KB, 1 view)

Last edited by Lana Rais; 10 Sep 2015, 03:56.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#2

10 Sep 2015, 23:46

Lana:
your post raises several issues:
- Stata can handle unbalanced panel data analysis effectively under all the specifications;
- if you intend to replace missing observations, you may want to consider -help ipolate- or -help mi- and related entries in Stata .pdf manual;
- -xt- is the prefix of a suite of commands that deal with different types of panel data analyses; hence, sticking with your question, -xt- on its own is not enough to carry out what you're after, while -xtreg- is;
- under -xtreg- vce(robust) and vce(cluster) are interchangeable;
- a POLS implies autocorrelation (so there's no need to test for it), since you have multiple observations on the same units. In this instance you should therefore go with -vce(cluster)-, as under -regress- vce(robust) takes only heteroskedasticity into account.
As a closing-out remark, please note that -FE- in your second code should be -fe-, as Stata commands are case-sensitive (by the way, I assume that you have already checked via -hausman- what is the best specification for your panel data regression, i.e.: -fe- or -re-).
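To make the last two points concrete, here is a minimal sketch. It assumes Stata's -nlswork- example dataset rather than your data, and the regressors are picked only for illustration. -nlswork- is itself unbalanced, and both commands run on it unchanged:

Code:

webuse nlswork, clear
xtset idcode year

* POLS: cluster on the panel identifier so that the standard errors allow for
* heteroskedasticity and for correlation within each panel
regress ln_wage age tenure hours, vce(cluster idcode)

* FE: note the lowercase -fe-
xtreg ln_wage age tenure hours, fe vce(cluster idcode)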

Kind regards,
Carlo
(Stata 19.0)
Comment
Lana Rais

Join Date: Aug 2015

Posts: 17
#3

11 Sep 2015, 02:35

Thank you Carlo!
Comment
Lana Rais

Join Date: Aug 2015

Posts: 17
#4

11 Sep 2015, 02:41

I have one more question with regard to my dataset.

In Stata, can I drop some ids (countries, in my case) from my dataset to create a smaller subset for some regressions and afterwards bring the dropped countries back into the initial dataset?

For the moment I have 3 datasets with different country groups (full dataset, developed and developing countries), but maybe there is code in Stata that lets me keep only the one full dataset and turn it into smaller subsets when needed?
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#5

11 Sep 2015, 03:11

Lana:
in my opinion, the best approach for dealing with this kind of issue is not to -drop- the undesired observations but to tag them with a dummy, something along the lines of (assuming your -country- variable is in numeric format):

Code:

gen undesired_countries=1 if country==<whatisneeded> // 1 for undesired countries; repeat replacing -gen- with -replace- for each country you want to tag as undesired
replace undesired_countries=0 if undesired_countries==. // all remaining countries get 0

If the -country- variable is in -string- format, tweak the above as follows:

Code:

gen undesired_countries=1 if country=="whatisneeded" // 1 for undesired countries; repeat replacing -gen- with -replace- for each country you want to tag as undesired
replace undesired_countries=0 if undesired_countries==. // all remaining countries get 0
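
Once the dummy is in place, any estimation command can be restricted to the subset you need with an -if- qualifier, and the full dataset stays intact. A sketch, assuming the -grunfeld- example dataset (where -company- plays the role of your -country-) and an arbitrary choice of units to exclude:

Code:

webuse grunfeld, clear
xtset company year

* tag companies 1 and 2 as undesired (arbitrary choice, for illustration only)
gen byte undesired = inlist(company, 1, 2)

* regression on the subset only
xtreg invest mvalue kstock if undesired==0, fe vce(cluster company)

* regression on the full dataset: nothing was dropped
xtreg invest mvalue kstock, fe vce(cluster company)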

Kind regards,
Carlo
(Stata 19.0)
Comment
Lana Rais

Join Date: Aug 2015

Posts: 17
#6

11 Sep 2015, 05:29

Ah yes, I hadn't thought about a dummy for this case!

I use a few regional dummies in my analysis just to check whether there are differences between the Latin American, African and Asian worlds, so I have to think about this. Thanks

Carlo, would you be so kind as to share your opinion on one more issue? It would help me understand the problem of autocorrelation.

In my data I assume cross-sectional autocorrelation, because economies are open and capital is free to move between countries. On top of that there must be time-series correlation.

To analyze my panel data with OLS, I have computed full-period averages of the variables for each country (arithmetic averages for all variables except growth, for which I use a geometric average). By regressing the averaged indicator on the set of other averages I hope to get rid of time-series autocorrelation.

But will averaging eliminate cross-sectional autocorrelation as well?
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#7

11 Sep 2015, 06:24

Lana:
I would take a step back.
Why are you considering a POLS? Is it because the F-test at the foot of the -xtreg,fe- table outcome turned out non-significant?
If that is not the case, I would get rid of POLS and focus on -xtreg- instead, adding a vce(cluster) or vce(robust) option.

Kind regards,
Carlo
(Stata 19.0)
Comment
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#8

11 Sep 2015, 07:18

Hi, I think Carlo's comments are right to the point and great. I would also ask why you're not considering random effects as well. The problem that I have with fixed effects is that it doesn't allow the inclusion of variables that only vary across panels but not within panels. You can always test which is the more appropriate estimation method with the -hausman- command.

Whether or not to include POLS is always a matter of personal choice. Even though the F-test that Carlo mentions may indicate that FE is the right estimation method, I find that it is sometimes interesting to see how similar the coefficients on each variable are across estimation methods.
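
A sketch of both ideas, assuming the -nlswork- example dataset and an arbitrary set of regressors: -hausman- compares FE and RE, and -estimates table- puts the coefficients from the three estimators side by side.

Code:

webuse nlswork, clear
xtset idcode year

regress ln_wage age tenure hours
estimates store pols

xtreg ln_wage age tenure hours, fe
estimates store fixed

xtreg ln_wage age tenure hours, re
estimates store random

* Hausman test: consistent estimator (FE) first, efficient one (RE) second
hausman fixed random

* coefficients and standard errors across estimation methods
estimates table pols fixed random, b(%9.4f) se(%9.4f)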

Alfonso Sanchez-Penalver
Comment
Lana Rais

Join Date: Aug 2015

Posts: 17
#9

11 Sep 2015, 08:51

Alfonso and Carlo, thank you for your answers.
I need to apply POLS and fixed effects methods to my panel data, and I need to run an OLS regression on cross-sectional averages, simply because I want to replicate a study that was performed many years ago. But I want to do it correctly.

To do the cross-sectional analysis with OLS, I average all the indicators in my panel data. I suppose this averaging gets rid of autocorrelation automatically, but I am not sure. Maybe I need to control for autocorrelation in this regression?

Last edited by Lana Rais; 11 Sep 2015, 09:03.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#10

11 Sep 2015, 09:00

Lana:
as far as I can get your OLS approach, you should end up with a single average value for each panel unit across time periods; hence, I do not think that autocorrelation could be an issue, in that you don't have multiple observations for each panel unit anymore.

Kind regards,
Carlo
(Stata 19.0)
Comment
Lana Rais

Join Date: Aug 2015

Posts: 17
#11

11 Sep 2015, 09:08

ok, many thanks
Comment

Alfonso Sánchez-Peñalver

Join Date: Mar 2014
Posts: 432

#12

11 Sep 2015, 11:04

Hi, my first thought is that OLS on the panel averages is exactly the between estimator you get from xtreg, be. Here's some simple code to show that:

Code:

clear all
set more off

webuse nlswork, clear
global x "age race collgrad grade hours"
keep idcode year ln_wage $x

xtset idcode year

** Between estimator
* xtreg be
xtreg ln_wage $x, be

* regress estimation
* collapse calculates the means of the variables you include in the varlist
* the by option tells collapse which group identifier to use for the means
* the cw option applies casewise deletion: observations with a missing value in any variable are ignored
collapse ln_wage $x, by(idcode) cw
regress ln_wage $x

I hope this helps making your life easier.

Alfonso Sanchez-Penalver

Comment

Lana Rais

Join Date: Aug 2015

Posts: 17
#13

11 Sep 2015, 11:53

Alfonso, thanks, it is very nice of you! I need not just arithmetic averages but also a geometric average for one of the indicators. I have already written the code, but I will now check whether I can make it shorter or simpler.

Last edited by Lana Rais; 11 Sep 2015, 12:18.
Comment

Alfonso Sánchez-Peñalver

Join Date: Mar 2014
Posts: 432

#14

11 Sep 2015, 17:38

Do you know about egenmore (available in SSC)? It basically has more functions for the egen command. If you download and install it then you can create a new variable that has the geometric mean of whatever variable you want by group. The following code extends the code I sent before by calculating the geometric mean of age and then doing the between estimation with xtreg, be and with regress, as before.

Code:

clear all
set more off

webuse nlswork, clear
global x "age race collgrad grade hours"
keep idcode year ln_wage $x

xtset idcode year

** Between estimator
* xtreg be
xtreg ln_wage $x, be

preserve

* regress estimation
* collapse calculates the means of the variables you include in the varlist
* the by option tells collapse which group identifier to use for the means
* the cw option applies casewise deletion: observations with a missing value in any variable are ignored
collapse ln_wage $x, by(idcode) cw
regress ln_wage $x

restore, preserve

egen gage = gmean(age), by(idcode)
global x "gage race collgrad grade hours"
xtreg ln_wage $x, be

collapse ln_wage $x, by(idcode) cw
regress ln_wage $x
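
If you prefer not to install egenmore, a geometric mean is the exponential of the arithmetic mean of the logs, so official commands are enough. A minimal sketch, again assuming -nlswork- and its -age- variable (the new variable names are made up for illustration). It only works for strictly positive values, so for a growth rate g you would presumably apply it to 1+g and subtract 1 afterwards:

Code:

webuse nlswork, clear

* mean of ln(age) within each panel, then back-transform
egen double mean_lnage = mean(ln(age)), by(idcode)
gen double gage2 = exp(mean_lnage)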

Alfonso Sanchez-Penalver

Comment

Lana Rais

Join Date: Aug 2015

Posts: 17
#15

12 Sep 2015, 04:30

Many thanks. Afterwards, can I get back the initial content of my dataset? I mean, is there a command in Stata for that?

I want to first run the OLS regression on the averages and then POLS and FE on the full panel data.

Last edited by Lana Rais; 12 Sep 2015, 04:35.
Comment
