  • Most appropriate method for panel data analysis

    Dear Statalist,

    I am creating a model for my undergraduate thesis, which I have picked up again after a long pause. I am now stuck choosing between the FE and RE estimation methods for analysing my panel data.

    The data set covers 8 variables over 17 years for 47 countries. I have transformed the dependent variable to a logarithmic scale to address issues with heteroskedasticity, and this appears to have resolved the problem according to the Breusch–Pagan test (hettest):

    . hettest

    Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
    Assumption: Normal error terms
    Variable: Fitted values of log_turistbesök

    H0: Constant variance

    chi2(1) = 0.11
    Prob > chi2 = 0.7355



    Further, I have run an LM test using xttest0. From my understanding, rejecting the null hypothesis indicates that pooled OLS (POLS) is not appropriate. My test results show that we reject H0:

    . xttest0

    Breusch and Pagan Lagrangian multiplier test for random effects

    log_turistbesök[land_id,t] = Xb + u[land_id] + e[land_id,t]

    Estimated results:
                     |        Var     SD = sqrt(Var)
            ---------+------------------------------
           log_tur~k |   2.393221       1.547004
                   e |   .2057564       .4536038
                   u |   1.609991       1.268854

    Test: Var(u) = 0
    chibar2(01) = 842.29
    Prob > chibar2 = 0.0000
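
    As an aside, the chibar2(01) p-value Stata reports is half the chi2(1) tail probability, because H0 puts Var(u) on the boundary of the parameter space (a variance cannot be negative). A quick stdlib-Python sketch of that calculation (the function name is mine, not Stata's):

```python
import math

def chibar2_pvalue(stat: float) -> float:
    """P-value for the chibar2(01) mixture: 0.5*P(chi2_0 > x) + 0.5*P(chi2_1 > x).
    chi2_0 is a point mass at zero, so for stat > 0 only the chi2(1) tail
    contributes. For 1 df, P(chi2_1 > x) = erfc(sqrt(x/2))."""
    if stat <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))

# A statistic as large as the 842.29 above gives a p-value of essentially zero,
# so H0: Var(u) = 0 is firmly rejected and pooled OLS is set aside.
print(chibar2_pvalue(842.29))  # ~0.0
print(chibar2_pvalue(2.706))   # ~0.05, the conventional boundary value
```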



    From this I have moved on to a Hausman test using hausman fe re. From my results, I do not reject H0, and conclude that the difference between the RE and FE coefficients does not seem systematic:

    . hausman fe re

    Test of H0: Difference in coefficients not systematic

    chi2(7) = (b-B)'[(V_b-V_B)^(-1)](b-B)
    = 9.07
    Prob > chi2 = 0.2479
    (V_b-V_B is not positive definite)
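
    The statistic in the output is the quadratic form (b-B)'[(V_b-V_B)^(-1)](b-B). A toy stdlib-Python sketch for a two-coefficient case, with made-up numbers rather than the estimates from this thread:

```python
# Hausman statistic H = (b-B)' [V_b - V_B]^(-1) (b-B) for two coefficients.
# b, Vb: the consistent (FE) estimates; B, VB: the efficient-under-H0 (RE) ones.
def hausman(b, B, Vb, VB):
    d = [b[0] - B[0], b[1] - B[1]]
    # Difference of the two covariance matrices.
    m = [[Vb[0][0] - VB[0][0], Vb[0][1] - VB[0][1]],
         [Vb[1][0] - VB[1][0], Vb[1][1] - VB[1][1]]]
    # Invert the 2x2 matrix by hand.
    det = m[0][0]*m[1][1] - m[0][1]*m[1][0]
    inv = [[ m[1][1]/det, -m[0][1]/det],
           [-m[1][0]/det,  m[0][0]/det]]
    # Quadratic form d' inv d.
    return sum(d[i]*inv[i][j]*d[j] for i in range(2) for j in range(2))

b  = [0.52, -0.11]                       # invented FE estimates
B  = [0.48, -0.09]                       # invented RE estimates
Vb = [[0.004, 0.0], [0.0, 0.002]]
VB = [[0.003, 0.0], [0.0, 0.0015]]
print(hausman(b, B, Vb, VB))  # ~2.4, compared against a chi2(k) critical value
```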



    With this information, I have started leaning towards using an RE estimation method on my data. This would assume that the sample is randomly selected from my population. The data in its original state is not randomly selected, as it includes all available individuals and years. I have therefore tried to create a random sample using sample 50. After creating this data set, I have run the same tests again, and while the statistics are different, the conclusions regarding the rejection of H0 remain the same for all tests.

    During my courses in statistics, teachers have been clear about the strict conditions of RE estimation, and that the method is often inappropriate to use. This makes me uncertain and uncomfortable about my conclusions. Have I taken enough steps to show that RE is the best-fitting model for my data, or am I missing some assumption or test that could be crucial for the model fit? What more can I do to determine the best-fitting method for my analysis?


    Thank you to anyone willing to assist in this issue,

    WG

  • #2
    William:
    welcome to this forum.
    Some comments about your post follow:
    1) assuming that you have a continuous regressand, you should start from -xtreg,fe- and, if evidence of a group-wise effect is detected, you should compare it to -xtreg,re-;
    2) the difference between the -fe- and -re- estimators is not about the randomness of the sample you're dealing with, but about the correlation of the ui residual with the vector of predictors, which is allowed under the -fe- estimator but not under the -re- one, as the latter assumes that both ui and epsilonit are uncorrelated with the vector of predictors;
    3) most of the time, the really restrictive -re- requirements are not satisfied;
    4) you can skip the -hausman- nuisance and test whether -re- is the way to go via the community-contributed module -xtoverid- (just type -search xtoverid- from within Stata to spot and install it, along with its ancillary modules). Please note that, unlike -hausman-, -xtoverid- does not allow -fvvarlist- notation (see -xi:- as a possible fix), while supporting non-default standard errors (which, with 47 countries, are the way to go in your case).
    Kind regards,
    Carlo
    (Stata 19.0)
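
    Carlo's point 2 can be seen in a small simulation (plain stdlib Python, all parameters invented): when the country effect ui is correlated with x, the pooled estimator is biased, while the within (FE) estimator recovers the true slope.

```python
import random

random.seed(1)
# Simulate a panel where the country effect u is correlated with x.
N, T, beta_true = 200, 10, 1.0
ys, xs, ids = [], [], []
for i in range(N):
    u = random.gauss(0, 1)                       # country effect
    for t in range(T):
        x = u + random.gauss(0, 1)               # x correlated with u
        y = beta_true * x + u + random.gauss(0, 0.5)
        ys.append(y); xs.append(x); ids.append(i)

def ols_slope(x, y):
    """Simple OLS slope with an intercept (overall demeaning)."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
            / sum((a - xbar) ** 2 for a in x))

# Pooled OLS: biased upward, since u loads on x.
b_pooled = ols_slope(xs, ys)

# Within (FE) estimator: demean y and x by country first.
yd, xd = [], []
for i in range(N):
    yi = [y for y, g in zip(ys, ids) if g == i]
    xi = [x for x, g in zip(xs, ids) if g == i]
    yd += [y - sum(yi) / T for y in yi]
    xd += [x - sum(xi) / T for x in xi]
b_fe = ols_slope(xd, yd)

# b_pooled drifts toward ~1.5 here, while b_fe stays near the true 1.0.
print(round(b_pooled, 3), round(b_fe, 3))
```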



    • #3
      Carlo:

      Thank you for your assistance! I have some further questions after considering your suggestions:

      1) I have now used xttest3 to check for groupwise heteroskedasticity. The results show p = 0.00. My interpretation is that the test indicates groupwise heteroskedasticity, and robust standard errors should be used in my estimation. Is this correct?

      2) Having run the xtoverid test I get p ~ 0.02, which means p < 0.05. If I understand correctly, the xtoverid test shows that FE is, after all, the way to go?

      3) With xtreg, fe vce(robust), which R-squared value is most relevant? The output shows "within", "between" and "overall", and I am not familiar with how to interpret this.

      4) Does the BP test for heteroskedasticity provide any substantial value to the analysis? I mean, if the BP test indicates homoskedasticity, but the Wald test shows groupwise heteroskedasticity, doesn't the Wald test nullify the importance of the BP test?

      Thank you again,

      WG
      (Stata 17.0 SE)



      • #4
        William:
        1) yes, you are. You can use the -robust- and the -vce(cluster panelid)- options interchangeably for non-default standard errors, as, unlike with -regress-, they both calculate cluster-robust standard errors;
        2) you're right. You have to switch to -fe-;
        3) the R-sq within, as the -fe- estimator focuses on the within-panel variation;
        4) no. The community-contributed module -xttest3- is conclusive.
        Kind regards,
        Carlo
        (Stata 19.0)
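
        To illustrate point 3, here is a stdlib-Python sketch (toy data, function name mine) of the "within" transformation behind -xtreg, fe-: demean y and x by panel, then run OLS on the demeaned data; that regression's R-squared is the "within" R-sq in the output.

```python
# Within (FE) slope and within R-squared for a one-regressor panel.
def within_r2(panels):
    """panels: list of (y_series, x_series) pairs, one pair per country."""
    yd, xd = [], []
    for ys, xs in panels:
        ybar, xbar = sum(ys) / len(ys), sum(xs) / len(xs)
        yd += [y - ybar for y in ys]             # demean within each panel
        xd += [x - xbar for x in xs]
    beta = sum(x * y for x, y in zip(xd, yd)) / sum(x * x for x in xd)
    ss_res = sum((y - beta * x) ** 2 for x, y in zip(xd, yd))
    ss_tot = sum(y * y for y in yd)
    return beta, 1 - ss_res / ss_tot

panels = [([1.0, 2.0, 3.0], [0.0, 1.0, 2.0]),    # country A
          ([5.0, 6.0, 7.0], [0.0, 1.0, 2.0])]    # country B: higher level, same slope
beta, r2 = within_r2(panels)
# The level difference between countries is absorbed by the demeaning, so the
# within-variation is fully explained: slope 1.0, within R-sq 1.0.
print(beta, r2)
```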



        • #5
          Carlo:

          Having worked with the model during the day, I now feel that I can use it with more confidence, and the discussion in the methods section of my thesis is much better put together. A big thanks!

          One more question:

          My analysis includes one main explanatory variable for the dependent variable, and six other control variables. In the -fe vce(robust)- model, the explanatory variable shows a significant, positive and somewhat large effect on the dependent variable, both for the log-transformed dependent variable and the original version. This is in line with many of the previous studies that I have read. The correlation between the dependent variable (not transformed) and the explanatory variable is, however, low at ~4.6%. Do you have any advice on interpreting this? I have had difficulty finding guidance on the web and in the literature. I have tried excluding outliers, but the correlation remains similar.

          Thanks,

          WG
          (Stata 17.0 SE)



          • #6
            William:
            every time we deal with a multiple regression (that is, a regression with more than one item on the right-hand side of the regression equation), we have to interpret the contribution to variation in the regressand conditional on the effect of the remaining predictors/controls.
            Therefore, I would not spend my time trying to understand why the correlation of your main predictor with the regressand is low.
            In addition, regression returns coefficients of the variables included on the right-hand side of the regression equation, not the variables themselves: what does -estat vce, corr- tell you?
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Carlo:

              I cannot detect any multicollinearity issues with -estat vce, corr- or the VIF test (values between 1.5 and 4, 1.96 average). I will proceed with stating the correlation, without doing too much interpretation, as it is part of the thesis instructions to include a correlation matrix.

              Thank you for all your help on the matter,

              WG
              (Stata 17.0 SE)
