Comparing results obtained by FE, RE, and OLS regression models when using different time periods and dependent variables

Kara Rashid

Join Date: Aug 2018
Posts: 2


29 Aug 2018, 12:03

Dear Stata Community,

Since I am fairly new to econometrics and the reasoning behind it, I have a few questions about using the RE, FE, and OLS regression models on panel data and comparing their results, which I hope you can help me answer.

I am currently running regressions in Stata and using the Hausman test together with the LM test or F-test to determine which model is appropriate.
My data set contains all listed companies in the Nordics for the period 2005-2017.
In my models, the investment ratio is the dependent variable and lagged cash flow is the independent variable. The model varies in two respects:

1) Time period: I run the same regression for three different time periods (2005-2007, 2008-2009, 2010-2017).
2) Dependent variable: after running and analyzing the three regressions in point 1, I rerun them for the same three time periods with two other dependent variables.

So far I have run the two tests to determine which regression model to use for each of the three time periods.
I have done exactly the same, period by period, for the models with the other dependent variables. The tests give different results as to which model is the correct one to use.

My questions are:
1) Can I compare the results for the three time periods if they come from different models?
2) When I later use the two other dependent variables in the three time periods, can I also use different models across time periods and dependent variables and still compare the results?

To show you my code, I have included it below. The important parts are the dependent variable (which I change), the time periods used, and the tests of which model to use.

Thank you very much in advance!

Code:

* Load data
clear
insheet using "C:\Users\dkkrz\Desktop\stata3middel.csv", delimiter(";")

* Encode panel data
encode selskabsnavn, gen(selskab)
tsset selskab r
    
* Create investments divided by lagged total assets variable
gen totalassetskorrigeret_1 = l.totalassetskorrigeret
gen capex_ratio = (capitalexpendituresadditiontofix/totalassetskorrigeret_1)

* Exclude negative CFLOW
drop if cflow<0

* Create lagged cflow variable
gen cflow_1 = l.cflow

****************************************

* Random effects models all years last year
xtreg capex_ratio cflow_1, re
estimates store random_effects

* Fixed effects models all years last year
xtreg capex_ratio cflow_1, fe
estimates store fixed_effects

* Test random effects vs. fixed effects
hausman fixed_effects random_effects

* Regular OLS all years last year
regress capex_ratio cflow_1

* Test OLS vs. random effects (Breusch-Pagan)
estat hettest

* Use random effects with categories
xtreg capex_ratio cflow_1, re

*******************************************
* Create lagged sales and sales growth variables
gen netsalesorrevenues_1 = l.netsalesorrevenues
gen sales_growth = (netsalesorrevenues/netsalesorrevenues_1 - 1)*100
gen sales_growth_1 = l.sales_growth
gen sales_growth_sq = sales_growth^2

* Encode industry
encode industry10, gen(industries10)

* regress MTB
regress mtb sales_growth_1 sales_growth_sq i.industries10

* Get mtb_f and mtb_r
predict mtb_f
predict mtb_r, residuals

* Lag mtb_f and mtb_r
gen mtb_f_1 = l.mtb_f
gen mtb_r_1 = l.mtb_r

****************************************

* Subset for correct period
keep if hllcllhc == "LL-HC" | hllcllhc == "HL-LC"

gen period = ""
replace period = "pre_crisis" if r < 2008
replace period = "crisis" if r >= 2008 & r<=2009
replace period = "post_crisis" if r > 2009

keep if period == "post_crisis"


* Variable - model
encode hllcllhc, gen(hllcllhc_factor)
encode country, gen(countries)
encode period, gen(period_factor)
encode cap, gen(cap_factor)

****************************************

* Random effects models all years last year
xtreg capex_ratio cflow_1, re
estimates store random_effects

* Fixed effects models all years last year
xtreg capex_ratio cflow_1, fe
estimates store fixed_effects

* Test random effects vs. fixed effects
hausman fixed_effects random_effects

* Regular OLS all years last year
regress capex_ratio cflow_1

* Test OLS vs. random effects (Breusch-Pagan)
estat hettest

* Use random effects with categories
xtreg capex_ratio cflow_1, re

******************** MOST IMPORTANT MODEL ********************

* Random effects models with categories
xtreg capex_ratio i.hllcllhc_factor##c.cflow_1 mtb_f_1 mtb_r_1, re
estimates store random_effects

* Fixed effects models with categories
xtreg capex_ratio i.hllcllhc_factor##c.cflow_1 mtb_f_1 mtb_r_1, fe
estimates store fixed_effects

* Test random effects vs. fixed effects
hausman fixed_effects random_effects

* Regular OLS with categories
regress capex_ratio i.hllcllhc_factor##c.cflow_1 mtb_f_1 mtb_r_1
margins i.hllcllhc_factor, dydx(c.cflow_1)
marginsplot

* Test OLS vs. random effects (Breusch-Pagan)
estat hettest

* Use random effects with categories
xtreg capex_ratio i.hllcllhc_factor##c.cflow_1 mtb_f_1 mtb_r_1, re

*************************************************************************************************************************************


Kara Rashid

Join Date: Aug 2018

Posts: 2
#2

03 Nov 2018, 04:46

Anyone?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#3

03 Nov 2018, 05:29

Kara:
welcome to this forum.
Questions like the one you posted often go unanswered, simply because nobody knows your data and the related data generating process.
That said, your strategy should consider:
1) a specification of the right-hand side of your regression equation that gives a fair and true view of the data generating process. Playing around with different predictors in search of the model with the highest number of statistically significant predictors is not the (methodological) way to go: it can easily end up in an endless try-and-retry process that any average-skilled reviewer can take apart. The post-estimation analysis (which I do not find in your post) is relevant as well: endogeneity, non-linear specification and, last but not least, heteroskedasticity should be investigated and accounted for accordingly. By the way, -hettest- investigates the presence of heteroskedasticity; you should use -xttest0- to investigate random effects.
2) it is rare (but possible) that pooled OLS outperforms -xtreg- with panel data. When default standard errors are used, if the F-test appearing as a footnote of the -xtreg, fe- outcome table lacks statistical significance, you should switch to pooled OLS;
3) I fail to get why you select different time periods from your dataset and try different regression models.
As an aside, please note that posting what you got from Stata can increase your chance of getting helpful replies.

Kind regards,
Carlo
(Stata 19.0)
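
A minimal sketch of the two tests named in points 1) and 2) of the reply, using the variable names from the original post (capex_ratio, cflow_1, selskab, r). It assumes the data are already loaded and the lagged variables already created as in that post. It only illustrates where each test is found; it is not a recommendation of which model to choose.

```stata
* Assumes the panel has already been declared, as in the original post:
*   tsset selskab r

* Random effects vs. pooled OLS:
* -xttest0- is run directly after -xtreg, re- and reports the
* Breusch-Pagan Lagrange multiplier test for random effects
* (null hypothesis: the variance of the panel-level effect is zero).
xtreg capex_ratio cflow_1, re
xttest0

* Fixed effects vs. pooled OLS:
* with default standard errors, the line "F test that all u_i=0"
* printed below the -xtreg, fe- coefficient table is the F-test
* referred to in point 2).
xtreg capex_ratio cflow_1, fe
```

Unlike -estat hettest-, which follows -regress- and tests for heteroskedasticity, -xttest0- has to follow the random-effects estimation itself.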
