Panel Data: FE Regression = Panel Quasi-RE Regression = RE with Controls

Marcus Fiedler

Join Date: Nov 2016

Posts: 7
#1

02 Nov 2016, 09:32

Hello all,

My name is Marcus Fiedler. I am writing a PhD dissertation about the human resources director.
So far I have been working with Stata 14. The panel structure is declared with "xtset ID Year".

The literature suggests three different methods that give the "same" results:
Panel FE regression (the FE procedure group-mean-centers all independent variables (IVs), but I have time-invariant variables...), or
Panel quasi-RE regression, or
RE with controls (the controls are the group means of the IVs)

What I want to do:
My data contain time-invariant variables, which do not get a coefficient in an FE regression and are omitted. So I have to use either a quasi-RE regression (a), which includes the group-mean-centered variables, or RE with controls (b), which includes the IVs and their group means.

Example (simple):
(a) x = (y1-y1m)+(y2-y2m)...
(b) x = y1+y1m+y2+y2m...
Here y1m is the group mean of y1, and so on. Both methods produce the same results as FE. The first method is also called "hybrid".
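
A minimal sketch of how the three specifications compare, on simulated data. Every name in it (y, x1, x2, z, u, and the _mean/_dev suffixes) is hypothetical and not taken from the post, and the panel is balanced with no missing values to keep the sketch simple.

```stata
* Simulated balanced panel; every name below is hypothetical.
clear
set seed 2016
set obs 50
generate ID = _n
generate u = rnormal()              // unit effect, correlated with x1 below
generate z = runiform() > 0.5       // time-invariant regressor
expand 6
bysort ID: generate Year = 2004 + _n
generate x1 = u + rnormal()
generate x2 = rnormal()
generate y = 1 + 2*x1 - x2 + 0.5*z + u + rnormal()
xtset ID Year

* Group means and deviations from them
bysort ID: egen x1_mean = mean(x1)
bysort ID: egen x2_mean = mean(x2)
generate x1_dev = x1 - x1_mean
generate x2_dev = x2 - x2_mean

* FE: z is omitted because it does not vary within ID
xtreg y x1 x2 z, fe
scalar b_fe = _b[x1]

* (a) hybrid: deviations plus means, z keeps a coefficient
xtreg y x1_dev x2_dev x1_mean x2_mean z, re
scalar b_hybrid = _b[x1_dev]

* (b) RE with controls (CRE): raw variables plus means
xtreg y x1 x2 x1_mean x2_mean z, re
scalar b_cre = _b[x1]

display b_fe, b_hybrid, b_cre
```

In this balanced case the three stored coefficients on x1 should agree up to numerical precision, while only the two RE models report a coefficient for z.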

My problem:
I want to estimate RE with controls. My problem is that I have missing data, so I cannot simply calculate the IV means and include them before running the regression.

Example (x = missing value):
t a b
2005 12 40
2006 17 30
2007 13 x
2008 x 20
Calculated before running the regression: mean a = 14
Calculated before running the regression: mean b = 30

BUT calculating the means before running the RE regression is misleading, because my calculation does not account for the missing data. The missing data would "erase" two rows, so the technically correct ("real") means would be:
a = 14.5 (2007 and 2008 deleted)
b = 35 (2007 and 2008 deleted)

Calculating this manually is very tricky, so my question is:
Can I save the mean variables that Stata produces (taking the missing values into account) after running a regression?
That would mean that I don't have to calculate the means beforehand.

I hope you can understand my problem and can suggest a practical solution.

Have a nice day,
Marcus
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

02 Nov 2016, 09:48

Marcus:
welcome to the list.
First off, please take some time to read the FAQ and learn how to post more effectively (typing -search dataex- to see how to post examples/excerpts of your dataset is one wise step to take).
That said:
- despite its limitations, the -hausman- test (as well as the literature in your research field) can point you towards the -fe- or -re- specification;
- if you have missing data, Stata will apply listwise deletion to the observations with missing values in any variable. Hence, if you cannot retrieve the missing values by submitting queries to the investigators or the like, you may want to impute them (please see -help mi-) or linearly interpolate them (please see -help ipolate-).
It is also advisable to investigate whether missingness is informative or not before deciding how to fix this issue.

Kind regards,
Carlo
(Stata 19.0)
Sebastian Kripfganz

Join Date: May 2014

Posts: 2593
#3

02 Nov 2016, 11:25

There are different ways to accomplish your task, for example:

Code:

egen misobs = rowmiss(a b)
by ID: egen a_mean = mean(a) if misobs == 0
by ID: egen b_mean = mean(b) if misobs == 0
xtreg y a b a_mean b_mean, re

Alternatively, you could run the regression without the mean variables first and then use the stored estimation sample to determine the relevant observations:

Code:

xtreg y a b, re
by ID: egen a_mean = mean(a) if e(sample)
by ID: egen b_mean = mean(b) if e(sample)
xtreg y a b a_mean b_mean, re
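
As a worked check, here is a sketch of the first approach applied to the four-row example from post #1. It assumes a single panel with ID = 1 (the ID value and the _mean_all variable names are made up for illustration) and uses bysort so that it does not rely on the data already being sorted.

```stata
clear
input ID Year a b
1 2005 12 40
1 2006 17 30
1 2007 13 .
1 2008 . 20
end

* Means over all non-missing values of each variable separately
bysort ID: egen a_mean_all = mean(a)
bysort ID: egen b_mean_all = mean(b)

* Means restricted to rows where neither a nor b is missing
egen misobs = rowmiss(a b)
bysort ID: egen a_mean = mean(a) if misobs == 0
bysort ID: egen b_mean = mean(b) if misobs == 0

list Year a b a_mean_all b_mean_all a_mean b_mean, noobs
```

The unrestricted means come out as 14 and 30. The restricted means are 14.5 and 35 on the 2005 and 2006 rows and missing on the other two rows, which are the rows the regression drops anyway.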

With regard to time-invariant variables in panel data models, you can find various other discussions here on Statalist:
Fixed Effects and time-invariant variables

https://www.kripfganz.de/stata/
Marcus Fiedler

Join Date: Nov 2016

Posts: 7
#4

03 Nov 2016, 01:00

Hello Carlo,
hello Sebastian,

thank you for your comments. I think my description wasn't clear enough.

The Hausman test is a nice tool, but it has an inherent problem: with huge samples it tends towards RE (Stata also includes an alternative likelihood-ratio test for testing FE against RE); see Wooldridge, Allison and others.
FE isn't a practical solution in my case, but the literature tells us that FE is the best way to get the "right" structure without bias. So I have to use a quasi-RE or RE with mean controls (= hybrid, CRE), which yield coefficients and standard errors like FE.
Imputation and interpolation would just be misleading because of the structure of my data, and would bias it.

I attach an example to illustrate my problem. In this example there are 48 missing values in x1 and 14 missing values in x2. Stata will delete listwise:
48 missing values + 14 missing values = 62 missing values for every variable!

This is exactly my problem, because BEFORE running the regression I generate the mean variables (group-mean-centering) for quasi-RE/RE with controls, taking only each variable's own missing values into account. So I include x1 and mean x1 with 48 missing values, and x2 and mean x2 with 14 missing values, but listwise deletion leads to x1 and mean x1 with 62 missing values, and x2 and mean x2 with 62 missing values. This is the reason why my quasi-RE or RE with mean controls <> FE.
What I have to do is calculate each variable's mean while accounting for the missing values of all the other variables before running the regression. This is very complicated.
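
The arithmetic above can be checked before any regression is run. The sketch below uses made-up data (200 observations, with x1 and x2 set to missing on disjoint rows; none of this comes from the attached file) and counts the rows that listwise deletion would drop. The sum 48 + 14 = 62 only holds when no row is missing on both variables, and the dependent variable would have to be added to the list in a real model.

```stata
clear
set obs 200
generate x1 = _n
generate x2 = 2*_n
replace x1 = . in 1/48          // 48 missing values in x1
replace x2 = . in 49/62         // 14 missing values in x2, on different rows

egen nmiss = rowmiss(x1 x2)     // number of missing regressors per row
count if nmiss > 0              // rows lost to listwise deletion
display "dropped: " r(N)
```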

And so my question: can I store the mean variables from the FE procedure and use them for my quasi-RE or RE with controls?

Best regards,
Marcus

Attached Files

example.xlsx (75.8 KB, 1 view)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#5

03 Nov 2016, 01:08

Marcus:
I would say you're making your life harder than necessary.
You have missing values and Stata behaves as expected.
You state that interpolation and multiple imputation induce biases in your regression (but leaving informative missingness unaddressed, if that were the case with your dataset, would bias your results as well).
One option might be to report both the regression with and without missing values and comment on the differences in your research report.
As an aside, please do not attach spreadsheets (please read the FAQ about attachments), as most of us do not download them due to the risk of active malware. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Sebastian Kripfganz

Join Date: May 2014

Posts: 2593
#6

03 Nov 2016, 07:05

Originally posted by Marcus Fiedler:

And so my question: can I store the mean variables from the FE procedure and use them for my quasi-RE or RE with controls?

I still believe that my earlier suggestion just does exactly what you want.

https://www.kripfganz.de/stata/
Marcus Fiedler

Join Date: Nov 2016

Posts: 7
#7

04 Nov 2016, 02:41

Hello Sebastian,

I believe so too.
You are right. I tested it against my own way of creating the means, i.e. egen A_mean = mean(A), by(ID_Unt), in different data sets, with and without missing values.
Stata always gives the same results.

Let me add a further question: as in the literature, all coefficients are the same and the standard errors are different. Many researchers mention that significance in RE with controls = FE, but in my data it isn't. Do you have an explanation why not? Maybe a problem with my procedure?

code: xtreg eAnmeldung_gesamt_ROT_lag eAnmeldung_2013 eAnmeldung_2013_mean UnternehmensAlter_CAP UnternehmensAlter_CAP_mean Ansprueche_Patent_CAP2013 Ansprueche_Patent_CAP2013_mean , re

eAnmeldung_gesam~g | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------------+----------------------------------------------------------------
eAnmeldung_2013 | .9604805 .0141822 67.72 0.000 .9326838 .9882771
eAnmeldung_2013_~n | -.1490229 .0159141 -9.36 0.000 -.180214 -.1178317
UnternehmensAlte~P | -3.190285 1.121075 -2.85 0.004 -5.38755 -.9930188
UnternehmensAlte~n | 3.149205 1.12189 2.81 0.005 .9503412 5.348069
Ansprueche_Pa~2013 | .2975801 1.537106 0.19 0.846 -2.715092 3.310253
Ansprueche_Paten~n | -.0756286 2.427935 -0.03 0.975 -4.834294 4.683037
_cons | 5.946127 5.482549 1.08 0.278 -4.799472 16.69172

code: xtreg eAnmeldung_gesamt_ROT_lag eAnmeldung_2013 UnternehmensAlter_CAP Ansprueche_Patent_CAP2013 , fe

eAnmeldung_gesam~g | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------------+----------------------------------------------------------------
zenteAnmeldung_2~3 | .9604805 .0180785 53.13 0.000 .9250472 .9959137
zentUnternehmens~P | -3.190283 1.429068 -2.23 0.026 -5.991204 -.389362
zentAnspruech~2013 | .2975803 1.959396 0.15 0.879 -3.542765 4.137925
_cons | 106.6609 16.9874 6.28 0.000 73.36621 139.9556

As in the literature, all coefficients are the same and the standard errors are different.

Best regards,
Marcus
Marcus Fiedler

Join Date: Nov 2016

Posts: 7
#8

04 Nov 2016, 04:43

Hello again,

can anybody delete my last post? It contains mistakes, sorry.
The correct example: FE vs. QRE vs. CRE

xtreg eAnmeldung_gesamt_ROT_lag eAnmeldung_2013 UnternehmensAlter_CAP Ansprueche_Patent_CAP2013 , i(ID_Unt) fe
eAnmeldung_gesamt_RO~g | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
eAnmeldung_2013 | .9604805 .01486 64.64 0.000 .9312978 .9896632
UnternehmensAlter_CAP | -3.190284 1.17465 -2.72 0.007 -5.497112 -.883457
Ansprueche_Patent~2013 | .2975802 1.610564 0.18 0.853 -2.865312 3.460473
_cons | 291.9372 115.2268 2.53 0.012 65.64994 518.2244

mixed eAnmeldung_gesamt_ROT_lag eAnmeldung_2013_z UnternehmensAlter_CAP_z Ansprueche_Patent_CAP2013_z, || ID_Unt:,
eAnmeldung_gesamt_RO~g | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
eAnmeldung_2013_z | .9604805 .0148204 64.81 0.000 .9314331 .9895279
UnternehmensAlter_CA~z | -3.190284 1.171519 -2.72 0.006 -5.486419 -.8941494
Ansprueche_Patent_CA~z | .2975802 1.60627 0.19 0.853 -2.850651 3.445812
_cons | 105.4913 29.95113 3.52 0.000 46.78819 164.1945
...

mixed eAnmeldung_gesamt_ROT_lag eAnmeldung_2013 eAnmeldung_2013_mean UnternehmensAlter_CAP UnternehmensAlter_CAP_mean Ansprueche_Patent_CAP2013 Ansprueche_Patent_CAP2013_mean, || ID_Unt:,
eAnmeldung_gesamt_RO~g | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
eAnmeldung_2013 | .9604805 .014112 68.06 0.000 .9328214 .9881396
eAnmeldung_2013_mean | -.1490229 .0158354 -9.41 0.000 -.1800596 -.1179861
UnternehmensAlter_CAP | -3.190285 1.115527 -2.86 0.004 -5.376677 -1.003893
UnternehmensAlter_CA~n | 3.149205 1.116338 2.82 0.005 .9612229 5.337187
Ansprueche_Patent~2013 | .2975801 1.529499 0.19 0.846 -2.700183 3.295344
Ansprueche_Patent_CA~n | -.0756286 2.41592 -0.03 0.975 -4.810745 4.659488
_cons | 5.946127 5.455417 1.09 0.276 -4.746294 16.63855
...

Let me add a further question: as in the literature, all coefficients are the same and the standard errors are different. Many researchers mention that significance doesn't vary between RE with controls and FE, but in my data it does. Do you have an explanation why?

Best regards,
Marcus
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#9

04 Nov 2016, 05:24

Marcus:
your message is difficult to read.
Please post what you typed and what Stata gave you back via CODE delimiters (see FAQ on that topic). Thanks.

Kind regards,
Carlo
(Stata 19.0)
Marcus Fiedler

Join Date: Nov 2016

Posts: 7
#10

07 Nov 2016, 03:35

Hello Carlo,

I checked my regressions, and the centered variables produce the same results as variable + mean variable.
FE linear regression produces the "same" results; the deviations are very small.

FE
. xtreg eAnmeldung_gesamt_ROT_lag eAnmeldung_2013 UnternehmensAlter_CAP Ansprueche_Patent_CAP20
> 13 Selbst_Vorw2013 AlterVorw2013 Vorw_Patent2013 AlterRueck2013 Groesse_IMPADOC2013 FuE_Umsat
> z TsQ Risiko_NEU ROA Bilanzsumme_ROT , i(ID_Unt) fe vce(robust)

Fixed-effects (within) regression Number of obs = 709
Group variable: ID_Unt Number of groups = 93

R-sq: Obs per group:
within = 0.8914 min = 1
between = 0.9844 avg = 7.6
overall = 0.9521 max = 9

F(12,92) = .
corr(u_i, Xb) = -0.7010 Prob > F = .

(Std. Err. adjusted for 93 clusters in ID_Unt)
-------------------------------------------------------------------------------------------
| Robust
eAnmeldung_gesamt_ROT_lag | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------------------+----------------------------------------------------------------
eAnmeldung_2013 | .9559886 .0445846 21.44 0.000 .8674397 1.044538
UnternehmensAlter_CAP | -.3083685 .5470274 -0.56 0.574 -1.394812 .7780751
Ansprueche_Patent_CAP2013 | -.1538362 .7700064 -0.20 0.842 -1.683136 1.375463
Selbst_Vorw2013 | -9.698364 15.88519 -0.61 0.543 -41.24771 21.85099
AlterVorw2013 | 4.412097 2.950669 1.50 0.138 -1.448187 10.27238
Vorw_Patent2013 | -1.300005 2.939579 -0.44 0.659 -7.138263 4.538253
AlterRueck2013 | -1.253429 .5782712 -2.17 0.033 -2.401926 -.1049329
Groesse_IMPADOC2013 | 20.41099 10.26867 1.99 0.050 .0165373 40.80545
FuE_Umsatz | -1286.19 888.7834 -1.45 0.151 -3051.39 479.011
TsQ | 4.041007 8.375762 0.48 0.631 -12.59398 20.67599
Risiko_NEU | -60.14005 55.47918 -1.08 0.281 -170.3265 50.0464
ROA | 113.5713 45.21094 2.51 0.014 23.77845 203.3641
Bilanzsumme_ROT | -3.01e-11 2.63e-11 -1.14 0.255 -8.24e-11 2.21e-11
_cons | 64.43248 82.11338 0.78 0.435 -98.65177 227.5167

Quasi-FE (same results with CRE):
. mixed eAnmeldung_gesamt_ROT_lag eAnmeldung_2013_z UnternehmensAlter_CAP_z Ansprueche_Patent_
> CAP2013_z Selbst_Vorw2013_z AlterVorw2013_z Vorw_Patent2013_z AlterRueck2013_z Groesse_IMPADO
> C2013_z FuE_Umsatz_z TsQ_z Risiko_NEU_z ROA_z Bilanzsumme_ROT_z , || ID_Unt:, vce(robust)

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0: log pseudolikelihood = -4241.2987
Iteration 1: log pseudolikelihood = -4241.2987

Computing standard errors:

Mixed-effects regression Number of obs = 709
Group variable: ID_Unt Number of groups = 93

Obs per group:
min = 1
avg = 7.6
max = 9

Wald chi2(12) = .
Log pseudolikelihood = -4241.2987 Prob > chi2 = .

(Std. Err. adjusted for 93 clusters in ID_Unt)
---------------------------------------------------------------------------------------------
| Robust
eAnmeldung_gesamt_ROT_lag | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------------------+----------------------------------------------------------------
eAnmeldung_2013_z | .9559886 .0441734 21.64 0.000 .8694103 1.042567
UnternehmensAlter_CAP_z | -.3083681 .541982 -0.57 0.569 -1.370633 .753897
Ansprueche_Patent_CAP2013_z | -.1538362 .7629044 -0.20 0.840 -1.649101 1.341429
Selbst_Vorw2013_z | -9.698364 15.73867 -0.62 0.538 -40.54559 21.14887
AlterVorw2013_z | 4.412097 2.923454 1.51 0.131 -1.317767 10.14196
Vorw_Patent2013_z | -1.300005 2.912466 -0.45 0.655 -7.008335 4.408324
AlterRueck2013_z | -1.253429 .5729376 -2.19 0.029 -2.376366 -.1304923
Groesse_IMPADOC2013_z | 20.41099 10.17395 2.01 0.045 .4704108 40.35158
FuE_Umsatz_z | -1286.19 880.5858 -1.46 0.144 -3012.106 439.7269
TsQ_z | 4.041007 8.29851 0.49 0.626 -12.22377 20.30579
Risiko_NEU_z | -60.14005 54.96748 -1.09 0.274 -167.8743 47.59423
ROA_z | 113.5713 44.79395 2.54 0.011 25.77675 201.3658
Bilanzsumme_ROT_z | -3.01e-11 2.61e-11 -1.16 0.248 -8.12e-11 2.10e-11
_cons | 105.4736 30.1092 3.50 0.000 46.46068 164.4866
---------------------------------------------------------------------------------------------

------------------------------------------------------------------------------
| Robust
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
ID_Unt: Identity |
var(_cons) | 82495.12 46210.28 27518.27 247306.4
-----------------------------+------------------------------------------------
var(Residual) | 4926.054 1837.542 2371.263 10233.37

MIXED is only a baseline for me. I am actually using MENBREG (random intercept), because my model isn't linear; its results show a "greater" deviation than with MIXED, but I can live with that.
To deal with heteroscedasticity I am using vce(robust) = vce(cluster ID_Unt).
To deal with heterogeneity I include a lagged dependent variable as a control.
My data have, from a logical point of view, an nth-order autoregressive structure.

Does anybody know how I can deal with nth-order autocorrelation?
Do you know a test for nth-order autocorrelation in multiple panels?

Durbin-Watson has several problems:
1. From Stata: "dwstat sample may not include multiple panels r(459);"
2. It isn't possible to include a lagged dependent variable as a control.
3. dwstat can only handle first order.

Best regards,
Marcus

Marcus Fiedler

Join Date: Nov 2016
Posts: 7

#11

09 Nov 2016, 03:29

Hello again,

I want to give you an update.
Looking for FE = CRE = hybrid, I tested models (always with the same variables) with xtreg, poisson and nbreg, and the results are:

xtreg	poisson	nbreg
FE = centered RE	FE = centered RE (conjecture, because Stata only has conditional FE)	FE = centered RE (conjecture, because Stata only has conditional FE)
Hybrid = CRE	Hybrid = CRE	Hybrid = CRE
centered RE = centered RE with mean controls	centered RE = centered RE with mean controls	NO! Why?
BE = centered RE with mean controls	BE = centered RE with mean controls	NO! Why?
centered RE = Hybrid	centered RE = Hybrid	NO! Why?
FE = Hybrid = CRE	FE = Hybrid = CRE (conjecture, because Stata only has conditional FE; centered RE = Hybrid >>> Hybrid = FE)	NO! Why?

I cannot post my exact results; they need too much space.

Do you have an explanation? Why are there so many problems with nbreg?
Why do the linear and Poisson models fit, but nbreg does not?

Thank you for your help.

Best regards,
Marcus
