  • OLS Assumptions Panel Data with fixed and random effects.

    Hello everyone,

    I am quite new to Stata and Statalist.
    I am doing my thesis and now I need to run my regression. However, it is an OLS regression, so I need to perform several tests: linearity, normality, homoskedasticity, independence, multicollinearity, and no outliers.

    I performed the Hausman Test to find whether I need to use Fixed effects or Random effects and now I have the following regressions:

    xtreg cumlret0to10 cash incentive dual c.cash#i.dual c.incentive#i.dual gen educ age for ten size lev sales roa mtb, re
    reghdfe cumlret0to60 cash incentive dual c.cash#i.dual c.incentive#i.dual gen educ age for ten size lev sales roa mtb, absorb(isin time)

    My fixed effects will be company and time fixed effects.

    However, now my question is: how do I check the OLS assumptions? In every example people use regress instead of xtreg or reghdfe. Is there a way to do this? Or should I check the OLS assumptions using regress without the Hausman test, and do the Hausman test only after I have checked the OLS assumptions?


    I hope to hear from you, and thank you in advance.

    Kind regards, Joëlle

  • #2
    There are a lot of assumptions, but many are not very important; you have to decide which are important given both your goals and your data. But, generally, see
    Code:
    help regress postestimation
    help regress postestimation plots
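    As a rough illustration of the kinds of diagnostics those help files describe (the variable names depvar, xvar1, xvar2 are placeholders, not from the thread):

    ```stata
    * Illustration only: common -regress- postestimation checks
    regress depvar xvar1 xvar2
    rvfplot                      // residuals vs. fitted values, visual heteroskedasticity check
    estat hettest                // Breusch-Pagan test for heteroskedasticity
    estat vif                    // variance inflation factors for multicollinearity
    predict double resid, residuals
    qnorm resid                  // quantile-normal plot to inspect residual normality
    ```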

    • #3
      Thank you Rich! I will indeed discuss which ones are necessary. However, the commands you sent are only for regress. I found help xtreg postestimation, but nothing for reghdfe.

      • #4
        Joelle:
        no need for -reghdfe- here, as you can safely switch to -xtreg,fe- and obtain the same results for the shared coefficients (-i.year- is not calculated by -reghdfe-, as it is already absorbed as a fixed effect), as you can see in the following toy example:
        Code:
        . use "https://www.stata-press.com/data/r17/nlswork.dta"
        (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
        
        . xtreg ln_wage c.age##c.age i.year, fe
        
        Fixed-effects (within) regression               Number of obs     =     28,510
        Group variable: idcode                          Number of groups  =      4,710
        
        R-squared:                                      Obs per group:
             Within  = 0.1162                                         min =          1
             Between = 0.1078                                         avg =        6.1
             Overall = 0.0932                                         max =         15
        
                                                        F(16,23784)       =     195.45
        corr(u_i, Xb) = 0.0613                          Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 age |   .0728746   .0107894     6.75   0.000     .0517267    .0940224
                     |
         c.age#c.age |  -.0010113    .000061   -16.57   0.000    -.0011309   -.0008917
                     |
                year |
                 69  |   .0647054   .0158222     4.09   0.000     .0336928     .095718
                 70  |   .0284423   .0234621     1.21   0.225     -.017545    .0744295
                 71  |   .0579959   .0326524     1.78   0.076    -.0060048    .1219967
                 72  |   .0510671   .0422995     1.21   0.227    -.0318426    .1339769
                 73  |   .0424104    .052118     0.81   0.416    -.0597442    .1445651
                 75  |   .0151376   .0717194     0.21   0.833    -.1254371    .1557123
                 77  |   .0340933   .0918106     0.37   0.710    -.1458613    .2140478
                 78  |   .0537334   .1023339     0.53   0.600    -.1468475    .2543143
                 80  |   .0369475   .1221806     0.30   0.762    -.2025343    .2764293
                 82  |   .0391687   .1423573     0.28   0.783    -.2398606     .318198
                 83  |    .058766   .1523743     0.39   0.700    -.2398974    .3574294
                 85  |   .1042758   .1726431     0.60   0.546    -.2341157    .4426673
                 87  |   .1242272   .1930108     0.64   0.520    -.2540863    .5025406
                 88  |   .1904977   .2068016     0.92   0.357    -.2148466     .595842
                     |
               _cons |   .3937532   .2001741     1.97   0.049     .0013992    .7861072
        -------------+----------------------------------------------------------------
             sigma_u |  .40275174
             sigma_e |  .30127563
                 rho |  .64120306   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(4709, 23784) = 8.75                 Prob > F = 0.0000
        
        . reghdfe ln_wage c.age##c.age , abs(idcode year)
        (dropped 551 singleton observations)
        (MWFE estimator converged in 8 iterations)
        
        HDFE Linear regression                            Number of obs   =     27,959
        Absorbing 2 HDFE groups                           F(   2,  23784) =     138.12
                                                          Prob > F        =     0.0000
                                                          R-squared       =     0.6593
                                                          Adj R-squared   =     0.5995
                                                          Within R-sq.    =     0.0115
                                                          Root MSE        =     0.3013
        
        ------------------------------------------------------------------------------
             ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 age |   .0728746   .0107894     6.75   0.000     .0517267    .0940224
                     |
         c.age#c.age |  -.0010113    .000061   -16.57   0.000    -.0011309   -.0008917
                     |
               _cons |   .4586164   .2997464     1.53   0.126    -.1289057    1.046138
        ------------------------------------------------------------------------------
        
        Absorbed degrees of freedom:
        -----------------------------------------------------+
         Absorbed FE | Categories  - Redundant  = Num. Coefs |
        -------------+---------------------------------------|
              idcode |      4159           0        4159     |
                year |        15           1          14     |
        -----------------------------------------------------+
        In addition:
        1) normality is a (weak) requirement for the residuals distribution only. Skip it;
        2) linearity: you probably mean investigating whether or not a given predictor shows a non-linear relationship with the dependent variable. Just plug into the right-hand side of your regression equation a linear and a square term for that predictor via interaction, exploiting the -fvvarlist- notation (##) reported in the toy example above;
        3) independence: do you mean lack of endogeneity? This is difficult to test, and the only way out is knowing very well the data-generating process you're investigating;
        4) multicollinearity is rarely an issue. By construction, a linear and a square term for the same predictor show a sky-rocketing VIF. Multicollinearity becomes annoying when it produces "weird" standard errors. In addition, you can find one of the most humorous and methodologically reassuring descriptions of multicollinearity in Arthur S. Goldberger, A Course in Econometrics (Harvard University Press), Chapter 23;
        5) you can test for heteroskedasticity in -xtreg,fe- via the community-contributed module -xttest2-; see the following link for the -re- specification: Do we have a test for heteroskedasticity for random model in Stata? | ResearchGate;
        6) you can test for autocorrelation in both the -fe- and -re- specifications via the community-contributed module -xtserial-;
        7) if you detect heteroskedasticity and/or autocorrelation, you can invoke cluster-robust standard errors via -robust- or -vce(cluster panelid)-, which do the very same job under -xtreg-, before testing the two specifications;
        8) as -hausman- does not support non-default standard errors, you should switch to the community-contributed module -xtoverid- (which, being a bit old, does not support -fvvarlist- notation; see -xi:- as the usual fix).
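        Putting points 5) through 8) together, a minimal sketch of the workflow might look like this (depvar, xvar1, xvar2, panelid, and time are placeholders for your own variables; the community-contributed modules must be installed first):

        ```stata
        * Install the community-contributed modules once
        * ssc install xtserial
        * ssc install xtoverid

        xtset panelid time
        xtserial depvar xvar1 xvar2                  // serial-correlation test (point 6)
        quietly xtreg depvar xvar1 xvar2, fe
        xttest2                                      // test after -fe- mentioned in point 5
        xtreg depvar xvar1 xvar2, re vce(cluster panelid)
        xtoverid                                     // fe vs. re with cluster-robust SEs (point 8)
        ```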
        Kind regards,
        Carlo
        (Stata 19.0)
