  • OLS Assumptions Panel Data with fixed and random effects.

    Hello everyone,

    I am quite new to Stata and Statalist.
    I am doing my thesis and now I need to run my regression. However, it is an OLS regression, so I need to perform several tests: linearity, normality, homoskedasticity, independence, multicollinearity, and no outliers.

    I performed the Hausman Test to find whether I need to use Fixed effects or Random effects and now I have the following regressions:

    xtreg cumlret0to10 cash incentive dual c.cash#i.dual c.incentive#i.dual gen educ age for ten size lev sales roa mtb, re
    reghdfe cumlret0to60 cash incentive dual c.cash#i.dual c.incentive#i.dual gen educ age for ten size lev sales roa mtb, absorb(isin time)

    My fixed effects will be company and time fixed effects.

    However, now my question is: how do I check the OLS assumptions? In every example people use regress instead of xtreg or reghdfe. Is there a way to do this? Or should I check the OLS assumptions using regress without the Hausman test, and do the Hausman test only after I have checked the OLS assumptions?


    I hope to hear from you, and thank you in advance.

    Kind regards, Joëlle

  • #2
    There are a lot of assumptions, but many are not very important; you have to decide which are important given both your goals and your data. But, generally, see
    Code:
    help regress postestimation
    help regress postestimation plots
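    As a rough illustration of the kinds of diagnostics those help files describe (the variable names depvar, xvar1, xvar2 are placeholders, not from the thread):

    ```stata
    * Illustration only: common -regress- postestimation checks
    regress depvar xvar1 xvar2
    rvfplot                      // residuals vs. fitted values, visual heteroskedasticity check
    estat hettest                // Breusch-Pagan test for heteroskedasticity
    estat vif                    // variance inflation factors for multicollinearity
    predict double resid, residuals
    qnorm resid                  // quantile-normal plot to inspect residual normality
    ```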

    • #3
      Thank you Rich! I will indeed discuss which ones are necessary. However, the commands you sent are only for regress. I found help xtreg postestimation, but nothing for reghdfe.

      • #4
        Joelle:
        no need for -reghdfe- here, as you can safely switch to -xtreg,fe- and obtain the same results for the shared coefficients (-i.year- is not calculated by -reghdfe-, as it is already absorbed as a fixed effect), as you can see in the following toy example:
        Code:
        . use "https://www.stata-press.com/data/r17/nlswork.dta"
        (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
        
        . xtreg ln_wage c.age##c.age i.year, fe
        
        Fixed-effects (within) regression               Number of obs     =     28,510
        Group variable: idcode                          Number of groups  =      4,710
        
        R-squared:                                      Obs per group:
             Within  = 0.1162                                         min =          1
             Between = 0.1078                                         avg =        6.1
             Overall = 0.0932                                         max =         15
        
                                                        F(16,23784)       =     195.45
        corr(u_i, Xb) = 0.0613                          Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 age |   .0728746   .0107894     6.75   0.000     .0517267    .0940224
                     |
         c.age#c.age |  -.0010113    .000061   -16.57   0.000    -.0011309   -.0008917
                     |
                year |
                 69  |   .0647054   .0158222     4.09   0.000     .0336928     .095718
                 70  |   .0284423   .0234621     1.21   0.225     -.017545    .0744295
                 71  |   .0579959   .0326524     1.78   0.076    -.0060048    .1219967
                 72  |   .0510671   .0422995     1.21   0.227    -.0318426    .1339769
                 73  |   .0424104    .052118     0.81   0.416    -.0597442    .1445651
                 75  |   .0151376   .0717194     0.21   0.833    -.1254371    .1557123
                 77  |   .0340933   .0918106     0.37   0.710    -.1458613    .2140478
                 78  |   .0537334   .1023339     0.53   0.600    -.1468475    .2543143
                 80  |   .0369475   .1221806     0.30   0.762    -.2025343    .2764293
                 82  |   .0391687   .1423573     0.28   0.783    -.2398606     .318198
                 83  |    .058766   .1523743     0.39   0.700    -.2398974    .3574294
                 85  |   .1042758   .1726431     0.60   0.546    -.2341157    .4426673
                 87  |   .1242272   .1930108     0.64   0.520    -.2540863    .5025406
                 88  |   .1904977   .2068016     0.92   0.357    -.2148466     .595842
                     |
               _cons |   .3937532   .2001741     1.97   0.049     .0013992    .7861072
        -------------+----------------------------------------------------------------
             sigma_u |  .40275174
             sigma_e |  .30127563
                 rho |  .64120306   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(4709, 23784) = 8.75                 Prob > F = 0.0000
        
        . reghdfe ln_wage c.age##c.age , abs(idcode year)
        (dropped 551 singleton observations)
        (MWFE estimator converged in 8 iterations)
        
        HDFE Linear regression                            Number of obs   =     27,959
        Absorbing 2 HDFE groups                           F(   2,  23784) =     138.12
                                                          Prob > F        =     0.0000
                                                          R-squared       =     0.6593
                                                          Adj R-squared   =     0.5995
                                                          Within R-sq.    =     0.0115
                                                          Root MSE        =     0.3013
        
        ------------------------------------------------------------------------------
             ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 age |   .0728746   .0107894     6.75   0.000     .0517267    .0940224
                     |
         c.age#c.age |  -.0010113    .000061   -16.57   0.000    -.0011309   -.0008917
                     |
               _cons |   .4586164   .2997464     1.53   0.126    -.1289057    1.046138
        ------------------------------------------------------------------------------
        
        Absorbed degrees of freedom:
        -----------------------------------------------------+
         Absorbed FE | Categories  - Redundant  = Num. Coefs |
        -------------+---------------------------------------|
              idcode |      4159           0        4159     |
                year |        15           1          14     |
        -----------------------------------------------------+
        In addition:
        1) normality is a (weak) requirement for the residuals distribution only. Skip it;
        2) linearity: you probably mean investigating whether or not a given predictor shows a non-linear relationship with the dependent variable. Just plug into the right-hand side of your regression equation a linear and a square term for that predictor via interaction, exploiting the -fvvarlist- notation (##) reported in the toy example above;
        3) independence: do you mean lack of endogeneity? This is difficult to test, and the only way out is knowing very well the data-generating process you're investigating;
        4) multicollinearity is rarely an issue. By construction, a linear and a square term for the same predictor show a sky-rocketing VIF. Multicollinearity becomes annoying when it produces "weird" standard errors. In addition, you can find one of the most humorous and methodologically reassuring descriptions of multicollinearity in Arthur S. Goldberger, A Course in Econometrics (Harvard University Press), Chapter 23;
        5) you can test for heteroskedasticity in -xtreg,fe- via the community-contributed module -xttest2-; see the following link for the -re- specification: Do we have a test for heteroskedasticity for random model in Stata? | ResearchGate;
        6) you can test for autocorrelation in both the -fe- and -re- specifications via the community-contributed module -xtserial-;
        7) if you detect heteroskedasticity and/or autocorrelation, you can invoke cluster-robust standard errors via -robust- or -vce(cluster panelid)-, which do the very same job under -xtreg-, before testing the two specifications;
        8) as -hausman- does not support non-default standard errors, you should switch to the community-contributed module -xtoverid- (which, being a bit old, does not support -fvvarlist- notation; see -xi:- as the usual fix).
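        Putting points 5) through 8) together, a minimal sketch of the workflow might look like this (depvar, xvar1, xvar2, panelid, and time are placeholders for your own variables; the community-contributed modules must be installed first):

        ```stata
        * Install the community-contributed modules once
        * ssc install xtserial
        * ssc install xtoverid

        xtset panelid time
        xtserial depvar xvar1 xvar2                  // serial-correlation test (point 6)
        quietly xtreg depvar xvar1 xvar2, fe
        xttest2                                      // test after -fe- mentioned in point 5
        xtreg depvar xvar1 xvar2, re vce(cluster panelid)
        xtoverid                                     // fe vs. re with cluster-robust SEs (point 8)
        ```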
        Kind regards,
        Carlo
        (Stata 19.0)
