
  • Robust Standard Error in Regression

    Code:
    xtreg ROE CR1 CR2 LR OR MR SIZE GDP, fe
    Code:
    Fixed-effects (within) regression               Number of obs      =       112
    Group variable: Bank                            Number of groups   =        16
    
    R-sq:  within  = 0.0822                         Obs per group: min =         7
           between = 0.0498                                        avg =       7.0
           overall = 0.0008                                        max =         7
    
                                                    F(7,89)            =      1.14
    corr(u_i, Xb)  = -0.8021                        Prob > F           =    0.3463
    
    ------------------------------------------------------------------------------
             ROE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             CR1 |  -1.744441   .7981766    -2.19   0.031    -3.330401   -.1584811
             CR2 |   .9476746   .4802421     1.97   0.052    -.0065563    1.901905
              LR |  -.6933553   .5128132    -1.35   0.180    -1.712304    .3255936
              OR |  -.2038989   .4717287    -0.43   0.667    -1.141214     .733416
              MR |  -2.292524   1.434061    -1.60   0.113    -5.141973    .5569252
            SIZE |   8.715667   7.129916     1.22   0.225    -5.451324    22.88266
             GDP |    3.56462    3.16306     1.13   0.263    -2.720313    9.849554
           _cons |    .700703   1.386947     0.51   0.615    -2.055131    3.456537
    -------------+----------------------------------------------------------------
         sigma_u |  .36351747
         sigma_e |  .44556858
             rho |  .39962025   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0:     F(15, 89) =     1.07              Prob > F = 0.3955


    Code:
    xtreg ROE CR1 CR2 LR OR MR SIZE GDP, fe robust
    Code:
    Fixed-effects (within) regression               Number of obs      =       112
    Group variable: Bank                            Number of groups   =        16
    
    R-sq:  within  = 0.0822                         Obs per group: min =         7
           between = 0.0498                                        avg =       7.0
           overall = 0.0008                                        max =         7
    
                                                    F(7,15)            =      6.39
    corr(u_i, Xb)  = -0.8021                        Prob > F           =    0.0013
    
                                      (Std. Err. adjusted for 16 clusters in Bank)
    ------------------------------------------------------------------------------
                 |               Robust
             ROE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             CR1 |  -1.744441    .905241    -1.93   0.073    -3.673917    .1850346
             CR2 |   .9476746   .6595358     1.44   0.171    -.4580927    2.353442
              LR |  -.6933553   .2649604    -2.62   0.019    -1.258105   -.1286056
              OR |  -.2038989   .4383102    -0.47   0.648    -1.138135    .7303372
              MR |  -2.292524   1.098562    -2.09   0.054    -4.634054    .0490057
            SIZE |   8.715667   7.835708     1.11   0.284    -7.985749    25.41708
             GDP |    3.56462   2.680092     1.33   0.203    -2.147861    9.277102
           _cons |    .700703   1.043965     0.67   0.512    -1.524455    2.925861
    -------------+----------------------------------------------------------------
         sigma_u |  .36351747
         sigma_e |  .44556858
             rho |  .39962025   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    The F-statistic of the first regression was not significant, as shown above, but when I included the robust option, the F-statistic of the second regression became highly significant. I wanted to know whether there is any implication of using robust, because I would not want to exclude the model from the results just because of an insignificant F-statistic.

    However, it seems the robust model shows better results.

    Please kindly advise accordingly. Thanks.
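
    For reference, a minimal sketch of how the two specifications can be run back to back and their standard errors compared side by side (the time variable name Year is a placeholder/assumption; -vce(cluster Bank)- is equivalent to -robust- under -xtreg, fe-):

    Code:
    * declare the panel structure (Year is a placeholder for the actual time variable)
    xtset Bank Year
    * fixed effects with default (conventional) standard errors
    xtreg ROE CR1 CR2 LR OR MR SIZE GDP, fe
    estimates store fe_default
    * fixed effects with cluster-robust standard errors, clustered on Bank
    xtreg ROE CR1 CR2 LR OR MR SIZE GDP, fe vce(cluster Bank)
    estimates store fe_cluster
    * coefficients and both sets of standard errors side by side
    estimates table fe_default fe_cluster, b(%9.4f) se(%9.4f)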

  • #2
    Olalere:
    the reason for using -robust- is that the resulting SEs are noticeably different (the first two, at least) from the default ones.
    If you fear that the residual distribution suffers from heteroskedasticity, -robust- is the way to go.
    However, the main issue here is that no coefficient in your model seems to be different from 0: it may well be that your sample is too limited to support any panel data regression model.
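    If it helps, a minimal sketch of one way to check whether groupwise heteroskedasticity is actually a concern, assuming the community-contributed -xttest3- command (a modified Wald test that runs after -xtreg, fe-) is installed from SSC:

    Code:
    * one-off installation of the community-contributed test
    ssc install xttest3
    * re-fit the fixed-effects model, then test for groupwise heteroskedasticity
    xtreg ROE CR1 CR2 LR OR MR SIZE GDP, fe
    xttest3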
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      It is also worth mentioning that -robust- in -xtreg, fe- is taken to mean cluster-robust, clustered on the panel variable. Cluster-robust standard errors do not work well when the number of clusters is small. You have only 16. While different experts might disagree about just where to draw the line, 16 is, at best, borderline sufficient.
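
      With so few clusters, one remedy sometimes suggested is a wild cluster bootstrap. A minimal sketch using the community-contributed -boottest- command follows (treat the exact syntax and defaults as an assumption and check -help boottest-):

      Code:
      * one-off installation of the community-contributed wild cluster bootstrap command
      ssc install boottest
      * fixed-effects model with standard errors clustered on Bank
      xtreg ROE CR1 CR2 LR OR MR SIZE GDP, fe vce(cluster Bank)
      * wild cluster bootstrap p-value and confidence interval for the CR1 coefficient
      boottest CR1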



      • #4
        Originally posted by Carlo Lazzaro View Post
        Olalere:
        the reason for using -robust- is that the resulting SEs are noticeably different (the first two, at least) from the default ones.
        If you fear that the residual distribution suffers from heteroskedasticity, -robust- is the way to go.
        However, the main issue here is that no coefficient in your model seems to be different from 0: it may well be that your sample is too limited to support any panel data regression model.
        Thanks for your response, Carlo. However, that is the only model with no overall fit (F-stat). Do you mean I should use the robust option because the sample is too limited to support a panel data model? My understanding was that an overall sample of at least 100 observations is enough to run a panel data regression model, or what do you think?



        • #5
          Originally posted by Clyde Schechter View Post
          It is also worth mentioning that -robust- in -xtreg, fe- is taken to mean cluster-robust, clustered on the panel variable. Cluster-robust standard errors do not work well when the number of clusters is small. You have only 16. While different experts might disagree about just where to draw the line, 16 is, at best, borderline sufficient.
          Thanks for your reply, Clyde. Do you mean a sample of 16 clusters is at least sufficient? By the way, aside from the robust option, is there any way I can make the F-stat of the model significant? Thanks.



          • #6
            Do you mean 16 cluster sample is at least sufficient?
            No, I mean it is in a grey area. Some people would say 16 is enough, others would say it is not.

            By the way, aside the "Robust Option", is there any way i can make the F-stat of the model significant?
            Don't do this! Shopping around for a model that makes your result "statistically significant" is not science. It is a sure-fire way to generate false results that will not hold up under replication and do not reflect reality. If you try multiple analyses until you get the p-value you like, the p-value has no meaning. The p-value only has meaning (if it ever does!) if the analysis is selected in advance and the results of that analysis are accepted, no matter how they turn out. The choice of analytic technique needs to be based on the known statistical properties of the techniques and their applicability to your study design and data.

            Moreover, give some serious thought as to whether the p-value is even of any relevance to your research goals. It is often the case that the "hypothesis testing" framework is simply a Procrustean bed that mutilates the research. It is often far more useful to get quantitative point and interval estimates of a particular effect, or set of effects, than to test some artificial and inherently implausible "null hypothesis." I have no idea if that is the case in your situation, but it is quite commonly so.

            Do read Wasserstein RL, Lazar NA. The ASA's statement on p-values: context, process, and purpose. The American Statistician (2016). You can link to it at http://dx.doi.org/10.1080/00031305.2016.1154108.



            • #7
              Olalere:
              the F-test you (seemingly) refer to is too difficult to calculate under -robust-; hence Stata omits it at the foot of the -xtreg- output table.
              As far as your last question is concerned, in my opinion your sample is too limited and the individual effects are probably not that informative (setting aside the heteroskedasticity issue for a while, the result of the F-test at the foot of the first regression you ran cast some doubt on preferring -xtreg- over -regress- with standard errors clustered on the panel identifier).
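              For concreteness, the pooled alternative mentioned above would be something like the sketch below (whether it is preferable to -xtreg, fe- is exactly what the u_i=0 F-test at the foot of your first output speaks to):

              Code:
              * pooled OLS with standard errors clustered on the panel identifier
              regress ROE CR1 CR2 LR OR MR SIZE GDP, vce(cluster Bank)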
              That said, I do share Clyde's advice about the oversold story of "p-value less than..." (a cautionary tale about this topic would probably be welcome).
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Originally posted by Carlo Lazzaro View Post
                Olalere:
                the F-test you (seemingly) refer to is too difficult to calculate under -robust-; hence Stata omits it at the foot of the -xtreg- output table.
                As far as your last question is concerned, in my opinion your sample is too limited and the individual effects are probably not that informative (setting aside the heteroskedasticity issue for a while, the result of the F-test at the foot of the first regression you ran cast some doubt on preferring -xtreg- over -regress- with standard errors clustered on the panel identifier).
                That said, I do share Clyde's advice about the oversold story of "p-value less than..." (a cautionary tale about this topic would probably be welcome).
                Thanks for the clarification, Carlo.



                • #9
                  Originally posted by Clyde Schechter View Post
                  No, I mean it is in a grey area. Some people would say 16 is enough, others would say it is not.


                  Don't do this! Shopping around for a model that makes your result "statistically significant" is not science. It is a sure-fire way to generate false results that will not hold up under replication and do not reflect reality. If you try multiple analyses until you get the p-value you like, the p-value has no meaning. The p-value only has meaning (if it ever does!) if the analysis is selected in advance and the results of that analysis are accepted, no matter how they turn out. The choice of analytic technique needs to be based on the known statistical properties of the techniques and their applicability to your study design and data.

                  Moreover, give some serious thought as to whether the p-value is even of any relevance to your research goals. It is often the case that the "hypothesis testing" framework is simply a Procrustean bed that mutilates the research. It is often far more useful to get quantitative point and interval estimates of a particular effect, or set of effects, than to test some artificial and inherently implausible "null hypothesis." I have no idea if that is the case in your situation, but it is quite commonly so.

                  Do read Wasserstein RL, Lazar NA. The ASA's statement on p-values: context, process, and purpose. The American Statistician (2016). You can link to it at http://dx.doi.org/10.1080/00031305.2016.1154108.
                  Thanks Clyde, I really learned from this.
