
  • Robust Standard Error in Regression

    Code:
    xtreg ROE CR1 CR2 LR OR MR SIZE GDP, fe
    Code:
    Fixed-effects (within) regression               Number of obs      =       112
    Group variable: Bank                            Number of groups   =        16
    
    R-sq:  within  = 0.0822                         Obs per group: min =         7
           between = 0.0498                                        avg =       7.0
           overall = 0.0008                                        max =         7
    
                                                    F(7,89)            =      1.14
    corr(u_i, Xb)  = -0.8021                        Prob > F           =    0.3463
    
    ------------------------------------------------------------------------------
             ROE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             CR1 |  -1.744441   .7981766    -2.19   0.031    -3.330401   -.1584811
             CR2 |   .9476746   .4802421     1.97   0.052    -.0065563    1.901905
              LR |  -.6933553   .5128132    -1.35   0.180    -1.712304    .3255936
              OR |  -.2038989   .4717287    -0.43   0.667    -1.141214     .733416
              MR |  -2.292524   1.434061    -1.60   0.113    -5.141973    .5569252
            SIZE |   8.715667   7.129916     1.22   0.225    -5.451324    22.88266
             GDP |    3.56462    3.16306     1.13   0.263    -2.720313    9.849554
           _cons |    .700703   1.386947     0.51   0.615    -2.055131    3.456537
    -------------+----------------------------------------------------------------
         sigma_u |  .36351747
         sigma_e |  .44556858
             rho |  .39962025   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0:     F(15, 89) =     1.07              Prob > F = 0.3955


    Code:
    xtreg ROE CR1 CR2 LR OR MR SIZE GDP, fe robust
    Code:
    Fixed-effects (within) regression               Number of obs      =       112
    Group variable: Bank                            Number of groups   =        16
    
    R-sq:  within  = 0.0822                         Obs per group: min =         7
           between = 0.0498                                        avg =       7.0
           overall = 0.0008                                        max =         7
    
                                                    F(7,15)            =      6.39
    corr(u_i, Xb)  = -0.8021                        Prob > F           =    0.0013
    
                                      (Std. Err. adjusted for 16 clusters in Bank)
    ------------------------------------------------------------------------------
                 |               Robust
             ROE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             CR1 |  -1.744441    .905241    -1.93   0.073    -3.673917    .1850346
             CR2 |   .9476746   .6595358     1.44   0.171    -.4580927    2.353442
              LR |  -.6933553   .2649604    -2.62   0.019    -1.258105   -.1286056
              OR |  -.2038989   .4383102    -0.47   0.648    -1.138135    .7303372
              MR |  -2.292524   1.098562    -2.09   0.054    -4.634054    .0490057
            SIZE |   8.715667   7.835708     1.11   0.284    -7.985749    25.41708
             GDP |    3.56462   2.680092     1.33   0.203    -2.147861    9.277102
           _cons |    .700703   1.043965     0.67   0.512    -1.524455    2.925861
    -------------+----------------------------------------------------------------
         sigma_u |  .36351747
         sigma_e |  .44556858
             rho |  .39962025   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    The F-statistic of the first regression was not significant, as shown above, but when I included the robust option, the F-statistic of the second regression became highly significant. I wanted to know whether there is any implication of using robust, because I would not want to exclude the model from the results just because of an insignificant F-statistic.

    However, it seems the robust model shows better results.

    Please kindly advise accordingly. Thanks.
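
    For reference, a minimal sketch of how the two specifications can be run back to back and their standard errors compared side by side (the time variable name Year is a placeholder/assumption; -vce(cluster Bank)- is equivalent to -robust- under -xtreg, fe-):

    Code:
    * declare the panel structure (Year is a placeholder for the actual time variable)
    xtset Bank Year
    * fixed effects with default (conventional) standard errors
    xtreg ROE CR1 CR2 LR OR MR SIZE GDP, fe
    estimates store fe_default
    * fixed effects with cluster-robust standard errors, clustered on Bank
    xtreg ROE CR1 CR2 LR OR MR SIZE GDP, fe vce(cluster Bank)
    estimates store fe_cluster
    * coefficients and both sets of standard errors side by side
    estimates table fe_default fe_cluster, b(%9.4f) se(%9.4f)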

  • #2
    Olalere:
    the reason for using -robust- is that the resulting SEs are noticeably different (the first two, at least) from the default ones.
    If you fear that the residual distribution suffers from heteroskedasticity, -robust- is the way to go.
    However, the main issue here is that no coefficient in your model seems to be different from 0: it may well be that your sample is too limited to support any panel data regression model.
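    If it helps, a minimal sketch of one way to check whether groupwise heteroskedasticity is actually a concern, assuming the community-contributed -xttest3- command (a modified Wald test that runs after -xtreg, fe-) is installed from SSC:

    Code:
    * one-off installation of the community-contributed test
    ssc install xttest3
    * re-fit the fixed-effects model, then test for groupwise heteroskedasticity
    xtreg ROE CR1 CR2 LR OR MR SIZE GDP, fe
    xttest3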
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      It is also worth mentioning that -robust- in -xtreg, fe- is taken to mean cluster-robust, clustered on the panel variable. Cluster-robust standard errors do not work well when the number of clusters is small. You have only 16. While different experts might disagree about just where to draw the line, 16 is, at best, borderline sufficient.
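
      With so few clusters, one remedy sometimes suggested is a wild cluster bootstrap. A minimal sketch using the community-contributed -boottest- command follows (treat the exact syntax and defaults as an assumption and check -help boottest-):

      Code:
      * one-off installation of the community-contributed wild cluster bootstrap command
      ssc install boottest
      * fixed-effects model with standard errors clustered on Bank
      xtreg ROE CR1 CR2 LR OR MR SIZE GDP, fe vce(cluster Bank)
      * wild cluster bootstrap p-value and confidence interval for the CR1 coefficient
      boottest CR1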



      • #4
        Originally posted by Carlo Lazzaro View Post
        Olalere:
        the reason for using -robust- is that the resulting SEs are noticeably different (the first two, at least) from the default ones.
        If you fear that the residual distribution suffers from heteroskedasticity, -robust- is the way to go.
        However, the main issue here is that no coefficient in your model seems to be different from 0: it may well be that your sample is too limited to support any panel data regression model.
        Thanks for your response, Carlo. However, that is the only model with no overall fit (F-stat). Do you mean I should use the robust option because the sample is too limited to support a panel data model? My understanding was that an overall sample of at least 100 observations is enough to run a panel data regression model, or what do you think?



        • #5
          Originally posted by Clyde Schechter View Post
          It is also worth mentioning that -robust- in -xtreg, fe- is taken to mean cluster-robust, clustered on the panel variable. Cluster-robust standard errors do not work well when the number of clusters is small. You have only 16. While different experts might disagree about just where to draw the line, 16 is, at best, borderline sufficient.
          Thanks for your reply, Clyde. Do you mean a sample of 16 clusters is at least sufficient? By the way, aside from the robust option, is there any way I can make the F-stat of the model significant? Thanks.



          • #6
            Do you mean 16 cluster sample is at least sufficient?
            No, I mean it is in a grey area. Some people would say 16 is enough, others would say it is not.

            By the way, aside the "Robust Option", is there any way i can make the F-stat of the model significant?
            Don't do this! Shopping around for a model that makes your result "statistically significant" is not science. It is a sure-fire way to generate false results that will not hold up under replication and do not reflect reality. If you try multiple analyses until you get the p-value you like, the p-value has no meaning. The p-value only has meaning (if it ever does!) if the analysis is selected in advance and the results of that analysis are accepted, no matter how they turn out. The choice of analytic technique needs to be based on the known statistical properties of the techniques and their applicability to your study design and data.

            Moreover, give some serious thought as to whether the p-value is even of any relevance to your research goals. It is often the case that the "hypothesis testing" framework is simply a Procrustean bed that mutilates the research. It is often far more useful to get quantitative point and interval estimates of a particular effect, or set of effects, than to test some artificial and inherently implausible "null hypothesis." I have no idea if that is the case in your situation, but it is quite commonly so.

            Do read Wasserstein RL, Lazar NA. The ASA's statement on p-values: context, process, and purpose. The American Statistician (2016). You can link to it at http://dx.doi.org/10.1080/00031305.2016.1154108.



            • #7
              Olalere:
              the F-test you (seemingly) refer to is too difficult to calculate under -robust-; hence Stata omits it at the foot of the -xtreg- output table.
              As far as your last question is concerned, in my opinion your sample is too limited and the individual effects are probably not that informative (setting aside the heteroskedasticity issue for a while, the result of the F-test at the foot of the first regression you ran cast some doubt on preferring -xtreg- over -regress- with standard errors clustered on the panel identifier).
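              For concreteness, the pooled alternative mentioned above would be something like the sketch below (whether it is preferable to -xtreg, fe- is exactly what the u_i=0 F-test at the foot of your first output speaks to):

              Code:
              * pooled OLS with standard errors clustered on the panel identifier
              regress ROE CR1 CR2 LR OR MR SIZE GDP, vce(cluster Bank)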
              That said, I do share Clyde's advice about the oversold story of "p-value less than..." (a cautionary tale about this topic would probably be welcome).
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Originally posted by Carlo Lazzaro View Post
                Olalere:
                the F-test you (seemingly) refer to is too difficult to calculate under -robust-; hence Stata omits it at the foot of the -xtreg- output table.
                As far as your last question is concerned, in my opinion your sample is too limited and the individual effects are probably not that informative (setting aside the heteroskedasticity issue for a while, the result of the F-test at the foot of the first regression you ran cast some doubt on preferring -xtreg- over -regress- with standard errors clustered on the panel identifier).
                That said, I do share Clyde's advice about the oversold story of "p-value less than..." (a cautionary tale about this topic would probably be welcome).
                Thanks for the clarification, Carlo.



                • #9
                  Originally posted by Clyde Schechter View Post
                  No, I mean it is in a grey area. Some people would say 16 is enough, others would say it is not.


                  Don't do this! Shopping around for a model that makes your result "statistically significant" is not science. It is a sure-fire way to generate false results that will not hold up under replication and do not reflect reality. If you try multiple analyses until you get the p-value you like, the p-value has no meaning. The p-value only has meaning (if it ever does!) if the analysis is selected in advance and the results of that analysis are accepted, no matter how they turn out. The choice of analytic technique needs to be based on the known statistical properties of the techniques and their applicability to your study design and data.

                  Moreover, give some serious thought as to whether the p-value is even of any relevance to your research goals. It is often the case that the "hypothesis testing" framework is simply a Procrustean bed that mutilates the research. It is often far more useful to get quantitative point and interval estimates of a particular effect, or set of effects, than to test some artificial and inherently implausible "null hypothesis." I have no idea if that is the case in your situation, but it is quite commonly so.

                  Do read Wasserstein RL, Lazar NA. The ASA's statement on p-values: context, process, and purpose. The American Statistician (2016). You can link to it at http://dx.doi.org/10.1080/00031305.2016.1154108.
                  Thanks Clyde, I really learned from this.
