  • Checking for heteroskedasticity

    Hi guys,
    I am running a regression with country fixed effects using the command "xtreg Y X, fe". The results looked fine, since all the p-values were < 0.05. I ran the Hausman test, and it favoured the fixed-effects model. There was also no multicollinearity problem. Then I realized I had to control for heteroskedasticity, so I ran the regression with the command "xtreg Y X, fe robust", and the p-values are now considerably higher, above 0.05. Does this mean that my model is not correct once I include robust standard errors? Can I control for this heteroskedasticity problem? (Please take into consideration that I am very new to Stata and econometrics.)
    Thanks in advance for the help
    Last edited by Alejandro Cuadros; 13 Jan 2021, 12:13.

  • #2
    Alejandro:
    the -robust- option in -xtreg- does the very same job that -vce(cluster clusterid)- does, that is, taking heteroskedasticity and/or autocorrelation into account.
    You do not report (as per the FAQ) exactly what you typed and what Stata gave you back: hence, it is difficult to avoid a bit of guesswork in replying. The cluster-robust standard errors can be correct (and their default counterparts misleading) or the other way round: if you have enough panels (say, around 50), the cluster-robust standard errors are the way to go; otherwise they can be even more misleading than the default ones.
    It may also be that you have serial correlation issues that are captured by the -robust- option.
    As an aside, please note that it is not correct to test for possible heteroskedasticity (and/or autocorrelation) after -hausman- (which does not support non-default standard errors). You should check for it before comparing the -fe- and -re- specifications and, if cluster-robust standard errors are necessary, you should forget -hausman- and consider the community-contributed programme -xtoverid- (which, in turn, does not support -fvvarlist- notation; see the -xi:- help file if you have categorical -i.predictors- on the right-hand side of your regression equation).
    Kind regards,
    Carlo
    (Stata 19.0)
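
    The advice to check for heteroskedasticity and serial correlation before choosing between -fe- and -re- can be sketched as follows. This is only an illustration under assumptions: -panelid- and -year- are hypothetical names for the panel and time identifiers, Y and X are the placeholders from the original question, and -xttest3- and -xtserial- are community-contributed commands that must be installed first.

    ```stata
    * One-off installation of the community-contributed tests
    * (xttest3 is on SSC; -search xtserial- locates the xtserial package)
    ssc install xttest3
    search xtserial

    * Declare the panel structure (panelid and year are hypothetical names)
    xtset panelid year

    * Fixed-effects fit with default standard errors
    xtreg Y X, fe

    * Modified Wald test; H0: the error variance is the same across panels
    xttest3

    * Wooldridge test; H0: no first-order autocorrelation in the errors
    xtserial Y X
    ```

    If either null is rejected and the number of panels is large enough, cluster-robust standard errors are the natural next step, as described above.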


    • #3
      Thanks Carlo,

      I must admit that I am a little confused by your answer. I will provide more detail so you can see what comes out, first using only "xtreg y x, fe", then "xtreg y x, fe robust", and then "xtreg y x, fe vce(robust)".

      With my model I'm trying to explain the influence of Economic Freedom on the Female Labor Force Participation Rate. For this I am using data from 100 countries over 15 years. The idea is to demonstrate this relationship using a fixed effects model, in order to analyze the effect within countries. I added 3 independent variables besides Economic Freedom and there is no problem of multicollinearity. The result we got is that there was a significant influence (p value<0.05) of all independent variables on Female Labor Force Participation. After running the hausman test the result was that this model was well explained with Fixed Effects.

      The commands that I used are the following:

      egen Country1=group(Country)
      xtset Country1 Year, yearly
      xtsum Dependent Ind1 Ind2 Ind3 Ind4
      xtreg Dependent Ind1 Ind2 Ind3 Ind4, fe
      xtreg Dependent Ind1 Ind2 Ind3 Ind4, re
      xtreg Dependent Ind1 Ind2 Ind3 Ind4, fe robust
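
      To see exactly what the -robust- option changes, the two fixed-effects fits above could be stored and tabulated side by side. A minimal sketch, assuming the variable names used in the commands above; the stored-estimate names fe_default and fe_cluster are hypothetical.

      ```stata
      * Fixed effects with default (conventional) standard errors
      xtreg Dependent Ind1 Ind2 Ind3 Ind4, fe
      estimates store fe_default

      * Same model, standard errors clustered on the panel identifier
      * (under -xtreg, fe- this is what -robust- requests as well)
      xtreg Dependent Ind1 Ind2 Ind3 Ind4, fe vce(cluster Country1)
      estimates store fe_cluster

      * Coefficients, standard errors and p-values side by side
      estimates table fe_default fe_cluster, b(%9.4f) se(%9.4f) p(%9.4f)
      ```

      The coefficient columns should match; only the standard errors, and therefore the p-values, are expected to differ.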

      Everything was perfect (significant) till the last step. Here are the pictures.

      Thanks again..
      Last edited by Alejandro Cuadros; 13 Jan 2021, 13:42.



      • #4
        .



        • #5
          Alejandro:
          just to wrap up:
          1) -robust-=-vce(robust)-=-vce(cluster clusterid)-: so, under -xtreg-, you will get exactly the same results;
          2) given the number of your panels (94), you should invoke non-default standard errors (that is, one of the options reported above), no matter whether the p-values reach statistical significance or not (this is a minor issue, regardless of what we're usually taught at university);
          3) -hausman- does not support non-default standard errors: hence, you should rely on the community-contributed command -xtoverid- (that you can easily download from SSC; see -help ssc-).
          The following toy-example can hopefully be helpful (please note that -xtoverid- does not support -fvvarlist- notation: hence you should prefix your code with -xi:- if you have categorical variables and/or create interactions by hand):
          Code:
          . use "https://www.stata-press.com/data/r16/nlswork.dta"
          (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
          
          . xtreg ln_wage c.age##c.age i.nev_mar, fe robust
          
          Fixed-effects (within) regression               Number of obs     =     28,494
          Group variable: idcode                          Number of groups  =      4,710
          
          R-sq:                                           Obs per group:
               within  = 0.1091                                         min =          1
               between = 0.0969                                         avg =        6.0
               overall = 0.0846                                         max =         15
          
                                                          F(3,4709)         =     341.15
          corr(u_i, Xb)  = 0.0391                         Prob > F          =     0.0000
          
                                       (Std. Err. adjusted for 4,710 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   age |   .0515872   .0046007    11.21   0.000     .0425677    .0606067
                       |
           c.age#c.age |  -.0005643   .0000757    -7.45   0.000    -.0007127   -.0004159
                       |
             1.nev_mar |  -.0182635    .010661    -1.71   0.087     -.039164     .002637
                 _cons |    .682259   .0686616     9.94   0.000     .5476501    .8168679
          -------------+----------------------------------------------------------------
               sigma_u |  .40461866
               sigma_e |  .30234418
                   rho |  .64170177   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          
          . estimates store fe
          
          . xtreg ln_wage c.age##c.age i.nev_mar, re robust
          
          Random-effects GLS regression                   Number of obs     =     28,494
          Group variable: idcode                          Number of groups  =      4,710
          
          R-sq:                                           Obs per group:
               within  = 0.1087                                         min =          1
               between = 0.1024                                         avg =        6.0
               overall = 0.0876                                         max =         15
          
                                                          Wald chi2(3)      =    1260.91
          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
          
                                       (Std. Err. adjusted for 4,710 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   age |   .0599506    .004303    13.93   0.000      .051517    .0683843
                       |
           c.age#c.age |  -.0006892   .0000712    -9.68   0.000    -.0008288   -.0005496
                       |
             1.nev_mar |   .0060386   .0090804     0.67   0.506    -.0117588    .0238359
                 _cons |   .5319739   .0631005     8.43   0.000     .4082991    .6556486
          -------------+----------------------------------------------------------------
               sigma_u |   .3642833
               sigma_e |  .30234418
                   rho |  .59211886   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          
          . estimates store re
          
          . hausman fe re
          hausman cannot be used with vce(robust), vce(cluster cvar), or p-weighted data
          r(198);
          
          . g sq_age=age^2
          (24 missing values generated)
          
          . xi: xtreg ln_wage age sq_age i.nev_mar, re robust
          i.nev_mar         _Inev_mar_0-1       (naturally coded; _Inev_mar_0 omitted)
          
          Random-effects GLS regression                   Number of obs     =     28,494
          Group variable: idcode                          Number of groups  =      4,710
          
          R-sq:                                           Obs per group:
               within  = 0.1087                                         min =          1
               between = 0.1024                                         avg =        6.0
               overall = 0.0876                                         max =         15
          
                                                          Wald chi2(3)      =    1260.91
          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
          
                                       (Std. Err. adjusted for 4,710 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   age |   .0599506    .004303    13.93   0.000      .051517    .0683843
                sq_age |  -.0006892   .0000712    -9.68   0.000    -.0008288   -.0005496
           _Inev_mar_1 |   .0060386   .0090804     0.67   0.506    -.0117588    .0238359
                 _cons |   .5319739   .0631005     8.43   0.000     .4082991    .6556486
          -------------+----------------------------------------------------------------
               sigma_u |   .3642833
               sigma_e |  .30234418
                   rho |  .59211886   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          
          . xtoverid
          
          Test of overidentifying restrictions: fixed vs random effects
          Cross-section time-series model: xtreg re  robust cluster(idcode)
          Sargan-Hansen statistic  86.932  Chi-sq(3)    P-value = 0.0000
          
          .
          The null of -xtoverid- is that -re- is the way to go (as such, unlike -hausman-, there's no need to fit both -fe- and -re- and compare them). Here the null is rejected, so the test shows that -fe- is correct for this (probably misspecified) regression model.
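
          For readers who have not installed -xtoverid- yet, the download step mentioned in point 3) might look like the sketch below. That -xtoverid- also needs -ivreg2- and -ranktest- is an assumption here; check its help file after installation.

          ```stata
          * One-off installation from SSC
          ssc install xtoverid

          * Assumption: xtoverid relies on these packages being installed too
          ssc install ivreg2
          ssc install ranktest

          * Confirm the command is found and read its documentation
          which xtoverid
          help xtoverid
          ```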

          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Thank you again Carlo. I did as you said and this was the result, but I am not sure how to interpret it. Is this model better explained with re or fe? And second, can we conclude that there is a (significant) effect of the independent variables on the dependent variable even with those p-values? If not, every tip is welcome.


            . xi: xtreg FLFP EFI Puestos gdppercap agr, re robust

            Random-effects GLS regression                   Number of obs     =      1,504
            Group variable: Country1                        Number of groups  =         94

            R-sq:                                           Obs per group:
                 within  = 0.1229                                         min =         16
                 between = 0.0398                                         avg =       16.0
                 overall = 0.0416                                         max =         16

                                                            Wald chi2(4)      =      31.25
            corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

            I know you said p-values were not so important, but here they are (maybe useful for interpretation):
            -------------+----------------------------------------------------------------
                     EFI |  0.149
                 Puestos |  0.084
               gdppercap |  0.000
                     agr |  0.379
                   _cons |  0.000
            -------------+----------------------------------------------------------------
                 sigma_u |  12.982257
                 sigma_e |  1.9932807
                     rho |  .97696876   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------


            . xtoverid

            Test of overidentifying restrictions: fixed vs random effects
            Cross-section time-series model: xtreg re robust cluster(Country1)
            Sargan-Hansen statistic 11.177 Chi-sq(4) P-value = 0.0246


            Best regards
            Last edited by Alejandro Cuadros; 14 Jan 2021, 05:02.



            • #7
              Alejandro:
              the -xtoverid- outcome points you toward the -fe- specification (as the null is rejected).
              What I would do now is focus on your second -xtreg, fe- model (I mean the one with robust standard errors) and test whether it is misspecified or not (we discussed this issue yesterday).
              Kind regards,
              Carlo
              (Stata 19.0)
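
              The steps suggested here can be sketched as follows, using the variable names from post #6. The misspecification check shown (adding the squared fitted values and testing them) is an assumption about what such a test could look like, not something quoted from this thread; fitted and sq_fitted are hypothetical variable names.

              ```stata
              * Upper-tail chi-squared probability for the Sargan-Hansen statistic
              * reported by -xtoverid- in post #6 (11.177 with 4 degrees of freedom)
              display chi2tail(4, 11.177)

              * Fixed-effects model with cluster-robust standard errors
              xtreg FLFP EFI Puestos gdppercap agr, fe vce(cluster Country1)

              * Hypothetical functional-form check: fitted values and their square
              predict double fitted, xb
              generate double sq_fitted = fitted^2
              xtreg FLFP fitted sq_fitted, fe vce(cluster Country1)

              * H0: the squared term adds nothing; rejection hints at misspecification
              test sq_fitted
              ```

              A small p-value from -display chi2tail()- is what leads to rejecting -re- in favour of -fe-; a significant squared term in the last regression would suggest that the functional form of the regressors needs revisiting.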
