  • Wild bootstrap for fixed- and random-effects models

    Dear Statalist,
    The topic of my thesis is "How firm-specific characteristics affect a firm's cash holding".
    In order to regress cash holding on these characteristics, I would like to run both fixed-effects and random-effects models.
    However, because of non-normality and heteroskedasticity, the literature recommends a wild bootstrap, and I have a few questions about that method.
    How do you do that in Stata?
    Is a cluster variable necessary, and if so, in which situations?
    How many replications should I do?
    When do you use the options "Suppress replication dots", "Use MSE formula for variance", and "Compute BCa CIs"?

    I hope someone has the time to help me with this.

    Thank you in advance

  • #2
    How large are your N and T? If you have many firms and not too many time periods, then there is no reason to use a bootstrap of any kind.

    If you do bootstrap, make sure you use the panel bootstrap -- maybe this is what is meant by "wild bootstrap" in this context? -- and you will need a cluster variable. But non-normality, heteroskedasticity, and even serial correlation do not prevent you from simply testing RE against FE. The correlated random effects approach makes this easy.
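    For illustration, a minimal sketch of the correlated random effects (Mundlak) comparison and of a panel (cluster) bootstrap, using Stata's nlswork example data (the regressors, replication count, and seed below are purely illustrative):
    Code:
    * correlated random effects (Mundlak) test of RE against FE:
    * add panel means of the time-varying regressors and test them jointly,
    * with standard errors clustered on the panel identifier
    use "https://www.stata-press.com/data/r16/nlswork.dta", clear
    xtset idcode year
    bysort idcode: egen mean_age = mean(age)
    bysort idcode: egen mean_ttl_exp = mean(ttl_exp)
    xtreg ln_wage age ttl_exp mean_age mean_ttl_exp, re vce(cluster idcode)
    test mean_age mean_ttl_exp        // rejection favours FE over RE

    * a panel (cluster) bootstrap resamples whole panels; with -xtset- data,
    * -vce(bootstrap)- clusters on the panel identifier automatically
    xtreg ln_wage age ttl_exp, fe vce(bootstrap, reps(400) seed(12345))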



    • #3
      Dear Mr Wooldridge,
      First of all, thank you so much for replying so quickly.
      My sample size is 392 firms and the time period is 5 years (2014-2018). The dependent variable is Cash holding and the independent variables are Size, Leverage, Bank debt, Cash flow, Cash flow volatility, Liquid assets, Investment opportunity, and Dividend payment. In my pooled OLS regression, none of the independent variables were significantly related to the dependent variable, even though the bulk of the literature states that this should be the case. I have dealt with the outliers and have excluded the firms with SIC codes 6000-6999, as is the custom.

      The literature recommended a wild bootstrap, and after trying it most of the independent variables became significant. Hence, when all the variables became insignificant after running a random-effects regression, I assumed that a wild bootstrap was necessary again. Let's say I insist on doing a wild bootstrap: how should I approach this, and what are the criteria for the cluster variable?

      I indeed work with panel data, and regarding the wild bootstrap I'm following the lead of this paper: Flachaire, E. (2005). Bootstrapping heteroskedastic regression models: wild bootstrap vs. pairs bootstrap. Computational Statistics & Data Analysis, 49(2), 361-376.

      Once again thank you so much for your quick reply.



      • #4
        Fin:
        you actually have a short panel, as N>T.
        If you detected heteroskedasticity and serial correlation problems, you can simply use clustered or robust standard errors (they do the very same job under -xtreg-), as they take both heteroskedasticity and autocorrelation into account.
        In addition, if you have a panel-wise effect (unfortunately, you do not provide any detail about that), why use pooled OLS as your first choice?
        As an aside, I do not think that the wild bootstrap (by the way: the cluster variable should be your -panelid-) should be considered a fix just because your regressors do not reach statistical significance under the -re- specification.
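        As a sketch only (the variable names below are placeholders for your own panel identifier and regressors; -boottest- is a community-contributed command, -ssc install boottest-, which, as far as I recall, also runs after -xtreg, fe- -- check its help file):
        Code:
        * cluster-robust standard errors under -xtreg-, clustering on the panel id
        * (firmid, year, cash, size, leverage, cashflow are placeholder names)
        xtset firmid year
        xtreg cash size leverage cashflow, fe vce(cluster firmid)

        * wild cluster bootstrap p-values via the community-contributed -boottest-,
        * which picks up the cluster variable from the estimation above
        boottest size
        boottest leverage
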
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Carlo Lazzaro,
          Thank you very much for your quick response.
          As for the panel-wise effect, I'm sorry to say that I am not that skilled in Stata, so I cannot tell you whether this is the case or how to find that out.
          The pooled OLS regression was done to increase the reliability of the results, since every test has its own flaws. I'm also doing a cross-sectional regression using means. For my topic, all of these tests are used in the literature.

          Let's say I have defined the variables correctly and got rid of the "noise" (outliers, SIC codes, etc.). If none of the independent variables are significant, does that mean I have no choice but to accept the results, since I have done everything right?

          Once again, thank you for your quick reply.
          Last edited by Fin Lane; 13 Oct 2020, 02:59.



          • #6
            Fin:
            1) panel-wise effect: if you go -xtreg, fe- with default standard errors, Stata gives you back an F-test that appears as a footnote of the outcome table. If it reaches statistical significance, there's evidence of a panel-wise effect; otherwise you should go pooled OLS:
            Code:
            . use "https://www.stata-press.com/data/r16/nlswork.dta"
            (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
            
            . xtreg ln_wage c.age##c.age, fe
            
            Fixed-effects (within) regression               Number of obs     =     28,510
            Group variable: idcode                          Number of groups  =      4,710
            
            R-sq:                                           Obs per group:
                 within  = 0.1087                                         min =          1
                 between = 0.1006                                         avg =        6.1
                 overall = 0.0865                                         max =         15
            
                                                            F(2,23798)        =    1451.88
            corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000
            
            ------------------------------------------------------------------------------
                 ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     age |   .0539076   .0028078    19.20   0.000     .0484041    .0594112
                         |
             c.age#c.age |  -.0005973   .0000465   -12.84   0.000    -.0006885   -.0005061
                         |
                   _cons |    .639913   .0408906    15.65   0.000     .5597649    .7200611
            -------------+----------------------------------------------------------------
                 sigma_u |   .4039153
                 sigma_e |  .30245467
                     rho |  .64073314   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            F test that all u_i=0: F(4709, 23798) = 8.74                 Prob > F = 0.0000
            
            .
            With the -re- specification, you should run -xttest0- after -xtreg, re-. If the -xttest0- outcome reaches statistical significance, you can go -re-; otherwise go pooled OLS (by the way: when there's evidence of a panel-wise effect, going -fe- or -re- depends on the outcome of -hausman- if you used default standard errors, or of the community-contributed module -xtoverid- if you chose non-default standard errors; a sketch of that step follows the output below):
            Code:
            . xtreg ln_wage c.age##c.age, re
            
            Random-effects GLS regression                   Number of obs     =     28,510
            Group variable: idcode                          Number of groups  =      4,710
            
            R-sq:                                           Obs per group:
                 within  = 0.1087                                         min =          1
                 between = 0.1015                                         avg =        6.1
                 overall = 0.0870                                         max =         15
            
                                                            Wald chi2(2)      =    3388.51
            corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
            
            ------------------------------------------------------------------------------
                 ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     age |   .0590339   .0027172    21.73   0.000     .0537083    .0643596
                         |
             c.age#c.age |  -.0006758   .0000451   -15.00   0.000    -.0007641   -.0005876
                         |
                   _cons |   .5479714   .0397476    13.79   0.000     .4700675    .6258752
            -------------+----------------------------------------------------------------
                 sigma_u |   .3654049
                 sigma_e |  .30245467
                     rho |  .59342665   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            
            . xttest0
            
            Breusch and Pagan Lagrangian multiplier test for random effects
            
                    ln_wage[idcode,t] = Xb + u[idcode] + e[idcode,t]
            
                    Estimated results:
                                     |       Var     sd = sqrt(Var)
                            ---------+-----------------------------
                             ln_wage |   .2285836       .4781042
                                   e |   .0914788       .3024547
                                   u |   .1335207       .3654049
            
                    Test:   Var(u) = 0
                                         chibar2(01) = 28074.51
                                      Prob > chibar2 =   0.0000
            
            .
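            A minimal sketch of the -hausman- / -xtoverid- step mentioned above (same nlswork data; the square of age is generated by hand because, as far as I recall, -xtoverid- does not accept factor-variable notation):
            Code:
            * Hausman test: both models estimated with default standard errors
            generate age2 = age*age
            quietly xtreg ln_wage age age2, fe
            estimates store fe
            quietly xtreg ln_wage age age2, re
            estimates store re
            hausman fe re

            * with non-default (cluster-robust) standard errors, run the
            * community-contributed -xtoverid- (ssc install xtoverid) after -xtreg, re-
            quietly xtreg ln_wage age age2, re vce(cluster idcode)
            xtoverid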
            2) I don't follow your statement about pooled OLS as an approach to increase the reliability (with respect to what?) of the results.
            3) Getting rid of the nuisances means, on the reverse side of the coin, making up your data. The risk is ending up with a made-up sample that has little to do with the original dataset. Whether this approach can pass muster in your research field, I cannot tell.
            4) Statistical significance is usually overrated (and oversold in many statistics classes), as non-significant results are as informative as significant ones. Lack of statistical significance can depend on a small sample size, a misspecified model, or an actual absence of any difference in the populations from which the samples were (randomly) drawn, just to mention a handful of reasons that spring to my mind at the moment.
            You did not post what you typed and what Stata gave you back (as recommended by the FAQ); hence, it is difficult (for me, at any rate) to reply more helpfully.
            Kind regards,
            Carlo
            (Stata 19.0)
