  • Robust standard errors - OLS - Right-skewed distribution

    Hi,

    I am looking at data for returns of companies after their initial public offering. The return data are skewed to the right, and White's test for heteroskedasticity suggests that they are highly heteroskedastic and skewed (see picture below).

    I have tried using robust standard errors in my regression (regress x y, robust) which alters my results. However, I do not know what the robust standard errors take into account.

    Do you know if they account for the skewness of the distribution? If not, is there any way to take this into account when making statistical inference?

    I hope you can help!

    Best,
    Rolf

    [Image: Skærmbillede 2019-10-16 kl. 13.26.31.png]

  • #2

    If you show us the results of

    Code:
    scatter x y
    then we would have a picture of what is going on. It seems more likely to me that you should reconsider your model functional form than that anything much will be solved by getting different standard errors. In any case, it is the conditional distributions that matter, not the marginal distribution.

    Note: conventionally the response or outcome is called y and the predictor x. I am just echoing what you say you used as regress syntax.


    • #3
      Thanks for the quick response!

      Return is the dependent variable (endogenous) and I have, including dummies, 22 explanatory variables (exogenous).

      This is a plot of the fitted values against the residuals from the non-robust regression, generated with the code:

      predict resid, r

      predict yhat, xb

      scatter yhat resid


      [Image: Skærmbillede 2019-10-16 kl. 13.50.46.png]

      Does this make any sense?

      • #4
        So the mention of regress x y, robust was nonsensical, or at least not to be taken literally. How were we supposed to know?

        It's conventional to plot residuals versus fitted, not as you have it here. Either way, a constraint on the response is a constraint on the configuration of your plot.
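
        A minimal sketch of the conventional residual-versus-fitted plot, using Stata's bundled auto data as a stand-in for the IPO data. The variables price, mpg and weight are placeholders, not the model from this thread:

        ```stata
        * Stand-in data: Stata's bundled auto dataset (an assumption, not the IPO data)
        sysuse auto, clear
        regress price mpg weight

        * Fitted values and residuals from the most recent regression
        predict double yhat, xb
        predict double resid, residuals

        * Residuals on the vertical axis, fitted values on the horizontal axis
        scatter resid yhat, yline(0)

        * Built-in equivalent after regress
        rvfplot, yline(0)
        ```

        With a response that has a hard lower bound, the points in such a plot cannot fall below a sloping line, which is the constraint on the configuration mentioned above.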

        It seems that your regression takes no account whatever of the bounded character of your response. Fitting a hyperplane looks suspect to me in that situation, but it is hard to give more constructive advice. Again, the most that a robust option can do is give, as it were, more honest standard errors and perhaps less misleading tests and confidence intervals. It can't correct a dubious model.


        • #5
          Sorry for my poor understanding of the topic - I appreciate you taking the time!

          I have tried to be very specific about my steps in the code. I know for a fact that my dependent variable, "Threeyearreturn", is skewed to the right, but I naively chose to use OLS, as I do not know what model I could use instead. Perhaps you do?

          Below you can find my output

          Continuing with OLS:
          I start out with a regression using non-robust standard errors and test for heteroskedasticity; homoskedasticity is rejected across multiple tests. Therefore, I use a regression with robust standard errors. The variable "PEBacked", which is the one of interest, goes from being significantly different from 0 at the 5 percent level (using a t-test) to not being significantly different from 0 in the regression using robust standard errors.
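
          The sequence described above can be sketched roughly as follows. This uses Stata's bundled auto data rather than the IPO data, so price, mpg, weight and foreign are stand-ins for "Threeyearreturn", "PEBacked" and the other regressors:

          ```stata
          * Stand-in data: Stata's bundled auto dataset (an assumption, not the IPO data)
          sysuse auto, clear

          * Step 1: OLS with conventional standard errors
          regress price mpg weight foreign
          estimates store ols

          * Step 2: tests of the homoskedasticity assumption
          * Breusch-Pagan / Cook-Weisberg test
          estat hettest
          * White's test, reported with the Cameron-Trivedi decomposition
          estat imtest, white

          * Step 3: same model with heteroskedasticity-robust standard errors
          regress price mpg weight foreign, vce(robust)
          estimates store robust

          * Side-by-side table of coefficients and standard errors
          estimates table ols robust, b(%9.3f) se(%9.3f)
          ```

          In the final table the coefficient rows should match across the two columns, because vce(robust) changes only the estimated standard errors. A change in significance between the two fits therefore reflects the standard errors alone.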

          Do you have any recommendation on how to circumvent this by using a different functional form etc?


          [Image: Skærmbillede 2019-10-16 kl. 14.24.22.png]
          [Image: Skærmbillede 2019-10-16 kl. 14.24.34.png]
          [Image: Skærmbillede 2019-10-16 kl. 14.24.47.png]


          • #6
            Your situation is still not clear to me (and you should read the FAQ, as much of your posting is unreadable). I suggest you read the following Stata blog post:

            https://blog.stata.com/2011/08/22/us...tell-a-friend/


            • #7
              Hi Rolf
              As you have observed from your results, and as Nick Cox already mentioned, using robust standard errors simply recalculates the standard errors of the estimated coefficients using an estimator that remains valid under heteroskedasticity; the coefficients themselves do not change. On the other hand, robust standard errors are only asymptotically valid.
              Since you have 260 observations in your model, robust standard errors may not be the best approach.
              Perhaps something you can try is weighted least squares (WLS). (You can find the explanation for this in chapter 8 of Introductory Econometrics: A Modern Approach by Wooldridge.)
              HTH
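
              As a rough illustration of the WLS suggestion, here is a sketch of one textbook variant of feasible GLS, in which the error variance is assumed to be an exponential function of the regressors. That variance model is an assumption made for this example, and the auto data and variable names are stand-ins for the IPO data:

              ```stata
              * Stand-in data: Stata's bundled auto dataset (an assumption, not the IPO data)
              sysuse auto, clear

              * Step 1: OLS, then save the residuals
              regress price mpg weight
              predict double uhat, residuals

              * Step 2: model the log of the squared residuals on the same regressors
              generate double logu2 = ln(uhat^2)
              regress logu2 mpg weight
              predict double ghat, xb

              * Step 3: estimated variance function, always positive by construction
              generate double hhat = exp(ghat)

              * Step 4: weighted least squares with weights inversely proportional to hhat
              regress price mpg weight [aweight = 1/hhat]
              ```

              If the variance model is wrong, the WLS standard errors can still be misleading, so adding vce(robust) to the last regression is a common safeguard.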



              • #8
                Thanks for the response. I will look into WLS.

                Have a great day!


                • #9
                  Nick raised the question about whether this was the correct functional form. I'm asking this naively since I don't know any of the theory on which this is based. But is the original distribution zero-limited with a lot of observations at or near the lower limit? And conceptually, would you expect the importance of a 1 unit change in return (your outcome) to be equal across the range of return? E.g., is the difference between 0 and 1 of equal importance to the difference between 90 and 91? A tangible example would be income: the difference between $20k and $21k is likely much more meaningful than the difference between $100k and $101k. If a unit change is not equally important across the range, then you probably should be considering a different model.


                  • #10
                    There must be a literature on this, with hundreds if not thousands of papers on returns as the outcome of interest. I have no idea what that literature is -- I am, or used to be, a geomorphologist, although I strayed. But surely a researcher should be looking at it.

                    But if returns are bounded below by -1 and not bounded above, then I would expect a suitable link function to be log(1 + return).
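
                    A minimal sketch of what working on the log(1 + return) scale could look like. The data are simulated, and the names ret, pebacked, gross and lnret are hypothetical stand-ins, not the variables in the poster's dataset:

                    ```stata
                    * Simulated stand-in data: returns bounded below by -1 and right-skewed
                    clear
                    set seed 12345
                    set obs 260
                    generate byte pebacked = runiform() < 0.3
                    generate double ret = exp(0.1 + 0.2*pebacked + rnormal(0, 0.8)) - 1

                    * Option A: transform the response, then fit a linear model
                    * ln(1 + ret) is missing when ret equals -1 exactly (a total loss)
                    generate double lnret = ln(1 + ret)
                    regress lnret pebacked, vce(robust)

                    * Option B: keep gross return on its original scale and use a log link
                    generate double gross = 1 + ret
                    glm gross pebacked, family(gaussian) link(log) vce(robust)
                    ```

                    Option A models the mean of the logged gross return, while Option B models the log of the mean gross return, so the two sets of coefficients answer slightly different questions.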


                    • #11
                      I would also suggest that it sure looks like predicted values and residuals are associated, which raises issues about the estimator. As Nick points out, this may come from not being able to lose more than your full investment, but I suspect it may be more than that. A different functional form is one option. A tobit-type model is another.
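
                      For the tobit option, a minimal sketch on simulated data, assuming the return is censored from below at -1. The variable names and the data-generating process are hypothetical and chosen only so that some observations sit exactly at the bound:

                      ```stata
                      * Simulated stand-in data with a pile-up at the lower bound of -1
                      clear
                      set seed 2019
                      set obs 260
                      generate byte pebacked = runiform() < 0.3
                      generate double latent = -0.6 + 0.3*pebacked + rnormal(0, 0.8)
                      generate double ret = max(latent, -1)

                      * Number of observations sitting exactly at the lower limit
                      count if ret == -1

                      * Tobit with left-censoring at -1
                      tobit ret pebacked, ll(-1)
                      ```

                      Whether a tobit is appropriate depends on how many returns actually sit at the bound and on whether normality of the latent variable is plausible, which the skewness described in this thread may call into question.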


                      • #12
                        Rolf:
                        As an aside to the previous excellent points, I would also consider a more parsimonious regression model: 20 predictors with 266 observations sounds like torturing your data.
                        Kind regards,
                        Carlo
                        (Stata 18.0 SE)
