  • Heteroskedasticity

    Dear all
    I am trying to eliminate the problem of heteroskedasticity in my regression model. I have tried a log transformation, WLS, and FGLS. None of them helped, so I used robust standard errors. However, the standard errors I got from this method were almost the same as the original standard errors. Can I assume that the robust standard error method eliminated heteroskedasticity in my model?

    . reg an_spending income age gender edu pur_freq gender_edu

          Source |       SS       df       MS              Number of obs =     978
    -------------+------------------------------           F(  6,   971) = 2059.19
           Model |  2.4233e+10     6  4.0388e+09           Prob > F      =  0.0000
        Residual |  1.9045e+09   971  1961353.99           R-squared     =  0.9271
    -------------+------------------------------           Adj R-squared =  0.9267
           Total |  2.6137e+10   977  26752609.4           Root MSE      =  1400.5

    ------------------------------------------------------------------------------
     an_spending |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          income |   .0247537     .00194    12.76   0.000     .0209467    .0285608
             age |   81.73533   3.261323    25.06   0.000     75.33528    88.13538
          gender |  -1273.963    123.972   -10.28   0.000    -1517.247   -1030.679
             edu |   2267.413   127.8328    17.74   0.000     2016.553    2518.273
        pur_freq |   16919.65   160.1966   105.62   0.000     16605.27    17234.02
      gender_edu |   389.4784   179.4746     2.17   0.030     37.27553    741.6812
           _cons |   -5280.67    220.816   -23.91   0.000    -5714.001   -4847.338
    ------------------------------------------------------------------------------

    . reg an_spending income age gender edu pur_freq gender_edu, robust

    Linear regression                                      Number of obs =     978
                                                           F(  6,   971) = 1731.81
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.9271
                                                           Root MSE      =  1400.5

    ------------------------------------------------------------------------------
                 |               Robust
     an_spending |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          income |   .0247537   .0020335    12.17   0.000     .0207632    .0287442
             age |   81.73533   3.534058    23.13   0.000     74.80006     88.6706
          gender |  -1273.963   127.4425   -10.00   0.000    -1524.057   -1023.868
             edu |   2267.413   125.6639    18.04   0.000     2020.809    2514.017
        pur_freq |   16919.65   194.0594    87.19   0.000     16538.82    17300.47
      gender_edu |   389.4784   179.1607     2.17   0.030     37.89161    741.0651
           _cons |   -5280.67   222.5665   -23.73   0.000    -5717.437   -4843.903
    ------------------------------------------------------------------------------

  • #2
    I don't think you can ever eliminate heteroscedasticity unless you manage to do that by transforming the data. What you can do is choose a model and/or estimation procedure suitable for your data.

    The similarity of robust and conventional standard errors is no doubt comforting, but it's impossible to tell from these results whether you have done as much as you can in working towards a suitable model. For one, whether a linear functional form y = Xb is a good idea needs some independent checks. I've often found that added variable plots are helpful.
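
    A minimal sketch of such checks, assuming the variable names from the first post. -avplots- draws one added-variable plot per regressor, and -rvfplot- plots residuals against fitted values; curvature or a fan shape in either would point to functional-form or variance problems.

    ```stata
    * Refit the original model with conventional standard errors
    regress an_spending income age gender edu pur_freq gender_edu

    * Added-variable plots, one panel per regressor
    avplots

    * Residuals versus fitted values, with a reference line at zero
    rvfplot, yline(0)
    ```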



    • #3
      Here is my handout on hetero:

      https://www3.nd.edu/~rwilliam/stats2/l25.pdf

      As Nick suggests, what the tests flag as hetero may actually be a problem with model specification. Variables may need to be transformed, or variables (e.g. interaction terms) may need to be added. Using robust standard errors may seem like a nice quick solution, but it isn’t necessarily the correct one.
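
      One way to run such tests in Stata, sketched with the variable names from the first post. -estat hettest- (Breusch-Pagan) and -estat imtest, white- test for heteroskedasticity, while -estat ovtest- (Ramsey RESET) looks for omitted nonlinear terms. They are run here after -regress- with conventional standard errors, since -estat hettest- is not available after the robust option.

      ```stata
      * Fit the model without the robust option
      regress an_spending income age gender edu pur_freq gender_edu

      * Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
      estat hettest

      * White's general test for heteroskedasticity
      estat imtest, white

      * Ramsey RESET test using powers of the fitted values
      estat ovtest
      ```

      If the RESET test rejects as well, the apparent heteroskedasticity may reflect a misspecified functional form rather than unequal error variances alone.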
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      WWW: https://academicweb.nd.edu/~rwilliam/



      • #4
        Bekhruz:
        welcome to this forum.
        As an aside to previous excellent advice:
        1) have you already checked what happens if you add -age- squared (by the way, why create the interaction by hand when you can rely on -fvvarlist- notation, which also adds the main effects of -gender- and -edu- for you?):
        Code:
        reg an_spending income c.age##c.age pur_freq i.gender##i.edu
        ;
        2) with a sample size of 978, I would check whether -vce(cluster idcode)- is the way to go. If you detect both heteroskedasticity and autocorrelation of the epsilon, -vce(cluster idcode)- rules (see https://www.stata.com/bookstore/envi...s-using-stata/ pages 28-30);
        3) your sky-high R2 (0.93) casts further doubt on the correct specification of your regression model.
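
        A rough sketch of how points 1) and 2) could be checked, assuming the variable names from the first post. The stored-estimate names (ols, rob) are arbitrary, and -idcode- is only the placeholder used in point 2); the posted data may contain no such identifier, so that line is left commented out.

        ```stata
        * Respecified model with conventional standard errors
        regress an_spending income c.age##c.age pur_freq i.gender##i.edu
        estimates store ols

        * Same model with heteroskedasticity-robust standard errors
        regress an_spending income c.age##c.age pur_freq i.gender##i.edu, vce(robust)
        estimates store rob

        * Wald tests of the squared age term and of the interaction
        testparm c.age#c.age
        testparm i.gender#i.edu

        * Coefficients and standard errors side by side
        estimates table ols rob, b(%9.4g) se(%9.4g)

        * Only if the data have a cluster identifier (idcode is hypothetical):
        * regress an_spending income c.age##c.age pur_freq i.gender##i.edu, vce(cluster idcode)
        ```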
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Thank you very much. I will check the specification of my model.
