  • correcting heteroskedasticity but obtaining not normally distributed residuals

    Hi everyone,

    I am working with panel data. I have approximately 200 observations in total. My time variable is birth decade (from 1880 until 1960) and my cross-sectional variable is province of birth.

    My dependent variable is years of schooling and my main explanatory variable is migrant share. I first tried to correct the non-normally distributed errors by transforming the variables, and that solved the problem with the residuals.

    regression:
    xi: reg yrsc i.region i.bdec logmigrantshare logurbanshare gapmalefemale cattle

    Then I wanted to correct heteroskedasticity, so I transformed yrsc to lnyrsc. The heteroskedasticity was gone, but the residuals are again not normally distributed.

    At this point I do not know which is worse: heteroskedasticity or non-normally distributed residuals. The tests that I used were estat imtest, white and sktest for the residuals.



    Could someone help me with this problem?
    Thanks in advance




  • #2
    Welcome to the Stata Forum/Statalist.

    To start, unless you're using an old Stata version, you can drop the "xi: " prefix and use factor-variable notation instead.

    With regard to the model, since the DV measures years of schooling, you may prefer a Poisson or negative binomial model. These can be fitted in a panel-data setting as well.
    Best regards,

    Marcos



    • #3
      Ana Maria:
      as an aside to Marcos' helpful insight, please note that non-normality of the OLS residuals is rarely a concern, as the normality assumption can be relaxed via asymptotic theory.
      Heteroskedasticity (if it is not a warning sign of model misspecification) can be handled with robust or clustered standard errors in a panel-data setting. By the way, as Marcos wisely suggested, take a look at the Stata suite of commands for panel data regression prefixed with -xt-.
      Finally, if, as Marcos skillfully surmised, your dependent variable takes on positive integer values only, you should consider -xtpoisson-, or -xtnbreg- if the -xtpoisson- model shows overdispersion.
      Kind regards,
      Carlo
      (Stata 19.0)
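
      As a rough illustration of the -xt- route described above, here is a minimal sketch. It reuses the variable names from post #1 and assumes that region is a numeric province identifier (the names, the choice of fixed effects, and the clustering level are assumptions for illustration, not a recommended specification). The vce() options change only the standard errors, not the coefficients.

      ```stata
      * Sketch only: variable names are taken from post #1; region is assumed
      * to be a numeric province identifier (use -encode- first if it is a string).
      xtset region bdec

      * Linear fixed-effects model with standard errors clustered on province
      xtreg yrsc i.bdec logmigrantshare logurbanshare gapmalefemale cattle, fe vce(cluster region)

      * Fixed-effects Poisson alternative with robust standard errors
      xtpoisson yrsc i.bdec logmigrantshare logurbanshare gapmalefemale cattle, fe vce(robust)
      ```

      Whether fixed or random effects are appropriate here, and whether the number of provinces is large enough for clustered standard errors to behave well, would need to be checked against the actual data.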



      • #4
        Dear Marcos and Carlo,

        Thank you so much for your suggestions.
        I will review the commands suitable for my Stata version (15).

        Regarding Carlo's suggestion to use robust or clustered standard errors: I tried it, but the heteroskedasticity test still gives the same result. I do not know if I am doing something wrong, but it does not get rid of the problem.



        xi: reg yrsc i.bdec logmigrantshare logurbanshare femaleshare cattle, vce(robust)

        The result of White's test:

        White's test for Ho: homoskedasticity
        against Ha: unrestricted heteroskedasticity

        chi2(53) = 105.74
        Prob > chi2 = 0.0000

        Cameron & Trivedi's decomposition of IM-test

        ---------------------------------------------------
                      Source |       chi2     df      p
        ---------------------+-----------------------------
          Heteroskedasticity |     105.74     53    0.0000
                    Skewness |      14.74     12    0.2560
                    Kurtosis |       4.06      1    0.0439
        ---------------------+-----------------------------
                       Total |     124.54     66    0.0000
        ---------------------------------------------------




        • #5
          Hi Marcos,

          I have also already run a log-log model, and with it the heteroskedasticity is gone. But I still have the problem of non-normality, so in this case I guess I will relax that assumption.

          xi: reg lnyrsc i.region i.bdec logmigrantshare logurbanshare gapmalefemale cattle

          White's test for Ho: homoskedasticity
          against Ha: unrestricted heteroskedasticity

          chi2(77) = 86.31
          Prob > chi2 = 0.2190

          Cameron & Trivedi's decomposition of IM-test

          ---------------------------------------------------
                        Source |       chi2     df      p
          ---------------------+-----------------------------
            Heteroskedasticity |      86.31     77    0.2190
                      Skewness |      25.48     14    0.0301
                      Kurtosis |       3.84      1    0.0499
          ---------------------+-----------------------------
                         Total |     115.64     92    0.0484
          ---------------------------------------------------






          • #6
            Hello Ana,

            To start, since you have Stata 15 and have adopted factor notation, you can simply drop the "xi:" prefix. If you still feel uncomfortable with this, just try the command both ways and check that the results are the same.

            That said, your results with -regress- will not be the same as those from a Poisson-family model.

            What is more, you underlined in your first post that the study has a panel data structure. That being so, you should follow Carlo's advice in #3, which, in short, points to the "xt" prefix.

            To sum up, I believe that getting rid of a non-normal distribution is not the real problem. Rather, I recommend thinking about a) the type of DV, which seems to behave like count data; and b) the panel data structure, which will provide the overarching approach for, well, a longitudinal study.

            Hopefully that helps.
            Best regards,

            Marcos
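
            The check suggested above could look roughly like this. It is a sketch that reuses the model from post #1 and assumes region and bdec are numeric (factor-variable notation, unlike xi:, does not accept string variables). Only the continuous regressors are compared, because the two approaches name the dummy variables differently.

            ```stata
            * Same model fitted with and without the xi: prefix
            xi: regress yrsc i.region i.bdec logmigrantshare logurbanshare gapmalefemale cattle
            estimates store with_xi

            regress yrsc i.region i.bdec logmigrantshare logurbanshare gapmalefemale cattle
            estimates store factor_vars

            * Side-by-side coefficients and standard errors for the continuous regressors
            estimates table with_xi factor_vars, ///
                keep(logmigrantshare logurbanshare gapmalefemale cattle) b se
            ```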



            • #7
              Robust standard errors at most give a more honest summary of residual variability. They don't change the parameter estimates, so the residuals remain as they were before. Personally I think most of these tests are over-rated. I'd rather look at a plot of residuals versus fitted and one of observed versus fitted. "Can the model be improved?" is a better question than "Is the model acceptable?"

              As suggested by others I'd see Poisson or related models as more natural than vanilla regression, even with a log transformation.
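
              The two plots mentioned above can be drawn along these lines. This is a sketch that assumes the log model from #5 has just been fitted; the new variable name fitted is hypothetical.

              ```stata
              * Refit the log model from #5 (variable names as posted there)
              regress lnyrsc i.region i.bdec logmigrantshare logurbanshare gapmalefemale cattle

              * Residuals versus fitted values, with a reference line at zero
              rvfplot, yline(0)

              * Observed versus fitted values, with a line of equality for reference
              predict double fitted, xb
              scatter lnyrsc fitted || line fitted fitted, sort
              ```

              Systematic curvature or a fan shape in the first plot, or points drifting away from the line of equality in the second, would suggest where the model could be improved.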



              • #8
                Ana Maria:
                re-testing heteroskedasticity after robust/clustered standard errors have been imposed is not helpful, because the test uses the original residuals (i.e., it ignores that you took heteroskedasticity into account via non-default standard errors). As the example below shows, the residuals are identical with and without the robust option:
                Code:
                . sysuse auto, clear
                . reg price mpg
                
                      Source |       SS           df       MS      Number of obs   =        74
                -------------+----------------------------------   F(1, 72)        =     20.26
                       Model |   139449474         1   139449474   Prob > F        =    0.0000
                    Residual |   495615923        72  6883554.48   R-squared       =    0.2196
                -------------+----------------------------------   Adj R-squared   =    0.2087
                       Total |   635065396        73  8699525.97   Root MSE        =    2623.7
                
                ------------------------------------------------------------------------------
                       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
                       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
                ------------------------------------------------------------------------------
                
                . estat hettest
                
                Breusch-Pagan / Cook-Weisberg test for heteroskedasticity 
                         Ho: Constant variance
                         Variables: fitted values of price
                
                         chi2(1)      =     7.14
                         Prob > chi2  =   0.0075

                . predict resi1, res
                
                . reg price mpg, rob
                
                Linear regression                               Number of obs     =         74
                                                                F(1, 72)          =      17.28
                                                                Prob > F          =     0.0001
                                                                R-squared         =     0.2196
                                                                Root MSE          =     2623.7
                
                ------------------------------------------------------------------------------
                             |               Robust
                       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                         mpg |  -238.8943   57.47701    -4.16   0.000    -353.4727    -124.316
                       _cons |   11253.06   1376.393     8.18   0.000     8509.272    13996.85
                ------------------------------------------------------------------------------
                
                . predict resi2, res
                
                . list resi* in 1/5
                
                     +-----------------------+
                     |     resi1       resi2 |
                     |-----------------------|
                  1. | -1898.385   -1898.385 |
                  2. | -2442.857   -2442.857 |
                  3. | -2198.385   -2198.385 |
                  4. | -1659.174   -1659.174 |
                  5. |  157.3545    157.3545 |
                     +-----------------------+
                
                .
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Hello everyone,

                  I would like to thank all of you for your help. I will try all of your suggestions.

                  It is really amazing to have people helping you with programming questions when you have almost no previous experience in the field :D
