
  • Heteroscedasticity vs Homoscedasticity with different measures of dependent variable

    Hi All,

    I am carrying out research for my bachelor thesis looking at the effect of gin consumption on health outcomes, using a regional-level panel data set. I am using two measures of health for robustness: ARD (alcohol related deaths) and BADSAH (bad self-assessed health).

    Carrying out the Breusch-Pagan test for heteroscedasticity:

    1) For ARD: p-value of 0.000 (reject the null: evidence of heteroscedasticity)
    2) For BADSAH: p-value of 0.2510 (fail to reject the null of homoscedasticity)

    Currently I am using the options -fe cluster(region)- in both models.

    Does this result mean that, for my second measure of health (BADSAH), I do not need robust standard errors in my regression? I am slightly confused about the implication of this p-value.
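A minimal sketch of the comparison behind this question, using Stata's shipped nlswork example data as a stand-in for the thesis data (the regressors age and tenure are illustrative choices, not the thesis model). A non-significant Breusch-Pagan test means only that the test found no evidence of heteroscedasticity; it does not establish homoscedasticity. Robust standard errors are asymptotically valid whether or not the errors are homoscedastic (and cluster-robust ones, given enough clusters, also allow for within-cluster correlation), so keeping them is a common, conservative choice. The coefficients are identical across the three fits below; only the standard errors change.

```stata
* Illustration only: nlswork ships with Stata and stands in for the thesis data
webuse nlswork, clear

* Pooled OLS with conventional standard errors, then the Breusch-Pagan test
regress ln_wage age tenure
estat hettest

* Same model, heteroskedasticity-robust standard errors
regress ln_wage age tenure, vce(robust)

* Same model, standard errors clustered on the panel identifier
regress ln_wage age tenure, vce(cluster idcode)
```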

    Many thanks,
    Carys Wright

  • #2
    What commands are you using? You give some options but not the commands. Can you show scatter plots of

    ARD versus gin consumption

    BADSAH versus gin consumption

    or your real regressions if different?



    • #3
      Hi Nick,

      I am using commands:

      reg badsah consumptiongin TEH GDHI dbbinge
      estat hettest

      reg ARD consumptiongin TEH GDHI dbbinge
      estat hettest

      TEH - control for Total Expenditure of Healthcare
      GDHI - control for income
      dbbinge - control for binge drinking behaviour
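The commands above fit pooled OLS with -regress-, which ignores the panel structure. A sketch of the fixed-effects counterpart implied by the options mentioned in #1, assuming that region is a numeric panel identifier and that the time variable is called year (a hypothetical name; substitute the real one):

```stata
* Assumes region is numeric; if it is a string, create a numeric copy first:
*   encode region, generate(region_id)
* "year" is a hypothetical name for the time variable
xtset region year

* Fixed-effects (within) estimator, standard errors clustered by region
xtreg badsah consumptiongin TEH GDHI dbbinge, fe vce(cluster region)
xtreg ARD consumptiongin TEH GDHI dbbinge, fe vce(cluster region)
```

-vce(cluster region)- is the current spelling of the older -cluster(region)- option. With only a handful of regions, cluster-robust standard errors rest on few clusters and should be read with care.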


      I have uploaded the scatter graphs as attachments.

      What do you mean by my real regressions? Sorry if I haven't been clear enough!

      Carys Wright




      • #4
        Carys:
        if you're dealing with -xtreg,fe- and want to check for heteroskedasticity, you can use the community-contributed command -xttest3-, as you can see in the following toy-example:
        Code:
        . use "http://www.stata-press.com/data/r15/nlswork.dta"
        (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
        
        . xtreg ln_wage age, fe rob
        
        Fixed-effects (within) regression               Number of obs     =     28,510
        Group variable: idcode                          Number of groups  =      4,710
        
        R-sq:                                           Obs per group:
             within  = 0.1026                                         min =          1
             between = 0.0877                                         avg =        6.1
             overall = 0.0774                                         max =         15
        
                                                        F(1,4709)         =     884.05
        corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000
        
                                     (Std. Err. adjusted for 4,710 clusters in idcode)
        ------------------------------------------------------------------------------
                     |               Robust
             ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 age |   .0181349   .0006099    29.73   0.000     .0169392    .0193306
               _cons |   1.148214   .0177153    64.81   0.000     1.113483    1.182944
        -------------+----------------------------------------------------------------
             sigma_u |  .40635023
             sigma_e |  .30349389
                 rho |  .64192015   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        . xttest3
        
        Modified Wald test for groupwise heteroskedasticity
        in fixed effect regression model
        
        H0: sigma(i)^2 = sigma^2 for all i
        
        chi2 (4710)  =  1.2e+37
        Prob>chi2 =      0.0000
        
        
        .
        Kind regards,
        Carlo
        (Stata 18.0 SE)
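Because -xttest3- is community-contributed, it has to be installed before the toy example above will run. A minimal sketch, assuming internet access to the SSC archive (-webuse nlswork- loads the same nlswork example data as the -use- line above):

```stata
* One-time installation from the SSC archive
ssc install xttest3

* Reproduce the toy example: fixed-effects fit, then the modified Wald test
webuse nlswork, clear
xtreg ln_wage age, fe vce(robust)
xttest3
```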



        • #5
          Carys:
          why switch to -regress- to test for heteroskedasticity?
          Kind regards,
          Carlo
          (Stata 18.0 SE)



          • #6
            Hi Carlo,

            I am not completely sure - I wasn't aware there were different commands for the tests when using panel data - thank you so much!

            After using that command, I get p-values that lead me to reject the null.

            One question following on from this:

            Does the xt command also need to be employed when testing:
            VIF
            RESET

            Many thanks,
            Carys Wright



            • #7
              So you have other predictors, which is understandable, but they were not mentioned in #1.

              Please see https://www.statalist.org/forums/help#stata for how to present graphs -- as .png attachments.

              I copy here your graph for ARD

              [Image: ScatterARD.png, scatter plot for ARD]


              So, you have one group of 9 high outcome values. Some simple conclusions follow immediately.

              If these are number of deaths, then you surely need to scale by some appropriate measure of population size.

              If these are death rates in some sense, plain or vanilla regression hasn't a hope in heck of capturing this structure. You would be better off working on a log scale.
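A sketch of the scaling and log transformation suggested above. It runs on simulated numbers: the variable names population and deaths, and all their values, are hypothetical and generated only so the code is self-contained. With real data, the population denominators would have to come from an external source of regional population figures.

```stata
* Synthetic illustration only: 100 simulated region-years, not real data
clear
set seed 2019
set obs 100
generate population = runiformint(100000, 2000000)  // hypothetical population counts
generate deaths     = rpoisson(population * 0.0002)  // hypothetical death counts

* Scale counts by population: deaths per 100,000 residents
generate ard_rate = 100000 * deaths / population

* Work on the log scale
generate ln_ard_rate = ln(ard_rate)
summarize ard_rate ln_ard_rate
```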

              Similar comments apply to the graph with BADSAH, except that I guess that the scale is bounded, in which case plain or vanilla regression isn't a good idea either.

              Naturally, coping with panel structure is important, but it is quite a separate issue.

              For assessing heteroscedasticity with several predictors I would want to see a plot of residual versus fitted far more than any test, Breusch[sic]-Pagan or otherwise.
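A sketch of residual-versus-fitted plots, again on the nlswork example data with illustrative regressors. -rvfplot- works after -regress-, so after -xtreg, fe- the plot is built by hand from -predict-. Here the fitted values include the estimated fixed effects (-xbu-) and the residuals are the idiosyncratic errors (-e-); the variable names fitted and resid are arbitrary, and other combinations of -predict- options are possible.

```stata
webuse nlswork, clear

* After pooled OLS, rvfplot draws residuals against fitted values directly
regress ln_wage age tenure
rvfplot, yline(0)

* After a fixed-effects fit, build the plot by hand
xtreg ln_wage age tenure, fe
predict fitted, xbu    // linear prediction including the fixed effect u_i
predict resid, e       // idiosyncratic residual e_it
scatter resid fitted, yline(0) msize(tiny)
```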



              • #8
                Carys;
                great! So you're right in invoking -robust-.
                Unfortunately, the -xt- suite supports neither -estat vif- nor -estat ovtest- (Stata's RESET test).
                However, the most important thing to check is not quasi-extreme multicollinearity (which you may suspect if you see implausibly wide 95% CIs for your predictor(s)), but misspecification.
                With panel data you can do something along the lines of -linktest-, as you can see from the following toy example:
                Code:
                . use "http://www.stata-press.com/data/r15/nlswork.dta"
                (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
                
                . xtreg ln_wage age, fe rob
                
                Fixed-effects (within) regression               Number of obs     =     28,510
                Group variable: idcode                          Number of groups  =      4,710
                
                R-sq:                                           Obs per group:
                     within  = 0.1026                                         min =          1
                     between = 0.0877                                         avg =        6.1
                     overall = 0.0774                                         max =         15
                
                                                                F(1,4709)         =     884.05
                corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000
                
                                             (Std. Err. adjusted for 4,710 clusters in idcode)
                ------------------------------------------------------------------------------
                             |               Robust
                     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                         age |   .0181349   .0006099    29.73   0.000     .0169392    .0193306
                       _cons |   1.148214   .0177153    64.81   0.000     1.113483    1.182944
                -------------+----------------------------------------------------------------
                     sigma_u |  .40635023
                     sigma_e |  .30349389
                         rho |  .64192015   (fraction of variance due to u_i)
                ------------------------------------------------------------------------------
                
                . predict u, xb
                (24 missing values generated)
                
                . g sq_u=u^2
                (24 missing values generated)
                
                . xtreg ln_wage u sq_u , fe rob
                
                Fixed-effects (within) regression               Number of obs     =     28,510
                Group variable: idcode                          Number of groups  =      4,710
                
                R-sq:                                           Obs per group:
                     within  = 0.1087                                         min =          1
                     between = 0.1006                                         avg =        6.1
                     overall = 0.0865                                         max =         15
                
                                                                F(2,4709)         =     507.42
                corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000
                
                                             (Std. Err. adjusted for 4,710 clusters in idcode)
                ------------------------------------------------------------------------------
                             |               Robust
                     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                           u |   7.143466    .738485     9.67   0.000      5.69569    8.591242
                        sq_u |  -1.816243   .2188485    -8.30   0.000    -2.245289   -1.387198
                       _cons |  -5.167788   .6209677    -8.32   0.000    -6.385175   -3.950401
                -------------+----------------------------------------------------------------
                     sigma_u |   .4039153
                     sigma_e |  .30245467
                         rho |  .64073314   (fraction of variance due to u_i)
                ------------------------------------------------------------------------------
                
                . test sq_u
                
                 ( 1)  sq_u = 0
                
                       F(  1,  4709) =   68.87
                            Prob > F =    0.0000
                
                .
                *The outcome of -test- indicates that the model is misspecified (i.e., more predictors and/or interactions may be needed)*
                Last edited by Carlo Lazzaro; 24 Apr 2019, 05:24.
                Kind regards,
                Carlo
                (Stata 18.0 SE)
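Since -estat ovtest- is not available after -xtreg-, a RESET-style check can be improvised in the same spirit as the link test above: add powers of the fitted values to the original model and test them jointly. This is a hand-rolled adaptation, not an official Stata command; the second to fourth powers mirror what -estat ovtest- uses after -regress-, and the regressors are illustrative.

```stata
webuse nlswork, clear

* Original fixed-effects model
xtreg ln_wage age tenure, fe vce(cluster idcode)
predict double yhat, xb            // fitted values excluding u_i

* Powers of the fitted values, as in the usual RESET construction
generate double yhat2 = yhat^2
generate double yhat3 = yhat^3
generate double yhat4 = yhat^4

* Refit with the added powers and test them jointly
xtreg ln_wage age tenure yhat2 yhat3 yhat4, fe vce(cluster idcode)
test yhat2 yhat3 yhat4
```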



                • #9
                  Nick -

                  I am not sure what you mean by plain/vanilla regression.

                  I am using a panel data set, and ARD is alcohol-related deaths per region. As I am not comparing regions but only looking at changes over the time period, I didn't think it was necessary to scale the deaths by the population of each region.

                  I think Carlo has helped me with the test for heteroskedasticity for panel data (thanks Carlo again).

                  However, what command would I need to use to plot residuals versus fitted values?

                  Many thanks,

                  Carys



                  • #10
                    Carlo -

                    I have just performed the -linktest- as you showed above.

                    This is my result:

                    test sq_u

                    ( 1) sq_u = 0

                    F( 1, 9) = 0.38
                    Prob > F = 0.5518

                    Does this outcome suggest that the model is well specified (i.e., I don't need to add more predictors/interactions)? Also, is there a formal name for this test so that I can cite it in my thesis?

                    Many thanks!

                    Carys Wright



                    • #11
                      Carys:
                      1) yes, as per the -test- outcome, your model does not show evidence of misspecification (i.e., there is no indication that other predictors and/or interactions are needed);
                      2) Pregibon and Tukey (not necessarily in this order) developed this test. For the full reference, please see the -linktest- entry in the Stata .pdf manual.
                      Kind regards,
                      Carlo
                      (Stata 18.0 SE)
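For comparison, after a single-equation fit such as pooled OLS with -regress-, the official -linktest- automates the steps done by hand in #8: it refits the outcome on the linear prediction (_hat) and its square (_hatsq), and a significant _hatsq points to misspecification. A sketch on the nlswork example data:

```stata
webuse nlswork, clear

* Pooled OLS, then the official link test on that fit
regress ln_wage age
linktest
```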



                      • #12
                        Carlo -

                        Many thanks. Sorry for all the questions.

                        Just to clarify: would my results from RESET and VIF be wrong to use, then, as they do not account for the panel structure of the data set?

                        Carys

                        Comment


                        • #13
                          Short answer: yes.
                          Kind regards,
                          Carlo
                          (Stata 18.0 SE)



                          • #14
                            Great thanks.

                            I also had another thread regarding reverse causality - I was wondering if you knew anything about this as you are such a speedy responder and I am extremely grateful!

                            https://www.statalist.org/forums/for...ntrol-variable

                            I am aware I haven't reported the code in the correct way, but when I use -dataex- after installing the command, I get the error message 'input statement exceeds linesize limit. Try specifying fewer variables'.

                            Carys



                            • #15
                              If different regions have different populations, scaling is surely essential (your health measures appear to be averages, although how health measures can vary so much between regions is a puzzle).

                              Try specifying fewer variables
                              Why not do that?
                              Code:
                              dataex ARD  badsah consumptiongin TEH GDHI dbbinge region
                              is all that appears relevant to your thread so far. (But there is a time variable too? What did you tell xtset?)
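A sketch of the two suggestions above: typing -xtset- with no arguments reports the panel and time settings currently declared, and restricting -dataex- to the relevant variables and a modest number of observations keeps the output short. The time variable name year is hypothetical; substitute whatever was given to -xtset-.

```stata
* Report the panel identifier and time variable currently declared
xtset

* List only the relevant variables, limited to 20 observations
dataex region year ARD badsah consumptiongin TEH GDHI dbbinge, count(20)
```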

                              Comment

                              Working...
                              X