Heteroscedasticity vs Homoscedasticity with different measures of dependent variable

Carys Wright

Join Date: Apr 2019

Posts: 17
#1

24 Apr 2019, 03:29

Hi All,

I am carrying out research for my bachelor thesis looking at the effect of gin consumption on health outcomes, using a regional-level panel data set. I am using two measures of health for robustness: ARD (alcohol related deaths) and BADSAH (bad self-assessed health).

Carrying out the Breusch-Pagan test for heteroscedasticity, I get:

1) For ARD - p-value of 0.000 (reject null - presence of heteroscedasticity)
2) For BADSAH - p-value of 0.2510 (failure to reject null, homoscedasticity)

Currently I am using the options -fe cluster(region)- in both models.

Does this result mean that, for my second measure of health (BADSAH), I do not need robust standard errors in my regression? I am slightly confused about the implication of this p-value.

Many thanks,
Carys Wright
Nick Cox

Join Date: Mar 2014

Posts: 35720
#2

24 Apr 2019, 03:51

What commands are you using? You give some options but not the commands. Can you show scatter plots of

ARD versus gin consumption

BADSAH versus gin consumption

or your real regressions if different?
Comment
Carys Wright

Join Date: Apr 2019

Posts: 17
#3

24 Apr 2019, 04:26

Hi Nick,

I am using commands:

reg badsah consumptiongin TEH GDHI dbbinge
estat hettest

reg ARD consumptiongin TEH GDHI dbbinge
estat hettest

TEH - control for Total Expenditure of Healthcare
GDHI - control for income
dbbinge - control for binge drinking behaviour

I have uploaded the scatter graphs as attachments.

What do you mean by my real regressions? Sorry if I haven't been clear enough!

Carys Wright

Attached Files

ScatterBADSAH.gph (6.2 KB, 1 view)

ScatterARD.gph (6.0 KB, 1 view)
Comment

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17712
#4

24 Apr 2019, 04:29

Carys:
if you're dealing with -xtreg,fe- and want to check for heteroskedasticity, you can use the community-contributed command -xttest3-, as you can see in the following toy-example:

Code:

. use "http://www.stata-press.com/data/r15/nlswork.dta"
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtreg ln_wage age, fe rob

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.0877                                         avg =        6.1
     overall = 0.0774                                         max =         15

                                                F(1,4709)         =     884.05
corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0181349   .0006099    29.73   0.000     .0169392    .0193306
       _cons |   1.148214   .0177153    64.81   0.000     1.113483    1.182944
-------------+----------------------------------------------------------------
     sigma_u |  .40635023
     sigma_e |  .30349389
         rho |  .64192015   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xttest3

Modified Wald test for groupwise heteroskedasticity
in fixed effect regression model

H0: sigma(i)^2 = sigma^2 for all i

chi2 (4710)  =  1.2e+37
Prob>chi2 =      0.0000


.
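For anyone reproducing this on their own panel: -xttest3- is not part of official Stata, so it has to be installed once, and it is run directly after -xtreg, fe-. A minimal sketch for the model in #3, assuming the data are already -xtset- with region as the numeric panel identifier:

Code:

* xttest3 is community-contributed (available from SSC); install it once
ssc install xttest3

* fixed-effects regression with the variable names given in #3
xtreg ARD consumptiongin TEH GDHI dbbinge, fe

* modified Wald test for groupwise heteroskedasticity
xttest3

The same two steps can be repeated with badsah as the outcome.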

Kind regards,
Carlo
(Stata 19.0)

Comment

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#5

24 Apr 2019, 04:30

Carys:
why switch to -regress- to test for heteroskedasticity?

Kind regards,
Carlo
(Stata 19.0)
Comment
Carys Wright

Join Date: Apr 2019

Posts: 17
#6

24 Apr 2019, 05:07

Hi Carlo,

I am not completely sure - I wasn't aware there were different commands for the tests when using panel data - thank you so much!

After using that command, I get p-values that lead me to reject the null.

One question following on from this:

Do -xt- commands also need to be used for the following tests?
VIF
RESET

Many thanks,
Carys Wright
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35720
#7

24 Apr 2019, 05:16

So you have other predictors, which is understandable, but was not mentioned in #1.

Please see https://www.statalist.org/forums/help#stata for how to present graphs -- as .png attachments.

I copy here your graph for ARD

So, you have one group of 9 high outcome values. Some simple conclusions follow immediately.

If these are number of deaths, then you surely need to scale by some appropriate measure of population size.

If these are death rates in some sense, plain or vanilla regression hasn't a hope in heck of capturing this structure. You should be better off working on log scale.

Similar comments apply to the graph with BADSAH, except that I guess that the scale is bounded, in which case plain or vanilla regression isn't a good idea either.
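A minimal sketch of the scaling and log-scale suggestions for ARD, assuming ARD is a count of deaths and that a regional population variable exists. The name population is hypothetical and does not appear anywhere in the thread, and the multiplier of 100,000 is just one conventional choice:

Code:

* deaths per 100,000 population (population is a hypothetical variable name)
generate double ARD_rate = 100000 * ARD / population

* ln() returns missing for a rate of zero, so check for zeros first
count if ARD_rate == 0
generate double ln_ARD_rate = ln(ARD_rate)

Either new variable could then replace ARD as the outcome in the regressions in #3.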

Naturally coping with panel structure is important but quite a separate issue.

For assessing heteroscedasticity with several predictors I would want to see a plot of residual versus fitted far more than any test, Breusch-Pagan or otherwise.
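-rvfplot- is documented as a postestimation command for -regress-, so after -xtreg, fe- such a plot would have to be built by hand from -predict-. A minimal sketch, assuming the model and variable names from #3 and data that are already -xtset-; fitted and resid are illustrative names only:

Code:

xtreg ARD consumptiongin TEH GDHI dbbinge, fe vce(cluster region)

* xb + u_i: fitted values including the estimated region effects
predict double fitted, xbu

* e_it: the idiosyncratic residual
predict double resid, e

* residual versus fitted, with a reference line at zero
scatter resid fitted, yline(0)

A fan or funnel shape in this plot would point to heteroscedasticity; a separate cluster of points would point to the group structure noted above.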
Comment

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17712
#8

24 Apr 2019, 05:22

Carys:
great! So you're right in invoking -robust-.
Unfortunately, the -xt- suite does not support -estat vif- or -estat reset-.
However, the most important thing to check is not quasi-extreme multicollinearity (which you can suspect if you see weird 95% CIs for your predictor(s)), but misspecification.
With panel data you can do something along the lines of -linktest-, as you can see from the following toy example:

Code:

. use "http://www.stata-press.com/data/r15/nlswork.dta"
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtreg ln_wage age, fe rob

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.0877                                         avg =        6.1
     overall = 0.0774                                         max =         15

                                                F(1,4709)         =     884.05
corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0181349   .0006099    29.73   0.000     .0169392    .0193306
       _cons |   1.148214   .0177153    64.81   0.000     1.113483    1.182944
-------------+----------------------------------------------------------------
     sigma_u |  .40635023
     sigma_e |  .30349389
         rho |  .64192015   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict u, xb
(24 missing values generated)

. g sq_u=u^2
(24 missing values generated)

. xtreg ln_wage u sq_u , fe rob

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1006                                         avg =        6.1
     overall = 0.0865                                         max =         15

                                                F(2,4709)         =     507.42
corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           u |   7.143466    .738485     9.67   0.000      5.69569    8.591242
        sq_u |  -1.816243   .2188485    -8.30   0.000    -2.245289   -1.387198
       _cons |  -5.167788   .6209677    -8.32   0.000    -6.385175   -3.950401
-------------+----------------------------------------------------------------
     sigma_u |   .4039153
     sigma_e |  .30245467
         rho |  .64073314   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test sq_u

 ( 1)  sq_u = 0

       F(  1,  4709) =   68.87
            Prob > F =    0.0000

.
*The outcome of -test- tells us that the model is misspecified (i.e., more predictors and/or interactions are needed)*

Last edited by Carlo Lazzaro; 24 Apr 2019, 05:24.

Kind regards,
Carlo
(Stata 19.0)

Comment

Carys Wright

Join Date: Apr 2019

Posts: 17
#9

24 Apr 2019, 08:35

Nick -

I am not sure what you mean by plain/vanilla regression?

I am using a panel data set - and it is alcohol-related deaths per region (as I am not doing a regional comparison but just comparing it over the time period, I didn't think it was necessary to scale them by the population of the region).

I think Carlo has helped me with the test for heteroskedasticity for panel data (thanks Carlo again).

However - what command would I need to use to do a plot of residual versus fitted?

Many thanks,

Carys
Comment
Carys Wright

Join Date: Apr 2019

Posts: 17
#10

24 Apr 2019, 08:40

Carlo -

I have just performed the -linktest- as you showed above.

This is my result:

test sq_u

( 1) sq_u = 0

F( 1, 9) = 0.38
Prob > F = 0.5518

Does this outcome suggest that the model is well specified (i.e., I don't need to add more predictors or interactions)? Also, is there a formal name for this test so I can cite what I am performing in my thesis?

Many thanks!

Carys Wright
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#11

24 Apr 2019, 08:50

Carys:
1) yes, as per the -test- outcome, your model does not show evidence of misspecification (i.e., there's no need for other predictors and/or interactions);
2) Pregibon and Tukey (not necessarily in this order) developed this test. For full reference, please see -linktest- entry, Stata .pdf manual.

Kind regards,
Carlo
(Stata 19.0)
Comment
Carys Wright

Join Date: Apr 2019

Posts: 17
#12

24 Apr 2019, 08:58

Carlo -

Many thanks. Sorry for all the questions.

Just to clarify - would it be wrong to use my results from RESET and VIF, since those tests do not take the panel structure of the data into account?

Carys
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#13

24 Apr 2019, 09:01

Short answer: yes.

Kind regards,
Carlo
(Stata 19.0)
Comment
Carys Wright

Join Date: Apr 2019

Posts: 17
#14

24 Apr 2019, 09:07

Great, thanks.

I also had another thread regarding reverse causality - I was wondering if you knew anything about this as you are such a speedy responder and I am extremely grateful!

https://www.statalist.org/forums/for...ntrol-variable

I am aware I haven't reported the code in the correct way, but when I use dataex after installing the command, I get the error 'input statement exceeds linesize limit. Try specifying fewer variables'.

Carys
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35720
#15

24 Apr 2019, 10:01

If different regions have different populations, scaling is surely essential (your health measures appear to be averages, although how health measures can vary so much between regions is a puzzle).

"Try specifying fewer variables"

Why not do that?

Code:

dataex ARD badsah consumptiongin TEH GDHI dbbinge region

is all that appears relevant to your thread so far. (But there is a time variable too? What did you tell xtset?)
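For reference, a sketch of how the panel structure might be declared, assuming region is a numeric identifier and the time variable is called year (a hypothetical name; the thread does not give it):

Code:

* if region is stored as a string, encode it first, e.g.
* encode region, generate(region_id)
xtset region year

* summarize the panel pattern (number of regions, periods, gaps)
xtdescribe

Adding the time variable to the dataex call above would then show the full panel layout.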
Comment
