ANOVA without homoscedasticity

Ignacio Herrando

Join Date: Sep 2023

Posts: 10
#1

04 Sep 2023, 16:01

I am using Stata for my analysis. I am analysing a numerical variable across the levels of a categorical one (with 3 groups). After confirming that the groups are normally distributed, I ran a one-way ANOVA. But Bartlett's test showed me that the variances are not equal. What should I do next? Do a Kruskal-Wallis test? Should I perform a multiple comparison test (Bonferroni) to confirm the association obtained from the one-way ANOVA? Should I perform additional tests to the ANOVA one?
This is what my data look like:
. oneway CH1_taumean TumorLocation if TissueType==1, bonferroni tabulate

            |      Summary of CH1_tau mean
   Location |        Mean   Std. dev.       Freq.
------------+------------------------------------
      Right |   3.0568408   .35276463          18
       Left |   3.3229682   .38533265          30
     Rectum |   3.1543516   .67311474          15
------------+------------------------------------
      Total |    3.206785   .46862096          63

                        Analysis of variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      .850894138      2   .425447069      2.00     0.1443
 Within groups      12.7646535     60   .212744225
------------------------------------------------------------------------
    Total           13.6155476     62   .219605607

Bartlett's equal-variances test: chi2(2) = 8.7458 Prob>chi2 = 0.013

      Comparison of CH1_tau mean by Location
                   (Bonferroni)
Row Mean-|
Col Mean |      Right       Left
---------+----------------------
    Left |    .266127
         |      0.173
         |
  Rectum |    .097511   -.168617
         |      1.000      0.757
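
For reference, the Kruskal-Wallis test asked about above, and a variance check that is less sensitive to non-normality than Bartlett's test, could be run along the following lines. This is only a sketch: it assumes the full dataset (including TissueType, which is used in the -oneway- call above) is in memory, and it does not by itself settle which test is appropriate.

```stata
* Assumes the full dataset, including TissueType, is in memory

* Kruskal-Wallis rank test across the three locations
kwallis CH1_taumean if TissueType==1, by(TumorLocation)

* Levene-type robust tests of equal variances across locations
robvar CH1_taumean if TissueType==1, by(TumorLocation)
```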
Nick Cox

Join Date: Mar 2014

Posts: 35727
#2

04 Sep 2023, 16:19

At https://stackoverflow.com/questions/...-i-do-if-a-sam I advised you to post on Statalist, so fine.

As said there, we need to see your data to advise.

Please show the results of

Code:

dataex CH1_taumean TumorLocation if TissueType==1

https://www.statalist.org/forums/help#stata explains in more detail.

Ignacio Herrando

Join Date: Sep 2023
Posts: 10
#3

04 Sep 2023, 18:09

Here it goes.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double CH1_taumean long TumorLocation
2.89365437369505 2
 3.2339901485086 1
      3.89115556 2
2.48569094195189 1
2.79875860178849 1
 3.3439395171991 2
3.29836102465565 2
2.99892809206288 1
2.89062883924974 2
2.86831556339548 3
2.87663643890724 2
2.56982853166631 3
2.80865167472012 3
2.93232093905943 2
2.92521660823454 1
3.40261016825649 3
3.15550903733131 2
 2.5419254448238 2
3.13033745542194 3
3.16950349811181 2
2.80552579759904 2
2.78470944280644 3
2.53334226918893 3
2.79441866340829 3
2.98540593088585 3
3.43347589807309 3
3.26779184761215 3
2.44574954209028 1
3.33459259180109 1
2.96856542750294 2
3.10915982394307 3
3.23643969635452 3
3.52395138868319 2
2.94750983676282 1
3.14811054665368 1
2.97941259729288 2
3.01257957940603 1
2.84136418181104 3
3.40103695770559 3
3.43460735167242 2
3.09977803254538 3
2.98952762691712 2
3.23932939699073 1
3.96758578776324 2
1.86027301378888 3
3.64997493914638 2
3.67540089857714 2
3.05440283405953 3
3.17313630560935 2
3.14910187132317 1
3.49722225428298 2
3.04592812898681 2
 3.5768034135925 3
3.85983242258081 2
3.46065897894648 1
3.59349065621267 3
4.72811626329215 3
2.83871003455609 1
3.20353077701783 1
3.97359499130923 3
3.34227524711743 2
2.54095749114357 1
3.65233152439339 1
3.25263012626406 3
3.93880499034933 3
3.29597181798237 2
3.74574741912157 2
3.83588168230297 2
3.78643001805071 2
3.60738738576705 1
3.76919100753992 2
3.60319510012945 3
3.12660684169344 3
3.34924456287799 2
end
label values TumorLocation TumorLocation
label def TumorLocation 1 "Right", modify
label def TumorLocation 2 "Left", modify
label def TumorLocation 3 "Rectum", modify


Joseph Coveney

Join Date: Apr 2014
Posts: 4422
#4

04 Sep 2023, 20:44

Originally posted by Ignacio Herrando

. . . the variances are not equal. What should I do next?

Model them.

Code:

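* Shorten the variable names
rename (CH1_taumean TumorLocation) (tau loc)

assert inlist(loc, 1, 2, 3)
label define Locations 1 Right 2 Left 3 Rectum
label values loc Locations

* Linear model with a separate residual variance for each location
mixed tau i.loc, residuals(independent, by(loc)) reml dfmethod(anova) nolrtest nolog

* Contrasts of each location with the reference level (Right), Bonferroni-adjusted
contrast r.loc, mcompare(bonferroni) df(`e(ddf_m)') noeffects

exit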

Variable names shortened for legibility.
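
One way to check whether separate variances are supported by the data is a likelihood-ratio test against the equal-variance model. The following is only a sketch, assuming the renamed variables tau and loc from the code above are in memory. The names homosk and heterosk are arbitrary labels introduced here, and maximum likelihood is used instead of REML purely so that the two log likelihoods can be compared directly.

```stata
* Assumes tau and loc (renamed as in the code above) are in memory

* Equal residual variance in all locations, fitted by ML
mixed tau i.loc, mle nolog
estimates store homosk

* Separate residual variance for each location, fitted by ML
mixed tau i.loc, residuals(independent, by(loc)) mle nolog
estimates store heterosk

* Likelihood-ratio test of equal variances (3 variances versus 1, so 2 df)
lrtest homosk heterosk
```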


Joseph Coveney

Join Date: Apr 2014

Posts: 4422
#5

04 Sep 2023, 22:06

Originally posted by Ignacio Herrando

Should I perform additional tests to the ANOVA one?

Yeah, I forgot to mention it above: you should plot the data in order to get some perspective.

Code:

dotplot tau, over(loc) mean center bar ///
    scheme(s2color) ///
    mcolor(black none none black) msize(vsmall) ///
    ylabel( , angle(horizontal) nogrid) ytitle(Tau) ///
    xtitle(Location)

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17713
#6

04 Sep 2023, 23:33

Ignacio:
why not go with -regress- instead?

Code:

. reg CH1_taumean i.TumorLocation

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =      2.07
       Model |  .813865745         2  .406932873   Prob > F        =    0.1339
    Residual |  13.9639576        71   .19667546   R-squared       =    0.0551
-------------+----------------------------------   Adj R-squared   =    0.0285
       Total |  14.7778234        73  .202435937   Root MSE        =    .44348

-------------------------------------------------------------------------------
  CH1_taumean | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
TumorLocation |
        Left  |   .2661275   .1322205     2.01   0.048     .0024873    .5297677
      Rectum  |   .1344894   .1359811     0.99   0.326    -.1366492     .405628
              |
        _cons |   3.056841   .1045295    29.24   0.000     2.848415    3.265267
-------------------------------------------------------------------------------

. estat hettest

Breusch–Pagan/Cook–Weisberg test for heteroskedasticity 
Assumption: Normal error terms
Variable: Fitted values of CH1_taumean

H0: Constant variance

    chi2(1) =   0.00
Prob > chi2 = 0.9798

.

As per your data excerpt (which seems to lack the variable used in the -if- clause), the outcome of -estat hettest- does not show any evidence of heteroskedasticity.
If that were not the case with your original sample, just invoke -robust-.
I remind myself first that, most of the time (that is, except for repeated-measures ANOVA), there's nothing that -anova- can do that -regress- cannot do better.
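
A minimal sketch of how the heteroskedasticity check and the -robust- suggestion above might be run, assuming the -dataex- example posted earlier in the thread is in memory. Note that -estat hettest- tests against the fitted values by default, as the output above shows; the rhs option used here instead tests against the location indicators, which may be closer to the question of whether the variance differs by group.

```stata
* Assumes the -dataex- example posted earlier in the thread is in memory
regress CH1_taumean i.TumorLocation

* Test against the right-hand-side variables (the location indicators)
* rather than against the fitted values, which is the default
estat hettest, rhs

* Levene-type robust tests of equal variances across locations
robvar CH1_taumean, by(TumorLocation)

* Same point estimates, heteroskedasticity-robust standard errors
regress CH1_taumean i.TumorLocation, vce(robust)
```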

Kind regards,
Carlo
(Stata 19.0)


Nick Cox

Join Date: Mar 2014

Posts: 35727
#7

05 Sep 2023, 02:22

My tuppenceworth is another graph. Here I use stripplot from SSC. The longer horizontal lines show means. The Tufte-style box plots just show medians, quartiles and extremes. There is no call for further detail in the box plots as the data are all plotted anyway.

The graph is clearly similar in general drift to the dotplot from Joseph Coveney in #5.

Sometimes, with a hint of heteroscedasticity, the obvious answer is to consider working with logarithms, but not here.

Despite the indications you'll get from some model fits, I would distrust the higher mean for Left without an independent oncological rationale.

Code:

stripplot CH, over(Tumor) tufte cumul cumprob vertical refline center boffset(-0.4) height(0.6) xla(, tlc(none))
Ignacio Herrando

Join Date: Sep 2023

Posts: 10
#8

05 Sep 2023, 07:55

Thank you all...
I will try your suggestions.
This is just a piece of the dataset and analysis.
Erik Ruzek

Join Date: Oct 2017

Posts: 433
#9

24 Apr 2024, 09:28

I ran across this thread and appreciated all the answers. Another option to consider, inspired by a Cross Validated post, is to bootstrap, resampling within the locations:

Code:

bootstrap, reps(1000) strata(loc): reg tau i.loc

This ends up producing standard errors that are quite similar to those that come from the approach proposed by Joseph Coveney that models the unequal variances directly.
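
A hedged variant of the same command, assuming tau and loc are the renamed variables from Joseph Coveney's code. The seed() option makes the resampling reproducible (the value 12345 is arbitrary and introduced here only for illustration), and -estat bootstrap- then reports percentile confidence intervals alongside the normal-based ones.

```stata
* Assumes tau and loc (renamed as in Joseph Coveney's code) are in memory

* Resample within each location; seed() fixes the random-number seed
bootstrap, reps(1000) strata(loc) seed(12345): regress tau i.loc

* Percentile-based bootstrap confidence intervals
estat bootstrap, percentile
```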
