  • ANOVA without homoscedasticity

    I am using Stata for my analysis. I am analysing a numerical variable across a categorical one with 3 groups. After confirming that the values are normally distributed, I ran a one-way ANOVA, but Bartlett's test indicates that the variances are not equal. What should I do next? Run a Kruskal-Wallis test? Should I perform a multiple comparison test (Bonferroni) to confirm the association obtained from the one-way ANOVA? Should I run any tests in addition to the ANOVA?
    This is what my data look like:
    . oneway CH1_taumean TumorLocation if TissueType==1, bonferroni tabulate

                |     Summary of CH1_tau mean
       Location |        Mean   Std. dev.       Freq.
    ------------+------------------------------------
          Right |   3.0568408   .35276463          18
           Left |   3.3229682   .38533265          30
         Rectum |   3.1543516   .67311474          15
    ------------+------------------------------------
          Total |    3.206785   .46862096          63

                            Analysis of variance
        Source              SS         df      MS            F     Prob > F
    ------------------------------------------------------------------------
    Between groups      .850894138      2   .425447069      2.00     0.1443
     Within groups      12.7646535     60   .212744225
    ------------------------------------------------------------------------
        Total           13.6155476     62   .219605607

    Bartlett's equal-variances test: chi2(2) =   8.7458    Prob>chi2 = 0.013

             Comparison of CH1_tau mean by Location
                          (Bonferroni)
    Row Mean-|
    Col Mean |      Right       Left
    ---------+----------------------
        Left |    .266127
             |      0.173
             |
      Rectum |    .097511   -.168617
             |      1.000      0.757

  • #2
    At https://stackoverflow.com/questions/...-i-do-if-a-sam I advised you to post on Statalist, so fine.

    As said there, we need to see your data to advise.

    Please show the results of

    Code:
    dataex CH1_taumean TumorLocation if TissueType==1
    https://www.statalist.org/forums/help#stata explains in more detail.



    • #3
      Here it goes.
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input double CH1_taumean long TumorLocation
      2.89365437369505 2
       3.2339901485086 1
            3.89115556 2
      2.48569094195189 1
      2.79875860178849 1
       3.3439395171991 2
      3.29836102465565 2
      2.99892809206288 1
      2.89062883924974 2
      2.86831556339548 3
      2.87663643890724 2
      2.56982853166631 3
      2.80865167472012 3
      2.93232093905943 2
      2.92521660823454 1
      3.40261016825649 3
      3.15550903733131 2
       2.5419254448238 2
      3.13033745542194 3
      3.16950349811181 2
      2.80552579759904 2
      2.78470944280644 3
      2.53334226918893 3
      2.79441866340829 3
      2.98540593088585 3
      3.43347589807309 3
      3.26779184761215 3
      2.44574954209028 1
      3.33459259180109 1
      2.96856542750294 2
      3.10915982394307 3
      3.23643969635452 3
      3.52395138868319 2
      2.94750983676282 1
      3.14811054665368 1
      2.97941259729288 2
      3.01257957940603 1
      2.84136418181104 3
      3.40103695770559 3
      3.43460735167242 2
      3.09977803254538 3
      2.98952762691712 2
      3.23932939699073 1
      3.96758578776324 2
      1.86027301378888 3
      3.64997493914638 2
      3.67540089857714 2
      3.05440283405953 3
      3.17313630560935 2
      3.14910187132317 1
      3.49722225428298 2
      3.04592812898681 2
       3.5768034135925 3
      3.85983242258081 2
      3.46065897894648 1
      3.59349065621267 3
      4.72811626329215 3
      2.83871003455609 1
      3.20353077701783 1
      3.97359499130923 3
      3.34227524711743 2
      2.54095749114357 1
      3.65233152439339 1
      3.25263012626406 3
      3.93880499034933 3
      3.29597181798237 2
      3.74574741912157 2
      3.83588168230297 2
      3.78643001805071 2
      3.60738738576705 1
      3.76919100753992 2
      3.60319510012945 3
      3.12660684169344 3
      3.34924456287799 2
      end
      label values TumorLocation TumorLocation
      label def TumorLocation 1 "Right", modify
      label def TumorLocation 2 "Left", modify
      label def TumorLocation 3 "Rectum", modify



      • #4
        Originally posted by Ignacio Herrando
        . . . the variances are not equal. What should I do next?
        Model them.
        Code:
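        * Shorten the variable names
        rename (CH1_taumean TumorLocation ) (tau loc)
        
        * Check the coding, then attach readable value labels
        assert inlist(loc, 1, 2, 3)
        label define Locations 1 Right 2 Left 3 Rectum
        label values loc Locations
        
        * Linear model with a separate residual variance for each location
        mixed tau i.loc, residuals(independent, by(loc)) reml dfmethod(anova) nolrtest nolog
        
        * Contrast each location with the reference level, Bonferroni-adjusted
        contrast r.loc, mcompare(bonferroni) df(`e(ddf_m)') noeffects
        
        exit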
        Variable names shortened for legibility.
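Before modelling the variances, it can help to see how far apart the group spreads actually are. The following is only a minimal sketch, not part of the original reply: it uses simulated stand-in data (group sizes, means, and SDs rounded from the -oneway- table in #1, with an arbitrary seed) and the shortened names tau and loc. It tabulates the group SDs and runs -robvar-, whose Levene-type tests are less sensitive to non-normality than Bartlett's test.

```stata
* Simulated stand-in data (NOT the poster's data): group sizes, means and SDs
* are rounded from the -oneway- table in #1; the seed is arbitrary
clear
set seed 12345
set obs 63
generate byte loc = cond(_n <= 18, 1, cond(_n <= 48, 2, 3))
generate double tau = rnormal(3.06, 0.35) if loc == 1
replace tau = rnormal(3.32, 0.39) if loc == 2
replace tau = rnormal(3.15, 0.67) if loc == 3
label define Locations 1 Right 2 Left 3 Rectum
label values loc Locations

* Group sizes, means and standard deviations side by side
tabstat tau, by(loc) statistics(n mean sd)

* Levene's test (W0) and the Brown-Forsythe variants (W50, W10)
robvar tau, by(loc)
```

With the real data in memory, only the last two commands would be needed.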



        • #5
          Originally posted by Ignacio Herrando
          Should I perform additional tests to the ANOVA one?
          Yeah, I forgot to mention it above: you should plot the data in order to get some perspective.
          Code:
          dotplot tau, over(loc) mean center bar ///
              scheme(s2color) ///
              mcolor(black none none black) msize(vsmall) ///
              ylabel( , angle(horizontal) nogrid) ytitle(Tau) ///
              xtitle(Location)



          • #6
            Ignacio:
            why not go with -regress- instead?
            Code:
            . reg CH1_taumean i.TumorLocation
            
                  Source |       SS           df       MS      Number of obs   =        74
            -------------+----------------------------------   F(2, 71)        =      2.07
                   Model |  .813865745         2  .406932873   Prob > F        =    0.1339
                Residual |  13.9639576        71   .19667546   R-squared       =    0.0551
            -------------+----------------------------------   Adj R-squared   =    0.0285
                   Total |  14.7778234        73  .202435937   Root MSE        =    .44348
            
            -------------------------------------------------------------------------------
              CH1_taumean | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            --------------+----------------------------------------------------------------
            TumorLocation |
                    Left  |   .2661275   .1322205     2.01   0.048     .0024873    .5297677
                  Rectum  |   .1344894   .1359811     0.99   0.326    -.1366492     .405628
                          |
                    _cons |   3.056841   .1045295    29.24   0.000     2.848415    3.265267
            -------------------------------------------------------------------------------
            
            . estat hettest
            
            Breusch–Pagan/Cook–Weisberg test for heteroskedasticity 
            Assumption: Normal error terms
            Variable: Fitted values of CH1_taumean
            
            H0: Constant variance
            
                chi2(1) =   0.00
            Prob > chi2 = 0.9798
            
            .
            As per your data excerpt (which seems to lack the variable used in the -if- clause), the outcome of -estat hettest- does not show any evidence of heteroskedasticity.
            If that were not the case with your original sample, just invoke -robust-.
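For illustration, invoking -robust- could look like the sketch below. It is an assumption-laden example rather than output from the original sample: the data are simulated stand-ins (group sizes, means, and SDs rounded from the -oneway- table in #1, arbitrary seed), and the original variable names are kept.

```stata
* Simulated stand-in data (NOT the poster's data): group sizes, means and SDs
* are rounded from the -oneway- table in #1; the seed is arbitrary
clear
set seed 12345
set obs 63
generate byte TumorLocation = cond(_n <= 18, 1, cond(_n <= 48, 2, 3))
generate double CH1_taumean = rnormal(3.06, 0.35) if TumorLocation == 1
replace CH1_taumean = rnormal(3.32, 0.39) if TumorLocation == 2
replace CH1_taumean = rnormal(3.15, 0.67) if TumorLocation == 3

* Same model, with heteroskedasticity-robust (Huber/White) standard errors
regress CH1_taumean i.TumorLocation, vce(robust)

* Joint Wald test of the two location coefficients under the robust VCE
testparm i.TumorLocation
```

The coefficients are the same as with the default standard errors; only the standard errors, tests, and confidence intervals change.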
            I remind myself first that, most of the time (that is, except for repeated-measures ANOVA), there's nothing that -anova- can do that -regress- cannot do better.
            Kind regards,
            Carlo
            (StataNow 18.5)



            • #7
              My tuppenceworth is another graph. Here I use stripplot from SSC. The longer horizontal lines show means. The Tufte-style box plots just show medians, quartiles and extremes. There is no call for further detail in the box plots as the data are all plotted anyway.

              The graph is clearly similar in general drift to the dotplot from Joseph Coveney in #5.

              Sometimes with a hint of heteroscedasticity, there is an obvious answer to consider working with logarithms, but not here.

              Despite the indications you'll get from some model fits, I would distrust the higher mean for Left without an independent oncological rationale.

              Code:
              stripplot CH, over(Tumor) tufte cumul cumprob vertical refline center boffset(-0.4) height(0.6) xla(, tlc(none))
              [Figure: tumors.png, the stripplot produced by the command above]



              • #8
                Thank you all...
                I will try your suggestions.
                This is just a piece of the dataset and analysis.



                • #9
                  I ran across this thread and appreciated all the answers. Another option to consider, inspired by a Cross Validated post, is to use bootstrapping to resample within the locations:
                  Code:
                  bootstrap, reps(1000) strata(loc): reg tau i.loc
                  This ends up producing standard errors that are quite similar to those that come from the approach proposed by Joseph Coveney that models the unequal variances directly.
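A reproducible variant of the same idea might look like the sketch below. The seed() value is arbitrary, the data are simulated stand-ins (group sizes, means, and SDs rounded from the -oneway- table in #1) rather than the thread's data, and the names tau and loc follow #4. The percentile intervals from -estat bootstrap- are an optional extra that avoids relying on the normal approximation.

```stata
* Simulated stand-in data (NOT the poster's data): group sizes, means and SDs
* are rounded from the -oneway- table in #1; the seeds are arbitrary
clear
set seed 12345
set obs 63
generate byte loc = cond(_n <= 18, 1, cond(_n <= 48, 2, 3))
generate double tau = rnormal(3.06, 0.35) if loc == 1
replace tau = rnormal(3.32, 0.39) if loc == 2
replace tau = rnormal(3.15, 0.67) if loc == 3

* Resample within each location so every replicate keeps the group sizes;
* seed() makes the replicates reproducible
bootstrap, reps(1000) strata(loc) seed(2718): regress tau i.loc

* Percentile-based confidence intervals from the same replicates
estat bootstrap, percentile
```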

