  • Transform data or use a non-parametric analysis

    I have data from an experiment in which five different methods were used to abrade a ceramic, and the size of the resulting airborne particles was measured. Because the ceramic is anisotropic, there is considerable scatter in the data, which are right-skewed, and robvar shows that the data fail the test of homogeneity of variance.
    [Image attachment: p1.png]

    I then applied a log transform:
    [Image attachment: p2.png]
    There is still heterogeneity of variance, as the robvar output below shows, but the distribution of the residuals from the ANOVA appears to be normal, as assessed visually using distplot (from SSC).
    . robvar lc, by(method)

                |            Summary of lc
         method |        Mean   Std. Dev.       Freq.
    ------------+------------------------------------
              1 |  -4.5900172   .18631618         900
              2 |  -2.2221745   1.2543433       1,800
              3 |  -2.2856141      .74755       1,800
              4 |  -2.6684095   .90676613         900
              5 |  -3.2487493   .35297332         900
    ------------+------------------------------------
          Total |  -2.7889648   1.1870132       6,300

    W0  = 572.11260   df(4, 6295)     Pr > F = 0

    W50 = 528.26727   df(4, 6295)     Pr > F = 0

    W10 = 552.20502   df(4, 6295)     Pr > F = 0
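For readers who want to reproduce the workflow described above, here is a minimal Stata sketch. It is not the original analysis: the data are simulated from lognormal distributions whose group sizes and log-scale means and SDs are borrowed from the robvar table, the untransformed response is given the hypothetical name c (so that lc = ln(c)), and the official commands histogram and qnorm stand in for distplot (SSC).

```stata
* Simulated stand-in data (hypothetical, NOT the original measurements):
* 6,300 obs in 5 methods, group sizes as in the robvar table
clear
set seed 2024
set obs 6300
generate byte method = 1 + (_n > 900) + (_n > 2700) + (_n > 4500) + (_n > 5400)
* Log-scale means and SDs borrowed (rounded) from the robvar table
generate double lc = .
replace lc = rnormal(-4.59, 0.19) if method == 1
replace lc = rnormal(-2.22, 1.25) if method == 2
replace lc = rnormal(-2.29, 0.75) if method == 3
replace lc = rnormal(-2.67, 0.91) if method == 4
replace lc = rnormal(-3.25, 0.35) if method == 5
* c is a hypothetical name for the untransformed response
generate double c = exp(lc)

* Levene (W0) and Brown-Forsythe (W50, W10) tests on each scale
robvar c, by(method)
robvar lc, by(method)

* One-way ANOVA on the log scale, then a visual check of the residuals
anova lc method
predict double res, residuals
histogram res, normal
qnorm res
```

With samples this large, robvar will flag even modest differences in spread, so the test result alone says little about how much the heterogeneity matters in practice.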


    My question is: is it acceptable to use the log-transformed data, would another transformation be preferable, or should I consider a nonparametric analysis such as dunntest (from SSC)?

    Thank you.
    Eddy

  • #2
    Well, I think that non-normality is probably not the issue you think it is. Your sample size is apparently 6,300 observations. At that level, the central limit theorem kicks in, even for distributions as skewed as yours, and, as a result, the standard errors, t-statistics, confidence limits, and p-values from linear regression are all OK. Heteroscedasticity (non-homogeneity of variance) remains a bit of a problem, but you can get around that by using the -robust- option to get robust standard errors.
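A minimal Stata sketch of this advice, on simulated stand-in data rather than the original measurements (c and method are hypothetical variable names; the lognormal parameters are borrowed from the robvar table in the question). It fits the same one-way model with conventional and with heteroskedasticity-robust standard errors so the two can be compared; vce(robust) is the current spelling of the -robust- option.

```stata
* Simulated stand-in data (hypothetical, NOT the original measurements)
clear
set seed 2024
set obs 6300
generate byte method = 1 + (_n > 900) + (_n > 2700) + (_n > 4500) + (_n > 5400)
generate double lc = .
replace lc = rnormal(-4.59, 0.19) if method == 1
replace lc = rnormal(-2.22, 1.25) if method == 2
replace lc = rnormal(-2.29, 0.75) if method == 3
replace lc = rnormal(-2.67, 0.91) if method == 4
replace lc = rnormal(-3.25, 0.35) if method == 5
* c is a hypothetical name for the untransformed (right-skewed) response
generate double c = exp(lc)

* Original scale: coefficients are differences in mean c from method 1 (base level)
regress c i.method
estimates store ols
regress c i.method, vce(robust)
estimates store robust

* Same point estimates, different standard errors
estimates table ols robust, se

* Model-based arithmetic mean of c for each method
margins method
```

The coefficients are identical in the two fits; only the standard errors (and hence the tests and confidence intervals) change when vce(robust) is used.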

    Given that your predictor is just a categorical variable representing the method of abrasion, I think it is a matter of taste which approach is best here. If you use the original concentration variable, you will be estimating arithmetic mean particle sizes, and your effect estimates will represent differences between methods. If you use the logarithm and then re-exponentiate the results, you will be estimating geometric mean particle sizes, and the effect estimates will represent ratios between methods. Which of these is the more sensible way to think about the problem in the context of the real-world science? I don't see a statistical reason to prefer one over the other.
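The log-scale alternative can be sketched in Stata as follows, again on simulated stand-in data (c, lc, and method are hypothetical names; lc = ln(c)). Exponentiating the fitted values gives geometric means, and exponentiating differences of coefficients gives ratios of geometric means between methods.

```stata
* Simulated stand-in data (hypothetical, NOT the original measurements)
clear
set seed 2024
set obs 6300
generate byte method = 1 + (_n > 900) + (_n > 2700) + (_n > 4500) + (_n > 5400)
generate double lc = .
replace lc = rnormal(-4.59, 0.19) if method == 1
replace lc = rnormal(-2.22, 1.25) if method == 2
replace lc = rnormal(-2.29, 0.75) if method == 3
replace lc = rnormal(-2.67, 0.91) if method == 4
replace lc = rnormal(-3.25, 0.35) if method == 5
generate double c = exp(lc)

* Log scale, with robust standard errors
regress lc i.method, vce(robust)

* Re-exponentiated predictions: geometric mean of c for each method
margins method, expression(exp(predict(xb)))

* Ratios of geometric means: method 2 vs method 1, and method 2 vs method 3
lincom 2.method, eform
lincom 2.method - 3.method, eform
```

With method as the only regressor, the fitted value for each method is that method's mean of ln(c), so its exponential is exactly the method's geometric mean.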

    If you use some other transform, you will be estimating God knows what, and the effect estimates will probably be incomprehensible. One might resort to this if the simple use of the original variable or the log transform were, for some reason, inadmissible. But as both of them seem fine, I wouldn't go there.

    Non-parametric analyses are also possible (Kruskal-Wallis and Kaplan-Meier analysis come to mind). But again, the results are rather difficult to interpret in any meaningful way, so I would not be inclined towards them.
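For comparison, a minimal sketch of the Kruskal-Wallis test with the official kwallis command, on simulated stand-in data (c, lc, and method are hypothetical names). Because the test works on ranks, any strictly increasing transformation such as the log leaves the statistic unchanged. Pairwise follow-up comparisons would need something like dunntest (SSC), which is not shown here.

```stata
* Simulated stand-in data (hypothetical, NOT the original measurements)
clear
set seed 2024
set obs 6300
generate byte method = 1 + (_n > 900) + (_n > 2700) + (_n > 4500) + (_n > 5400)
generate double lc = .
replace lc = rnormal(-4.59, 0.19) if method == 1
replace lc = rnormal(-2.22, 1.25) if method == 2
replace lc = rnormal(-2.29, 0.75) if method == 3
replace lc = rnormal(-2.67, 0.91) if method == 4
replace lc = rnormal(-3.25, 0.35) if method == 5
generate double c = exp(lc)

* Kruskal-Wallis rank test across the 5 methods, original scale
kwallis c, by(method)
scalar chi2_raw = r(chi2)

* Same test on the log scale: the ranks, and so the statistic, do not change
kwallis lc, by(method)
scalar chi2_log = r(chi2)
display "chi2 for c = " chi2_raw "   chi2 for ln(c) = " chi2_log
```

The invariance is exactly why the result is hard to translate into a size or ratio: the test says the methods differ in distribution, but not by how much on any particular scale.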



    • #3
      Thank you for your advice. I admit that I am not used to dealing with such large sample sizes, and I forgot about the implications of the central limit theorem. I will regress the original data with the -robust- option; I am always wary about the meaning of a transformed variable.

      Eddy



      • #4
        I would add that a reciprocal transformation is stronger than a logarithmic transformation and ties in with summarizing by harmonic means. That said, log particle size (if that is the response) is far easier (for me) to think about and, in the Earth and environmental science literature I know about, is widely used and accepted.
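The link between each transformation and its summary mean can be seen in a minimal Stata sketch on simulated stand-in data (c, lc, and method are hypothetical names): back-transforming the mean of ln(c) gives the geometric mean, and back-transforming the mean of 1/c gives the harmonic mean. For a single variable, the official ameans command reports the same three means.

```stata
* Simulated stand-in data (hypothetical, NOT the original measurements)
clear
set seed 2024
set obs 6300
generate byte method = 1 + (_n > 900) + (_n > 2700) + (_n > 4500) + (_n > 5400)
generate double lc = .
replace lc = rnormal(-4.59, 0.19) if method == 1
replace lc = rnormal(-2.22, 1.25) if method == 2
replace lc = rnormal(-2.29, 0.75) if method == 3
replace lc = rnormal(-2.67, 0.91) if method == 4
replace lc = rnormal(-3.25, 0.35) if method == 5
generate double c = exp(lc)
generate double inv_c = 1 / c

* Per-method means on the original, log, and reciprocal scales
collapse (mean) am=c mlc=lc minv=inv_c (count) n=c, by(method)

* Back-transform: exp(mean of log) = geometric mean; 1/(mean of reciprocal) = harmonic mean
generate double gm = exp(mlc)
generate double hm = 1 / minv
list method n am gm hm, noobs
```

For positive data the three means are always ordered arithmetic >= geometric >= harmonic, so the choice of transformation also fixes which kind of "typical" particle size is being compared across methods.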



        • #5
          Thank you for your comments. I will have a look at the reciprocal transformation and also check the Earth Science literature.

          Eddy

