Hi everybody:
Background. I'm working on a friend's dataset of 135 beetles from the north of Spain belonging, allegedly, to two different subspecies (European and African), characterised by a variable called E/R (the ratio of total wing length to wing color patch length). I used Hartigan's dip test on the complete dataset and it rejects unimodality (p = 0.005). Then my friend, a p-value worshipper, asked me to verify whether bimodality is also present in a subset of 43 cases from a very restricted geographical area. Unfortunately, although violin plots suggest bimodality, the dip test fails to reach significance there, and I have been unable to convince him that absence of evidence is not evidence of absence (a non-significant test does not prove unimodality).
So I tried a different approach: showing that the AIC is lower for a two-component mixture model than for a single normal distribution. I used -fmm- (by Partha Deb, City University of New York) to fit a two-component normal mixture to the overall dataset:
Code:
. fmm e_r, mix(normal) comp(2)

Fitting Normal regression model:

Iteration 0:   log likelihood = -168.39943
Iteration 1:   log likelihood = -168.39757
Iteration 2:   log likelihood = -168.39757

Fitting 2 component Normal model:

Iteration 0:   log likelihood = -168.71114  (not concave)
Iteration 1:   log likelihood = -168.41975  (not concave)
Iteration 2:   log likelihood = -168.39071  (not concave)
Iteration 3:   log likelihood = -168.39012  (not concave)
Iteration 4:   log likelihood = -168.38867  (not concave)
Iteration 5:   log likelihood = -168.38748  (not concave)
Iteration 6:   log likelihood = -168.38619  (not concave)
Iteration 7:   log likelihood = -168.38379  (not concave)
Iteration 8:   log likelihood =  -168.3813  (not concave)
Iteration 9:   log likelihood = -168.19305  (not concave)
Iteration 10:  log likelihood = -163.63375  (not concave)
Iteration 11:  log likelihood = -158.61347  (not concave)
Iteration 12:  log likelihood = -151.78992
Iteration 13:  log likelihood = -147.56396
Iteration 14:  log likelihood =  -146.9277
Iteration 15:  log likelihood = -146.92397
Iteration 16:  log likelihood = -146.92397

2 component Normal regression                   Number of obs   =        135
                                                Wald chi2(0)    =          .
Log likelihood = -146.92397                     Prob > chi2     =          .

------------------------------------------------------------------------------
         e_r |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
component1   |
       _cons |   3.372406   .0441419    76.40   0.000     3.285889    3.458922
-------------+----------------------------------------------------------------
component2   |
       _cons |   4.825329   .0813699    59.30   0.000     4.665847    4.984811
-------------+----------------------------------------------------------------
 /imlogitpi1 |  -.5489003   .2208433    -2.49   0.013    -.9817452   -.1160554
   /lnsigma1 |  -1.426499   .1336279   -10.68   0.000    -1.688404   -1.164593
   /lnsigma2 |  -.5804232   .1125089    -5.16   0.000    -.8009366   -.3599098
------------------------------------------------------------------------------
      sigma1 |   .2401483   .0320905                      .1848142    .3120497
      sigma2 |   .5596615   .0629669                      .4489083    .6977392
         pi1 |   .3661196   .0512524                      .2725456    .4710187
         pi2 |   .6338804   .0512524                      .5289813    .7274544
------------------------------------------------------------------------------

. estat ic

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df         AIC         BIC
-------------+---------------------------------------------------------------
           . |    135           .    -146.924      5    303.8479    318.3743
-----------------------------------------------------------------------------
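Eventually I will also want to repeat exactly this comparison on the 43-case subset. I imagine something along these lines, where insubset is a hypothetical 0/1 marker for the restricted geographical area that I would still have to create:

Code:
* insubset is a hypothetical indicator: 1 for the 43 cases from the
* restricted geographical area, 0 otherwise
fmm e_r if insubset == 1, mix(normal) comp(2)
estat ic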
Now I'm stuck trying to fit a single normal distribution model to the same (overall) data, so that I can compare the AICs.
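Here is what I was thinking of for the single normal distribution, although I am not sure it gives an AIC comparable to the one from -fmm-: fit an intercept-only -regress- as the one-component model, ask -estat ic- for the criteria, and also compute the AIC by hand from e(ll) with k = 2 parameters (mean and sigma), in case -estat ic- counts parameters differently. If I read the -fmm- log correctly, the single-normal log likelihood should come out as the -168.39757 shown under "Fitting Normal regression model".

Code:
* one-component (single normal) model: an intercept-only regression
regress e_r

* information criteria as Stata reports them
estat ic

* AIC computed by hand from the stored log likelihood,
* counting k = 2 parameters (mean and sigma)
display "AIC (single normal) = " -2*e(ll)+2*2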
Another question: is there some information here that I could use to compute a bimodality p-value for my friend, the p-value worshipper? I can test that the two component means (3.37 and 4.83) are significantly different, and I can also show that pi1 is significantly different from 0. Is that enough? Below is the sort of thing I had in mind.
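This is the kind of test I had in mind for the means, assuming the component equations really are named component1 and component2 as the output suggests, plus the raw likelihood-ratio statistic for one versus two components computed from the two maximised log likelihoods above. I am aware that the usual chi-squared reference distribution is not valid for testing the number of mixture components (the null lies on a boundary of the parameter space), so I don't know whether that statistic can honestly be turned into a p-value:

Code:
* after the two-component -fmm- fit shown above

* Wald test that the two component means are equal
* (equation names component1 and component2 taken from the -fmm- output)
test [component1]_cons = [component2]_cons

* raw likelihood-ratio statistic for 1 vs 2 components, from the two
* maximised log likelihoods in the logs above; the usual chi-squared
* p-value is not strictly valid here because of the boundary problem
display "LR statistic = " 2*(-146.92397-(-168.39757))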
Thanks in advance

Marta GG