With many thanks to Kit Baum, the program - chaid - has been updated on SSC.
- chaid - is a recursive partitioning, data mining, or decision tree methodology useful for exploratory data analysis and clustering observations.
Several new features and extensions have been added to the Stata implementation of - chaid - including:
1) A graphical depiction of the decision tree structure. - chaid - now uses Stata's graph twoway scatter to show the hierarchical partitioning structure as estimated by the CHAID algorithm.
2) Exhaustive CHAID. An option to replace the default multi-way partitioning CHAID algorithm with a binary split-only/exhaustive CHAID.
3) Fit metric. A fit metric based on Cramer's V is implemented to discern the extent to which the CHAID decision tree fits the data.
4) Importance. Extending from the fit metric, a permutation importance vector is computed from the decrement in fit attributable to each splitting variable, to assess each variable's importance toward improving fit.
5) Compatibility with svyset data. Although it slows the CHAID algorithm significantly, complex survey data can now be "data mined" to uncover relationships that are consistent with complex design characteristics.
6) Built-in xtile. An option to xtile continuous variables or ordered categorical variables with many categories. Such variables are treated as ordered.
7) Permutation p-values. Implements p-values for merging and splitting based on permutation tests. This slows the CHAID decision tree algorithm and is useful primarily for small samples.
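For readers who want a feel for items 3) and 7), here is a small sketch of the two ideas in plain Python: Cramér's V for a two-way table, and a permutation p-value built on it. This is only an illustration under my own assumptions, not how - chaid - computes its fit metric or its p-values. The function names, the toy data, and the seed are all made up for the example.

```python
import random
from collections import Counter


def cramers_v(x, y):
    """Cramér's V for two equal-length sequences of category labels."""
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    cell_counts = Counter(zip(x, y))
    # Pearson chi-squared: sum over all cells of (observed - expected)^2 / expected
    chi2 = 0.0
    for r, r_n in row_totals.items():
        for c, c_n in col_totals.items():
            expected = r_n * c_n / n
            observed = cell_counts.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    # V is undefined when either variable has a single category; return 0 here
    return (chi2 / (n * k)) ** 0.5 if k > 0 else 0.0


def permutation_p_value(x, y, reps=999, seed=12345):
    """Share of shuffles of y whose V is at least as large as the observed V."""
    rng = random.Random(seed)
    observed = cramers_v(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(reps):
        rng.shuffle(shuffled)  # breaks any association between x and y
        if cramers_v(x, shuffled) >= observed - 1e-12:
            hits += 1
    # add-one correction so the p-value is never exactly zero
    return (hits + 1) / (reps + 1)


# Made-up data: terminal node membership versus a binary outcome
node = ["a"] * 10 + ["b"] * 10
outcome = [1] * 10 + [0] * 10  # node membership predicts the outcome perfectly

v_perfect = cramers_v(node, outcome)
p_perfect = permutation_p_value(node, outcome)
v_none = cramers_v(["a", "b"] * 10, [0, 0, 1, 1] * 5)  # balanced, no association

print(v_perfect, p_perfect, v_none)
```

Each permutation recomputes the statistic on reshuffled data, so the cost grows with the number of replications. That is one way to see why permutation-based p-values slow a tree-growing algorithm down.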
To install chaid type:
Code:
ssc install chaid
To update chaid type:
Code:
adoupdate chaid, update
Please do not hesitate to contact me with suggestions, recommendations, or bug reports.
- joe