update to - chaid - on SSC

Joseph Luchman

Join Date: Mar 2014

Posts: 114
#1

01 Sep 2014, 09:54

With many thanks to Kit Baum, the program - chaid - has been updated on SSC.

- chaid - is a recursive partitioning, data mining, or decision tree methodology useful for exploratory data analysis and clustering observations.

Several new features and extensions have been added to the Stata implementation of - chaid - including:

1) A graphical depiction of the decision tree structure. - chaid - now uses Stata's graph twoway scatter to show the hierarchical partitioning structure as estimated by the CHAID algorithm.

2) Exhaustive CHAID. An option to replace the default multi-way partitioning CHAID algorithm with a binary split-only (exhaustive) CHAID.

3) Fit metric. A fit metric based on Cramer's V is implemented to discern the extent to which the CHAID decision tree fits the data.

4) Importance. Extending from the fit metric, a permutation importance vector is computed from the decrements in fit owing to each splitting variable, to assess each variable's importance toward improving fit.

5) Compatibility with svyset data. Although it slows the CHAID algorithm significantly, complex survey data can now be "data mined" to uncover relationships that are consistent with complex design characteristics.

6) Built-in xtile. An option to xtile continuous data or ordered categorical data with many categories. Such data are treated as ordered.

7) Permutation p-values. Implements p-values for merging and splitting based on permutation tests, at the cost of slowing the CHAID decision tree algorithm. Useful primarily for small samples.
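
For readers unfamiliar with the building blocks named in items 3, 6, and 7, here is a minimal sketch using only built-in Stata commands. It is not how - chaid - computes these quantities internally; the auto dataset, the choice of variables, and the new variable name mpg5 are assumptions made purely for illustration.

```stata
* Illustration only: built-in Stata commands, not chaid's internal code
sysuse auto, clear

* Item 6 (xtile): cut a continuous predictor into 5 ordered groups
xtile mpg5 = mpg, nquantiles(5)

* Item 3 (fit metric): Cramer's V between the response and the binned predictor
tabulate foreign mpg5, chi2 V
display "Cramer's V = " r(CramersV)

* Item 7 (permutation p-value): shuffle the response 200 times and
* compare the permuted chi-squared values against the observed one
permute foreign chi2 = r(chi2), reps(200) seed(12345) nodots: tabulate foreign mpg5, chi2
```

The permute step makes the speed trade-off visible: every replication reruns the tabulation, which is why permutation p-values slow a tree that evaluates many candidate splits.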

To install chaid type:

Code:

ssc install chaid

To update chaid type:

Code:

adoupdate chaid, update

Please do not hesitate to contact me with suggestions, recommendations, or bug reports.

- joe

Joseph Nicholas Luchman, Ph.D., PStat® (American Statistical Association)
----
Research Fellow
Fors Marsh
----
Version 18.0 MP

Paul Bergmann

Join Date: Nov 2014

Posts: 6
#2

22 Nov 2014, 11:11

Thank you for your work on the CHAID ado. I am new to Stata so this may be user error, but I cannot figure out how to use the "missing" option in chaid. Below is the actual command (sans variable names):

chaid myDV, ordered(blah blah blah) unordered(blah blah blah) xtile(blah blah blah, n(5)) minnode(4) minsplit(9) importance predicted missing

When I attempt to run this command I receive:

invalid syntax
stata(): 3598 Stata returned error
<istmt>: - function returned error
r(3598);

end of do-file

Can you offer guidance?
Paul Bergmann

Join Date: Nov 2014

Posts: 6
#3

22 Nov 2014, 11:26

Well, this is at least partially a new user issue. It is not the "missing" option alone that is the problem. It seems "missing" works fine with or without "predicted", but the error gets thrown when "importance" is included in the same command as "missing".

Is this perhaps a bug?
Joseph Luchman

Join Date: Mar 2014

Posts: 114
#4

24 Nov 2014, 08:33

Hi Paul,

There does indeed appear to be an errant comma that makes its way into the syntax when missing and importance are invoked and there is a split on a variable with missing values.

Many thanks for the report; a fix will be implemented as soon as I am able to update the .ado.

- joe

Joseph Luchman

Join Date: Mar 2014

Posts: 114
#5

22 Dec 2014, 09:26

With many thanks to Kit Baum, chaid has been updated on SSC.

In particular, Version 2.1 has corrected the issue noted by Paul above as well as:

a] use of the AIC (Akaike Information Criterion) to decide between splits with very small (i.e., effectively 0) p-values

b] moving the check that the response variable has <20 distinct values so that it occurs after marking out missing values

c] an error dealing with a missing colon/prefix when combining the options svy and respalpha

d] an error that omitted the Bonferroni adjustment when no levels of the splitting variable were merged
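
As background for item d], a Bonferroni adjustment in its generic form multiplies a raw p-value by the number of comparisons and caps the result at 1. The sketch below shows only that generic form; the multiplier - chaid - actually applies is not described in this thread, and the scalar names and input values are hypothetical.

```stata
* Hypothetical inputs, for illustration only
scalar p_raw   = 0.012   // unadjusted p-value for a candidate split
scalar n_tests = 6       // assumed number of comparisons behind that split

* Generic Bonferroni adjustment: multiply by the number of tests, cap at 1
scalar p_adj = min(1, p_raw * n_tests)
display "adjusted p = " p_adj
```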

To update type:

Code:

ssc install chaid, replace

or

Code:

adoupdate chaid, update

As always, do not hesitate to contact me with bugs, suggestions, or comments with regard to chaid.

- joe

Kim Wilson

Join Date: Feb 2015

Posts: 1
#6

12 Feb 2015, 07:02

I'm also running into a possible bug with the updated CHAID module. I'm running a fairly straightforward model using a sample of approximately 500:
chaid a1ctarget if sample==1, unordered(metforx othernosulf sulfrx insulinrx grouphis bpshould knowbp) ordered(agecat dxdurcat comorbx pacat2 satfatcat sodicat) xtile(perfatav percarbsav persugav avfibermg perprotav) minnode(24) minsplit(47) exhaust noadj

I let it run for 10 minutes but it doesn't resolve.
However, if I drop the "noadj" option I get results in 20-30 seconds. Ditto if I replace noadj with a spltalpha option. So it seems to be an issue with the "noadj" option.

Can you offer some assistance?
Thanks much!
Joseph Luchman

Join Date: Mar 2014

Posts: 114
#7

12 Feb 2015, 10:48

Hi Kim,
Many thanks for the report with respect to chaid.

It was indeed a bug, but it didn't have to do with noadj directly (it was just revealed by it).

The issue was related to not stopping splitting when a splitting variable had only 1 level remaining, something which I erroneously thought would be caught by syntax earlier in the splitting process.

I can also see another bug, which I just noticed: noadj actually invoked the Bonferroni adjustment instead of suppressing it.

Both issues will be fixed shortly and re-released on SSC.

- joe

Joseph Luchman

Join Date: Mar 2014

Posts: 114
#8

17 Feb 2015, 08:15

Again, with many thanks to Kit Baum, chaid has been updated on SSC.

In particular, Version 2.2 has corrected the issue noted by Kim above, as well as the noadj issue I recently caught.

To update type:

Code:

ssc install chaid, replace

or

Code:

adoupdate chaid, update

As always, do not hesitate to contact me with bugs, suggestions, or comments with regard to chaid.

- joe

