Dear Statalist Users,
I want to learn how to build decision trees using the CHAID algorithm. To do that, I looked at the simplest example in the help file for the command. As instructed there, I ran the following code:
set seed 1234567
webuse auto
chaid foreign, unordered(rep78) minnode(4) minsplit(10) xtile(length, n(3))
The result shown in the help is:
Chi-Square Automated Interaction Detection (CHAID) Tree Branching Results
--------------------------------------------------------------------------------
                  1             2             3             4
    +---------------------------------------------------------+
  1 |    xtlength@1    xtlength@2    xtlength@2    xtlength@3 |
  2 |                 rep78@1 3 2     rep78@4 5               |
  3 |    Cluster #1    Cluster #2    Cluster #4    Cluster #3 |
    +---------------------------------------------------------+
The result I get is:
Chi-Square Automated Interaction Detection (CHAID) Tree Branching Results
--------------------------------------------------------------------------------
                  1             2             3             4
    +---------------------------------------------------------+
  1 |    xtlength@1    xtlength@2    xtlength@2    xtlength@3 |
  2 |                 rep78@1 4 5     rep78@2 3               |
  3 |    Cluster #1    Cluster #2    Cluster #4    Cluster #3 |
    +---------------------------------------------------------+
The difference lies in Clusters #2 and #4: rep78@1 is merged with rep78@4 5 instead of with rep78@2 3.
The first result is the one the procedure should return, since the table of foreign by rep78 within xtlength@2 (D = Domestic, F = Foreign) is:
        1    2    3    4    5
   D    2    3   13    0    0
   F    0    0    0    2    3
Categories 1, 2 and 3 contain only domestic cars and categories 4 and 5 only foreign cars, so the algorithm should merge {1, 2, 3} and {4, 5}.
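The merge step this argument relies on can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation used by Stata's chaid: it assumes a binary outcome, uses an unadjusted Pearson chi-square test (no continuity correction, no Bonferroni adjustment), uses an illustrative merge threshold alpha_merge = 0.05, breaks ties by taking the first pair found, and omits CHAID's re-split check for merged groups as well as size rules such as minnode(). The names pair_p_value and merge_categories are hypothetical.

```python
import math
from itertools import combinations

# Counts from the table above (within xtlength@2):
# rep78 category -> (Domestic, Foreign)
COUNTS = {1: (2, 0), 2: (3, 0), 3: (13, 0), 4: (0, 2), 5: (0, 3)}


def pair_p_value(x, y):
    """Pearson chi-square p-value (df = 1, no continuity correction) for the
    2x2 table formed by two groups with (Domestic, Foreign) counts."""
    a, b = x
    c, d = y
    n = a + b + c + d
    col_d, col_f = a + c, b + d
    if col_d == 0 or col_f == 0:
        # Only one outcome occurs in both groups: identical distributions,
        # so the pair is treated as not significantly different.
        return 1.0
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * col_d * col_f)
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(stat / 2))


def merge_categories(counts, alpha_merge=0.05):
    """Simplified CHAID merging for an unordered predictor: repeatedly merge
    the pair of groups whose outcome distributions differ least (largest
    p-value) until every remaining pair differs at level alpha_merge."""
    groups = [([cat], tuple(cnt)) for cat, cnt in sorted(counts.items())]
    while len(groups) > 1:
        best = None
        for i, j in combinations(range(len(groups)), 2):
            p = pair_p_value(groups[i][1], groups[j][1])
            if best is None or p > best[0]:
                best = (p, i, j)
        p, i, j = best
        if p <= alpha_merge:
            break  # all remaining pairs differ significantly
        cats = groups[i][0] + groups[j][0]
        cnt = (groups[i][1][0] + groups[j][1][0],
               groups[i][1][1] + groups[j][1][1])
        groups[i] = (cats, cnt)
        del groups[j]  # j > i, so index i is unaffected
    return [cats for cats, _ in groups]


print(merge_categories(COUNTS))  # [[1, 2, 3], [4, 5]]
```

Pairs within {1, 2, 3} and within {4, 5} have identical outcome distributions (p = 1) and are merged first, while every cross pair (for example 1 versus 4, with a chi-square of 4 and p of about 0.046) differs at the 5% level, so the sketch ends with the same grouping as the help file.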
My question is: why does the command no longer reproduce the result shown in the help file? Is there a bug, or am I missing something?
Thank you very much for your help in advance!
Paulo