
  • update to - chaid - on SSC

    With many thanks to Kit Baum, the program - chaid - has been updated on SSC.

    - chaid - implements a recursive partitioning (also called data mining or decision tree) method that is useful for exploratory data analysis and for clustering observations.

    Several new features and extensions have been added to the Stata implementation of - chaid -, including:

    1) A graphical depiction of the decision tree structure. - chaid - now uses Stata's graph twoway scatter to show the hierarchical partitioning structure estimated by the CHAID algorithm.

    2) Exhaustive CHAID. An option to replace the default multi-way partitioning CHAID algorithm with a binary-split-only, exhaustive CHAID.

    3) Fit metric. A fit metric based on Cramer's V is implemented to discern the extent to which the CHAID decision tree fits the data.

    4) Importance. Building on the fit metric, a permutation importance vector is computed from the decrement in fit attributable to each splitting variable, to assess how much each one contributes to improving fit.

    5) Compatibility with svyset data. Although it slows the CHAID algorithm significantly, complex survey data can now be "data mined" to uncover relationships that are consistent with the complex design characteristics.

    6) Built-in xtile. An option to xtile continuous variables, or ordered categorical variables with many categories. The resulting groups are treated as ordered.

    7) Permutation p-values. Implements p-values for merging and splitting based on permutation tests; this slows the CHAID decision tree algorithm and is primarily useful for small samples.
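    For reference on items 3, 4, and 7, the standard textbook definitions are sketched below. These formulas are not taken from the chaid code, so how chaid applies them is an assumption: in particular, which table its Cramer's V is computed on (for example, response by terminal node), and how fit is recomputed without a splitting variable. Here chi^2 is the Pearson chi-squared statistic for an r x c table with n observations, V_(j) is the fit after splitting variable j is randomly permuted (an assumed form of "decrement in fit"), and T*_b is the test statistic from the b-th of B permuted samples.

```latex
% Cramer's V for an r x c table with Pearson chi-squared \chi^2 and n observations
V = \sqrt{\frac{\chi^2 / n}{\min(r - 1,\; c - 1)}}

% Permutation importance of splitting variable j (assumed form):
% the drop in fit when x_j is randomly permuted
I_j = V - V_{(j)}

% One common form of the permutation p-value, from B permuted
% statistics T^*_b and the observed statistic T_obs
p = \frac{1 + \sum_{b=1}^{B} \mathbf{1}\left[ T^{*}_{b} \ge T_{\mathrm{obs}} \right]}{B + 1}
```

    The +1 terms in the p-value count the observed data as one of the permutations, so the p-value can never be exactly 0 with a finite B.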

    To install chaid, type:

    Code:
    ssc install chaid
    To update chaid, type:

    Code:
    adoupdate chaid, update
    Please do not hesitate to contact me with suggestions, recommendations, or bug reports.

    - joe
    Joseph Nicholas Luchman, Ph.D., PStat® (American Statistical Association)
    ----
    Research Fellow
    Fors Marsh

    ----
    Version 18.0 MP

  • #2
    Thank you for your work on the CHAID ado. I am new to Stata, so this may be user error, but I cannot figure out how to use the "missing" option in chaid. Below is the actual command (without variable names):

    chaid myDV, ordered(blah blah blah) unordered(blah blah blah) xtile(blah blah blah, n(5)) minnode(4) minsplit(9) importance predicted missing

    When I attempt to run this command I receive:

    invalid syntax
    stata(): 3598 Stata returned error
    <istmt>: - function returned error
    r(3598);

    end of do-file

    Can you offer guidance?



    • #3
      Well, this is at least partially a new-user issue. It is not the "missing" option alone that is the problem. It seems missing works fine with or without "predicted", but the error is thrown when "importance" is included in the same command as "missing".

      Is this perhaps a bug?



      • #4
        Hi Paul,

        There does indeed appear to be an errant comma that makes its way into the syntax when missing and importance are both invoked and there is a split on a variable with missing values.

        Many thanks for the report and a fix for this will be implemented as soon as I am able to update the .ado.

        - joe


        • #5
          With many thanks to Kit Baum, chaid has been updated on SSC.

          In particular, Version 2.1 has corrected the issue noted by Paul above as well as:

          a] use of the AIC (Akaike Information Criterion) to decide between splits with very small (i.e., effectively 0) p-values

          b] the check that the response variable has fewer than 20 distinct values now occurs after missing values are marked out

          c] an error involving a missing colon/prefix when combining the svy and respalpha options

          d] an error that omitted the Bonferroni adjustment when no levels of the splitting variable were merged

          To update, type:

          Code:
          ssc install chaid, replace
          or

          Code:
          adoupdate chaid, update
          As always, do not hesitate to contact me with bugs, suggestions, or comments with regard to chaid.

          - joe


          • #6
            I'm also running into a possible bug with the updated CHAID module. I'm running a fairly straightforward model using a sample of approximately 500:
            chaid a1ctarget if sample==1, unordered(metforx othernosulf sulfrx insulinrx grouphis bpshould knowbp) ordered(agecat dxdurcat comorbx pacat2 satfatcat sodicat) xtile(perfatav percarbsav persugav avfibermg perprotav) minnode(24) minsplit(47) exhaust noadj

            I let it run for 10 minutes, but it doesn't finish.
            However, if I drop the "noadj" option, I get results in 20-30 seconds. The same happens if I replace noadj with a spltalpha option. So it seems to be an issue with the "noadj" option.

            Can you offer some assistance?
            Thanks much!



            • #7
              Hi Kim,
              Many thanks for the report with respect to chaid.

              It was indeed a bug, but it did not have to do with noadj directly; noadj just revealed it.

              The issue was that splitting did not stop when a splitting variable had only 1 level remaining, something I erroneously thought would be caught by syntax earlier in the splitting process.

              I also just noticed another bug: noadj actually invoked the Bonferroni adjustment, and omitting it turned the adjustment off, rather than the other way around.

              Both issues will be fixed shortly and re-released on SSC.

              - joe


              • #8
                Again, with many thanks to Kit Baum, chaid has been updated on SSC.

                In particular, Version 2.2 has corrected the issue noted by Kim above, as well as the noadj issue I recently caught.

                To update, type:

                Code:
                ssc install chaid, replace
                or

                Code:
                adoupdate chaid, update
                As always, do not hesitate to contact me with bugs, suggestions, or comments with regard to chaid.

                - joe