I am working on a data science/machine learning project and was happy to see Stata packages on machine learning topics:
https://www.stata.com/stata-news/news33-4/users-corner/
but I am not sure how effective these packages are in comparison with R or Python algorithms.
1. If you know that any of these Stata packages work well for your use case, please let me know, and if you can share your example or project outcome, that would be fantastic.
Or, if you know of situations in which they do not work, please share that as well:
- Support Vector Machines (SVM)
- Random forest
- Neural Network
- Decision Trees
- Bayesian Approach
- Naive Bayes
- kNN
- Deep Learning
- AI
2. I have tried decision trees in R, and they worked well both statistically and graphically. In Stata, I tried -rforest- and -chaidforest- as follows:
Code:
clear all
webuse auto

* CHAID-based forest with -chaidforest-
chaidforest foreign, unordered(rep78) minnode(2) minsplit(5) xtile(length weight, nquantiles(3)) alpha(.8)
estat gettree, tree(1) graph    // graph the first tree of the forest

* random forests with -rforest-: regression and classification versions
rforest foreign weight length rep78 mpg, type(reg) iter(500)
rforest foreign weight length rep78 mpg, type(class)
a. Which one is really a package for decision trees? Both package names say random forest, but -chaidforest- seems to me to be closer to decision trees, because it provides the decision nodes and a graph of the tree.
b. Can -rforest- provide a graph of the decision trees as well? So far I only see the variable-importance graph (a sketch of how I produce it is below, after these questions).
c. -chaidforest- only allows one variable to start the splitting with, i.e. "foreign". If the user does not know which "important" variable to start with, what would be an efficient way to start the decision tree? How do we let the whole set of variables in and have the algorithm make the decision automatically?
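For context on b., here is a minimal sketch of how I produce the variable-importance graph I mentioned. It reflects my understanding that -rforest- (installed with ssc install rforest) stores the importance scores in the matrix e(importance); please correct me if that is not the intended workflow.
Code:
* minimal sketch: variable-importance graph after -rforest-
* assumes the scores are stored in e(importance), with predictor names as row names
clear all
webuse auto
rforest foreign weight length rep78 mpg, type(class) iter(500)

matrix imp = e(importance)          // assumed stored result: one score per predictor
svmat imp, names(imp)               // copy the matrix column into variable imp1
generate predictor = ""
local rnames : rownames imp
local k : word count `rnames'
forvalues i = 1/`k' {
    quietly replace predictor = "`:word `i' of `rnames''" in `i'
}
graph hbar (mean) imp1 if predictor != "", over(predictor, sort(1)) ytitle("Variable importance")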
I hope to hear about your experience! Thank you in advance for your answers.
I wish Stata would provide great packages and quick solutions for machine learning, as it does in so many other areas.
Best regards,