Hi everyone,
I am running a random forest with a categorical dependent variable. I need to compute a (pseudo) R-squared to compare model fit with a multinomial logit. As far as I can tell, the rforest Stata command does not directly report a fit measure of this kind. The code I am using is below; it produces the output and graphs the variable importance.
Thanks for any help!
Code:
*** Random Forest ***
rforest y x1 x2 x3 x4 x5 x6 x7 x8, type(class) iterations(2000)

* Output the statistics computed so far (note that the OOB error is computed at this stage)
ereturn list

* Predict the class of each observation
predict pred

* List the first five entries of variables
list y x1 x2 x3 x4 x5 x6 x7 x8 in 1/5

* Create a copy of the variable-importance matrix stored in e()
matrix importance = e(importance)

* Convert the matrix to a variable
svmat importance

* List the first five entries in the variable importance
list importance in 1/5

* Generate new variable id to be used for labeling
generate id=""

* Attach unique labels to individual columns in the chart
local mynames : rownames importance
local k : word count `mynames'

// If there are more variables than observations
if `k'>_N {
    set obs `k'
}

forvalues i = 1(1)`k' {
    local aword : word `i' of `mynames'
    local alabel : variable label `aword'
    if ("`alabel'"!="") quietly replace id= "`alabel'" in `i'
    else quietly replace id= "`aword'" in `i'
}

* Graph the results
graph hbar (mean) importance, over(id, sort(1)) ytitle(Importance)
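
For reference, the "Pseudo R2" that Stata's mlogit reports is McFadden's, so a comparable number for the forest would presumably need the same form. The following is only a sketch of that definition. It assumes the forest's predicted class probabilities are plugged in for the fitted probabilities, which is an assumption here rather than anything rforest computes. The symbols are illustrative: \hat{p}_{i,y_i} is the predicted probability of the class actually observed for observation i, n_k is the number of observations in class k, N is the number of observations, and K is the number of classes.

```latex
R^2_{\text{McF}} = 1 - \frac{\ln L_{\text{model}}}{\ln L_{0}},
\qquad
\ln L_{\text{model}} = \sum_{i=1}^{N} \ln \hat{p}_{i,\,y_i},
\qquad
\ln L_{0} = \sum_{k=1}^{K} n_k \ln\frac{n_k}{N}
```

A forest can assign probability 0 to the observed class, in which case the log is undefined. In-sample forest probabilities also tend to be far more optimistic than out-of-bag ones, so a number computed this way may not be directly comparable with the mlogit value.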