Hi everyone,
I am running a random forest with a categorical dependent variable. I need to compute a (pseudo) R-squared to compare model fit with a multinomial logit. As far as I can tell, the rforest Stata command does not directly report a fit measure of this kind. The code I am using is below; it produces the output and graphs the variable importance.
Thanks for any help!
Code:
*** Random Forest ***
rforest y x1 x2 x3 x4 x5 x6 x7 x8, type(class) iterations(2000)
*Output the statistics computed so far (note that the OOB error is computed at this stage)
ereturn list
*Compute predicted classes for y
predict pred
*List the first five entries of variables
list y x1 x2 x3 x4 x5 x6 x7 x8 in 1/5
*Create a copy of the variable-importance matrix stored in e()
matrix importance = e(importance)
*Convert the matrix to a variable
svmat importance
*List the first five entries in the variable importance
list importance in 1/5
*Generate new variable id to be used for labeling
generate id=""
*Attach unique labels to individual columns in the chart
local mynames : rownames importance
local k : word count `mynames'
// If there are more variables than observations
if `k'>_N {
    set obs `k'
}
forvalues i = 1(1)`k' {
    local aword : word `i' of `mynames'
    local alabel : variable label `aword'
    if ("`alabel'"!="") quietly replace id= "`alabel'" in `i'
    else quietly replace id= "`aword'" in `i'
}
*Graph the results
graph hbar (mean) importance, over(id, sort(1)) ytitle(Importance)
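
One possible way to put the two models on a common footing, given here only as a rough sketch and not as something rforest itself provides, is to compare holdout classification accuracy instead of a pseudo R-squared. The sketch assumes that rforest accepts an if qualifier and that predict after rforest fills in predictions for the holdout rows. The names train, rf_pred, rf_ok, pm*, pmax, ml_pred and ml_ok are hypothetical new variables, and the 70/30 split and the seed are arbitrary choices. McFadden's pseudo R-squared is displayed for the multinomial logit only, since mlogit stores it in e(r2_p).

```stata
* Sketch only: compare holdout accuracy of rforest and mlogit
set seed 12345
generate byte train = runiform() < 0.7

* Random forest fitted on the training rows (assumes rforest honors "if")
rforest y x1 x2 x3 x4 x5 x6 x7 x8 if train, type(class) iterations(2000)
predict rf_pred
generate byte rf_ok = (rf_pred == y) if !train & !missing(y, rf_pred)

* Multinomial logit fitted on the same training rows
mlogit y x1 x2 x3 x4 x5 x6 x7 x8 if train
display "McFadden pseudo R2 (mlogit, training rows): " e(r2_p)

* One predicted probability per outcome, in ascending order of y
predict double pm*, pr
levelsof y if e(sample), local(levels)

* Predicted class = outcome with the largest predicted probability
generate double pmax = -1
generate ml_pred = .
local j = 0
foreach lev of local levels {
    local ++j
    quietly replace ml_pred = `lev' if pm`j' > pmax & !missing(pm`j')
    quietly replace pmax = pm`j' if pm`j' > pmax & !missing(pm`j')
}
generate byte ml_ok = (ml_pred == y) if !train & !missing(y, ml_pred)

* Means of rf_ok and ml_ok are the holdout accuracies of the two models
summarize rf_ok ml_ok
```

The accuracies are computed on rows that neither model saw during fitting. In-sample predictions from a random forest tend to look far better than the model generalizes, so an in-sample comparison would be misleading.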
