  • Random Forest R squared

    Hi everyone,
    I am running a random forest with a categorical dependent variable, and I need a (pseudo) R-squared to compare its fit with that of a multinomial logit. As far as I can tell, the rforest command in Stata does not directly report any fit measure. Below is the code I am using; it produces the output and graphs the variable importance.

    Code:
    *** Random Forest ***
    
    rforest y x1 x2 x3 x4 x5 x6 x7 x8, type(class) iterations(2000)
     
    *Output the statistics computed so far (note that the OOB error is computed at this stage)
    ereturn list
    
    *Predict the class of y for each observation
    predict pred
    
    *List the first five entries of variables
    list y x1 x2 x3 x4 x5 x6 x7 x8 in 1/5
    
    *Create a copy of the variable-importance matrix stored in e()
    matrix importance = e(importance)
    
    *Convert the matrix to a variable (svmat names it importance1; "importance" below works as an abbreviation)
    svmat importance
    
    *List the first five entries in the variable importance
    list importance in 1/5
    
    *Generate new variable id to be used for labeling
    generate id=""
    
    *Attach unique labels to individual columns in the chart
    local mynames : rownames importance
    local k : word count `mynames'
    // If there are more variables than observations
    if `k'>_N {
        set obs `k'
    }
    forvalues i = 1(1)`k' {
        local aword : word `i' of `mynames'
        local alabel : variable label `aword'
        if ("`alabel'"!="") quietly replace id= "`alabel'" in `i'
        else quietly replace id= "`aword'" in `i'
    }
    
    *Graph the results
    graph hbar (mean) importance, over(id, sort(1)) ytitle(Importance)
    Thanks for any help!