Concerning your syntax in #12:
- Please use code tags around your syntax (see FAQ 12.3). Read the complete FAQ, including 12.5: you posted a Word document as an attachment, and I did not open it for obvious reasons.
- It seems that you did not understand my comments in #10 concerning the display command and the sum() function. Please read the help using
Code:
help display
and
Code:
help sum()
or better:
Code:
help sum
- To understand the use of frames, see
Code:
help frames
(you probably should read the complete PDF manual entry).
- You should indent commands enclosed by { } by some spaces to better see the structure of your program.
I am guessing that you want to replicate what crossfold (from SSC) does. Here is your code, substantially revised (and using only 5 folds, not 10). I calculate only the biased and the unbiased RMSE. The final list command (frame results: list, noob) shows the RMSE for each fold; its output follows the code. After that, I check whether the biased RMSE (i.e. the SS divided by N) from this method is identical to the RMSE that crossfold (from SSC) reports for the last fold:
Code:
cap frame change default
cap frame drop results
sysuse auto, clear
keep price mpg headroom
set seed 1234
gen rand = uniform()
egen split = cut(rand), group(5) // split data set into 5 folds assigned value from 0 to 4
fre split
frame create results fold n_t n rmse rmse_u
forvalues i = 0/4 {
    * Fit the model using training set (split != `i')
    qui reg price mpg headroom if split != `i'
    local df_m = e(df_m)
    local n_t = e(N)
    * Calculate RMSE of unused group using coefficients of training set (split == `i')
    qui predict res_2 if split == `i', residuals
    qui replace res_2 = res_2^2 // square residuals
    qui sum res_2, meanonly
    local rmse = sqrt(r(mean))
    local rmse_u = sqrt(r(sum)/(r(N) - `df_m' - 1))
    * Save fold, n_t (n of training set), n (n of unused group), rmse and rmse_u (unbiased) in frame results:
    frame post results (`i') (`n_t') (r(N)) (`rmse') (`rmse_u')
    drop res_2 // drop squared residuals
}
frame results: list, noob
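For reference, the two quantities that the loop stores as rmse and rmse_u can be written out as below. The notation is my own shorthand, not something taken from the code: n_k is the number of observations in the held-out fold k, \hat{e}_i is the out-of-sample residual of observation i, and p stands for e(df_m), the number of regressors (2 here, mpg and headroom).

```latex
\mathrm{RMSE}_k = \sqrt{\frac{1}{n_k} \sum_{i \in k} \hat{e}_i^{\,2}}
\qquad
\mathrm{RMSE}^{u}_k = \sqrt{\frac{1}{n_k - p - 1} \sum_{i \in k} \hat{e}_i^{\,2}}
                    = \mathrm{RMSE}_k \, \sqrt{\frac{n_k}{n_k - p - 1}}
```

The second form makes clear that rmse_u is just rmse scaled by a factor that depends only on the fold size and the number of regressors.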
Code:
. frame results: list, noob

  +---------------------------------------+
  | fold   n_t    n       rmse     rmse_u |
  |---------------------------------------|
  |    0    60   14   3142.947   3545.723 |
  |    1    59   15   2467.412   2758.651 |
  |    2    59   15    3746.17   4188.346 |
  |    3    59   15   1884.271   2106.679 |
  |    4    59   15   1863.891   2083.893 |
  +---------------------------------------+
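As a quick plausibility check on this listing, rmse_u should equal rmse multiplied by sqrt(n/(n - df_m - 1)), with df_m = 2 for the two regressors. A minimal sketch using display, with the values for folds 0 and 1 copied by hand from the listing (so the check is only as precise as the displayed digits):

```stata
* Values copied by hand from the listing; df_m = 2 (mpg and headroom)
* Fold 0: compare the result with rmse_u = 3545.723
display 3142.947 * sqrt(14/(14 - 2 - 1))
* Fold 1: compare the result with rmse_u = 2758.651
display 2467.412 * sqrt(15/(15 - 2 - 1))
```

Both results should agree with the listed rmse_u values up to rounding of the displayed rmse.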
Code:
. * Use -crossfold- (from SSC):
. set seed 1234
. crossfold reg price mpg headroom
             |      RMSE
-------------+----------
        est1 |  3221.696
        est2 |  2333.683
        est3 |   3827.52
        est4 |  1795.796
        est5 |  1923.156
.
. * Calculate RMSE of last fold using method above (check if results are identical):
. predict res_2 if !e(sample), residuals
(60 missing values generated)
. replace res_2 = res_2^2
(14 real changes made)
. qui sum res_2
. di "RMSE of last fold: " sqrt(r(mean))
RMSE of last fold: 1923.1562
