ipred
library(ipred)
library(tree)
# predict() on a tree returns class probabilities unless type="class" is given
mypredict.tree <- function(fit, newdata) predict(fit, newdata, type = "class")
errorest(Species ~ ., iris, model = tree, estimator = "cv", predict = mypredict.tree)
# Gives
# 10-fold cross-validation estimator of misclassification error
#
# Misclassification error: 0.04
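To make the model/predict split concrete, here is a rough sketch of the kind of loop a k-fold CV estimator runs: fit the model on k-1 folds, predict the held-out fold, and count the errors. `manual_cv_error` is a hypothetical helper written for illustration only. It is not ipred's internal code, and its estimate will not match errorest() exactly because the random folds differ.

```r
library(tree)

# Hypothetical helper (not part of ipred): a bare-bones k-fold CV loop.
# Like errorest(), it takes a fitting function `model` and a prediction
# function, fits on k-1 folds and counts errors on the held-out fold.
manual_cv_error <- function(formula, data, model, predict_fn, k = 10) {
  response <- all.vars(formula)[1]
  folds <- sample(rep_len(seq_len(k), nrow(data)))  # random fold labels
  wrong <- 0
  for (i in seq_len(k)) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    fit   <- model(formula, data = train)
    pred  <- predict_fn(fit, test)
    wrong <- wrong + sum(as.character(pred) != as.character(test[[response]]))
  }
  wrong / nrow(data)  # overall misclassification rate
}

mypredict.tree <- function(fit, newdata) predict(fit, newdata, type = "class")
set.seed(42)
manual_cv_error(Species ~ ., iris, model = tree, predict_fn = mypredict.tree)
```

Every observation is predicted exactly once, by a model that never saw it, so dividing the error count by `nrow(data)` gives the CV misclassification rate.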
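10-fold CV is the default, and by default the data are put in random order before being split into folds (random=TRUE). Stratified sampling is also available (strat=TRUE): each fold then keeps roughly the same class proportions as the full data set. The predicted classes are not returned by default (predictions=FALSE). Other options are passed through est.para, as follows: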
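To illustrate what stratified sampling means, the base-R sketch below assigns rows to folds class by class, so every fold gets (almost) the same number of rows from each class. `stratified_folds` is a hypothetical helper for illustration, not the function ipred uses internally; in errorest the behaviour is switched on with `control.errorest(strat = TRUE)`.

```r
# Hypothetical helper for illustration (not ipred's internal code):
# deal the rows of each class round-robin into k folds, so every fold
# gets (almost) the same number of rows from every class.
stratified_folds <- function(y, k = 10) {
  y <- as.factor(y)
  folds <- integer(length(y))
  for (cls in levels(y)) {
    idx <- which(y == cls)
    idx <- idx[sample.int(length(idx))]   # shuffle rows within the class
    folds[idx] <- rep_len(seq_len(k), length(idx))
  }
  folds
}

set.seed(1)
folds <- stratified_folds(iris$Species, k = 10)
table(folds, iris$Species)  # every cell is 5: 50 rows per class / 10 folds
```

With a plain random split (for example `sample(rep_len(1:10, 150))`) a fold can end up with, say, 7 setosa and 3 virginica; stratifying removes that source of variation between folds.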
# Use ?control.errorest for info on the parameters
ans <- errorest(......., est.para=control.errorest(k=20, predictions=TRUE), ....)
ans
# 20-fold cross-validation estimator of misclassification error
#
# Misclassification error: 0.0533
table(ans$predictions, iris$Species)
#              setosa versicolor virginica
#   setosa         50          0         0
#   versicolor      0         46         4
#   virginica       0          4        46
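As a quick consistency check, the reported error can be recovered from the confusion table: it is the off-diagonal count divided by the total. The counts below are copied from the 20-fold run; `tab` is just a local name for the check.

```r
# Confusion table from the 20-fold run (rows = predicted, columns = true class)
classes <- c("setosa", "versicolor", "virginica")
tab <- matrix(c(50,  0,  0,
                 0, 46,  4,
                 0,  4, 46),
              nrow = 3, byrow = TRUE,
              dimnames = list(predicted = classes, true = classes))

# Misclassification error = misclassified count / total count
1 - sum(diag(tab)) / sum(tab)
# [1] 0.05333333
```

That is 8 errors out of 150 observations, matching the 0.0533 printed by errorest.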
The misclassification error varied considerably from run to run (remember, the folds are drawn at random), so I'm not sure it is a good idea to quote a single number without knowing the variation. Leave-one-out CV can be obtained by setting k equal to n, the number of observations. Leave-p-out CV (p > 1) is not possible using this package.
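One way to get a feel for the variation is to repeat the 10-fold estimate several times and look at the spread, and leave-one-out CV follows from setting k to `nrow(iris)`. This is a minimal sketch assuming the `errorest` return value exposes the estimate as `$error`; the number of repeats (20) and the names `errs` and `loo` are arbitrary choices for illustration.

```r
library(ipred)
library(tree)

mypredict.tree <- function(fit, newdata) predict(fit, newdata, type = "class")

# Repeat 10-fold CV; each call draws new random folds, so the spread of
# the estimates reflects the run-to-run variation.
set.seed(1)
errs <- replicate(20, errorest(Species ~ ., iris, model = tree,
                               estimator = "cv",
                               predict = mypredict.tree)$error)
mean(errs)
sd(errs)

# Leave-one-out CV: one fold per observation (k = n).
loo <- errorest(Species ~ ., iris, model = tree, estimator = "cv",
                predict = mypredict.tree,
                est.para = control.errorest(k = nrow(iris)))
loo$error
```

Reporting `mean(errs)` together with `sd(errs)` (or the range) gives a more honest summary than one run. Leave-one-out has no random fold assignment, so for a deterministic model like `tree` it returns the same value every time.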