
Cross-validation (CV) is a possible cure for overfitting. Overfitting refers to a model that looks well built on the data used to construct and tune it, but fails completely when an unknown external validation set is applied. The best way to avoid overfitting is always to perform a three-way split (a short split sketch follows the list):

  1. Training set (70% of data)
  2. Test set (30% of data)
  3. Validation set (+30% of new data)
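
Such a split can be created directly with caret's createDataPartition. The following is only a minimal sketch using the BloodBrain data from the examples further down; a real validation set would have to come from genuinely new data.

# Sketch: 70/30 training/test split with caret (BloodBrain data)
  require(caret); data(BloodBrain); set.seed(123);
  inTrain <- createDataPartition(logBBB, p = 0.7, list = FALSE)
  trainX <- bbbDescr[ inTrain, ]; trainY <- logBBB[ inTrain]
  testX  <- bbbDescr[-inTrain, ]; testY  <- logBBB[-inTrain]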

Some people merge training and test set, just perform cross-validation on the combined data, and then use the validation set for "validation". Caret provides six different CV methods (plus the option of no resampling at all); explanations can be found in the caret documentation.

| Num | Method     | Speed     | Accuracy | Description                       |
|-----|------------|-----------|----------|-----------------------------------|
| 1   | boot632    | fast      | best     | the .632+ bootstrap               |
| 2   | LGOCV      | fast      | good     | leave-group-out cross-validation  |
| 3   | LOOCV      | very slow | good     | leave-one-out cross-validation    |
| 4   | cv         | fast      | good     | k-fold cross-validation           |
| 5   | repeatedcv | fast      | good     | repeated 10-fold cross-validation |
| 6   | boot       | fast      | ok       | bootstrap                         |
| 7   | none       | fastest   | none     | no resampling                     |

Regarding speed: using no CV at all (method none) is of course the fastest; all other methods are good options and resampling should always be enabled. Be aware that, depending on your settings, method LOOCV may be 40-fold slower than the other methods.
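
Each of these methods is selected through trainControl(); a minimal sketch with purely illustrative (not recommended) settings for three of them:

# Sketch: selecting a resampling method via trainControl()
  require(caret);
  tc1 <- trainControl(method = "repeatedcv", number = 10, repeats = 5)  # repeated 10-fold CV
  tc2 <- trainControl(method = "LGOCV", p = 0.75, number = 25)          # leave-group-out CV
  tc3 <- trainControl(method = "boot632", number = 25)                  # .632 bootstrap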


Simple examples of the caret cross-validation methods

# Single example, no cross-validation
  require(caret); data(BloodBrain); set.seed(123);
  fit1 <- train(bbbDescr, logBBB, "knn"); fit1

# cross-validation example with method boot 
  require(caret); data(BloodBrain); set.seed(123);
  tc <- trainControl(method="boot")
  fit1 <- train(bbbDescr, logBBB, trControl=tc, method="knn");  fit1
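
Whichever resampling method is chosen for tuning, the resulting model should still be checked against truly held-out data. A minimal sketch, assuming the trainX/trainY/testX/testY objects from the split sketch above (not part of the original example):

# Sketch: tune on the training part only, then validate on the hold-out part
  tc   <- trainControl(method = "boot")
  fit2 <- train(trainX, trainY, trControl = tc, method = "knn")
  pred <- predict(fit2, newdata = testX)
  postResample(pred, testY)   # RMSE and R squared on the hold-out set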

All six CV-methods in caret

Now it may be interesting to see which CV method performs best, or to benchmark them against each other. One can do that sequentially, one-by-one, which is easier to understand, or in a loop, which is more compact. In R we can also use lapply, which returns a list, or sapply, which returns a matrix. Because the results are rather complicated, I prefer loops or sequential code.

# All six available cross-validation methods applied using sapply (matrix result)
  require(caret); data(BloodBrain);
  cvMethods <- c("boot632", "LGOCV", "LOOCV", "cv", "repeatedcv", "boot");
  all <- sapply(cvMethods, function(x) {set.seed(123); print(x); tc <- trainControl(method = x)
                    fit1 <- train(bbbDescr, logBBB, trControl = tc, method = "knn") }); all
  # one row of the resulting matrix holds one component of every fit, e.g. row 4
  all[4, ]

# All caret cross-validation methods applied using lapply (list result)
  require(caret); data(BloodBrain);
  cvMethods <- c("boot632", "LGOCV", "LOOCV", "cv", "repeatedcv", "boot");
  all <- lapply(cvMethods, function(x) {set.seed(123); print(x); tc <- trainControl(method = x)
                    fit1 <- train(bbbDescr, logBBB, trControl = tc, method = "knn") })
  all

The lapply and sapply examples give us a nice view of all the different cross-validations at once, so we can see which method performs best, extract the times needed for the CVs, and more (see the sketch after the output below). Of course such complicated matrices are hard to handle because they are multi-dimensional; assigning individual model names and looping through them may be easier.

# extract the used CV methods (redundant, because they are already in cvMethods)
# note: this assumes `all` is the list from the lapply example above
  myNames <- lapply(1:6, function(x) all[[x]]$control$method)
  # save the resampled performance of each fit
  results <- sapply(all, getTrainPerf)
  # change the column names to the CV methods
  colnames(results) <- myNames;
  # get the results
  results
 
#               boot632   LGOCV     LOOCV     cv        repeatedcv boot     
# TrainRMSE     0.619778  0.6275048 0.6309407 0.6192086 0.6192086  0.66943  
# TrainRsquared 0.4009745 0.3554037 0.3429081 0.3831812 0.3831812  0.3140373
# method        "knn"     "knn"     "knn"     "knn"     "knn"      "knn"    
# "none" can not be included in lapply/sapply
none
  k  RMSE       Rsquared   RMSE SD     Rsquared SD
  5  0.7080781  0.2617369  0.06097704  0.07784421 
  7  0.6934844  0.2785458  0.06193955  0.08353033 
  9  0.6819015  0.2956399  0.05684083  0.07999218 
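
As mentioned above, the combined results also let us rank the methods by resampled error and compare run times. A short sketch, assuming `all` is the list from the lapply example and `results` is the matrix built above (train stores its timings in the $times element):

# Sketch: rank the CV methods by resampled RMSE and compare run times
  rmse <- unlist(results["TrainRMSE", ]); sort(rmse)        # lowest resampled RMSE first
  times <- sapply(all, function(x) x$times$everything["elapsed"])
  names(times) <- cvMethods; times                          # elapsed seconds per fit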


