Tobias Kind edited this page Nov 5, 2015 · 22 revisions

Cross-validation (CV) is a possible cure for overfitting. Overfitting means that a model fits the data it was built on very well, but fails completely when it is applied to an unknown external validation set. The best way to avoid overfitting is always to perform a three-way split:

  1. Training set (70% of data)
  2. Test set (30% of data)
  3. Validation set (+30% of new data)
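The 70/30 part of the split above can be sketched in a few lines. This is a minimal, language-neutral illustration in plain Python, not caret code; the function name `train_test_split`, the seed, and the 100 stand-in samples are assumptions made for the example. The validation set is not produced here, because it consists of new data collected separately.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows and cut them into a training and a test part."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(rows)              # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))                # hypothetical stand-in for 100 samples
train, test = train_test_split(data)
print(len(train), len(test))           # 70 30
```

Every sample ends up in exactly one of the two parts, so the test set stays untouched while the model is built.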

Some people merge the training and test sets, perform cross-validation on the merged data, and then use the validation set for "validation". Caret provides six different CV methods, plus the option `none` to switch resampling off.

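| Num | Name | Speed | Accuracy | Description |
|---|---|---|---|---|
| 1 | boot632 | fast | best | the .632+ Bootstrap |
| 2 | LGOCV | fast | good | leave-group-out cross-validation |
| 3 | LOOCV | very slow | good | leave-one-out cross-validation |
| 4 | cv | fast | good | k-fold cross-validation |
| 5 | repeatedcv | fast | good | repeated k-fold cross-validation |
| 6 | boot | fast | ok | bootstrap |
| 7 | none | fastest | no | none |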
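To make the difference between the resampling schemes in the table more concrete, here is a rough sketch of how k-fold and bootstrap splits can be generated. It is plain Python written for illustration and does not reproduce caret's internals; the function names, seeds, and sample counts are assumptions. Leave-one-out appears as the special case where the number of folds equals the number of samples.

```python
import random

def k_fold_indices(n_samples, k=10, seed=1):
    """Yield (train_idx, holdout_idx) pairs for k-fold cross-validation."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # k folds of roughly equal size
    for i in range(k):
        holdout = folds[i]
        # training indices are all folds except the held-out one
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, holdout

def bootstrap_indices(n_samples, n_resamples=25, seed=1):
    """Yield (train_idx, out_of_bag_idx) pairs for bootstrap resampling."""
    rng = random.Random(seed)
    for _ in range(n_resamples):
        # draw n_samples indices with replacement
        train = [rng.randrange(n_samples) for _ in range(n_samples)]
        chosen = set(train)
        out_of_bag = [j for j in range(n_samples) if j not in chosen]
        yield train, out_of_bag

cv_splits = list(k_fold_indices(20, k=5))      # 5-fold CV on 20 samples
loo_splits = list(k_fold_indices(20, k=20))    # leave-one-out: k equals n
boot_splits = list(bootstrap_indices(20, n_resamples=3))
```

In k-fold CV every sample is held out exactly once, whereas a bootstrap resample may contain the same sample several times and evaluates on whatever was not drawn.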

Regarding speed, using no CV (method none) is of course the fastest. All other methods are good options, and one of them should always be enabled. Be aware that, depending on your settings, method LOOCV may be 40-fold slower than the other methods.
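A back-of-the-envelope way to see where such a slowdown can come from is to count model fits per resampling scheme. The sketch below is an illustration only: it assumes runtime scales with the number of fits, ignores tuning grids and the final refit, and uses a hypothetical data set of 400 samples. The argument names `number` and `repeats` are chosen to mirror caret's `trainControl` arguments, but the function itself is not part of caret.

```python
def resampling_fits(method, n_samples, number=10, repeats=1):
    """Approximate number of model fits for one candidate parameter set."""
    if method == "none":
        return 1                      # a single fit, no resampling
    if method == "LOOCV":
        return n_samples              # one fit per left-out sample
    if method == "cv":
        return number                 # one fit per fold
    if method == "repeatedcv":
        return number * repeats       # k folds, repeated several times
    if method in ("boot", "boot632", "LGOCV"):
        return number                 # one fit per resample or split
    raise ValueError(f"unknown method: {method}")

n = 400                               # hypothetical number of samples
ratio = resampling_fits("LOOCV", n) / resampling_fits("cv", n, number=10)
print(ratio)                          # 40.0
```

Under these assumptions, LOOCV on 400 samples needs 40 times as many fits as 10-fold CV, and the gap grows with the size of the data set.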


Examples of all cross-validation methods


Links
