caret-ML Cross-validations
Cross-validation (CV) is a possible cure for overfitting. Overfitting means that a model performs well on the data it was trained on, but fails completely when an unknown external validation set is applied. The best way to avoid overfitting is to always perform a three-way split (see the sketch after this list):
- Training set (70% of data)
- Test set (30% of data)
- Validation set (additional new data, ~30%)
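A minimal sketch of such a split using caret's createDataPartition; the data frame df and outcome column y are hypothetical placeholders:

```r
library(caret)

set.seed(42)
# df is a hypothetical data frame with outcome column y
inTrain  <- createDataPartition(df$y, p = 0.7, list = FALSE)
training <- df[inTrain, ]   # 70% training set
testing  <- df[-inTrain, ]  # remaining 30% test set
# the validation set should come from new, independently collected data,
# never from df itself
```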
Some people merge the training and test sets, perform cross-validation on the combined data, and then use the validation set for the actual validation. caret provides six different CV methods (plus the option of no CV at all):
Num | Name | Speed | Accuracy | Description |
---|---|---|---|---|
1 | boot632 | fast | best | the .632 bootstrap |
2 | LGOCV | fast | good | leave-group-out cross-validation |
3 | LOOCV | slooow | good | leave-one-out cross-validation |
4 | cv | fast | good | k-fold cross-validation |
5 | repeatedcv | fast | good | repeated k-fold cross-validation |
6 | boot | fast | ok | the bootstrap |
7 | none | fastest | no | no resampling at all |
Speed-wise, using no CV (method none) is of course the fastest; all other methods are good options and should always be enabled. Be aware that, depending on your settings, LOOCV may be 40-fold slower than the other methods.
Examples of all cross-validation methods
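A minimal sketch of how each method from the table is selected via trainControl; the number, repeats and p values below are illustrative, not recommendations:

```r
library(caret)

# one trainControl object per resampling method from the table above
ctrl_boot632 <- trainControl(method = "boot632",    number = 25)
ctrl_lgocv   <- trainControl(method = "LGOCV",      number = 25, p = 0.9)
ctrl_loocv   <- trainControl(method = "LOOCV")
ctrl_cv      <- trainControl(method = "cv",         number = 10)
ctrl_repcv   <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
ctrl_boot    <- trainControl(method = "boot",       number = 25)
ctrl_none    <- trainControl(method = "none")  # fits one model, no resampling

# plug any of them into train(), e.g. 10-fold CV on the built-in iris data
fit <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl_cv)
```

Note that method = "none" requires a single tuning parameter combination (e.g. via tuneGrid), since no resampling is available to pick between candidates.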
Links
- Overfitting examples - some discussions about overfitting
- caret CV examples - simple useful examples to perform caret CVs
- 04_Over_Fitting.R - chapter 4 from the caret book; beware, the example is rather large