caret-ML Cross-validations
Cross-validation (CV) is a possible cure for overfitting. Overfitting means that a model performs well on the data it was trained on, but fails completely when an unknown external validation set is applied. The best way to avoid overfitting is to always perform a three-way split (see the sketch after this list):
- Training set (70% of data)
- Test set (30% of data)
- Validation set (additional new data, ~30%)
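A minimal sketch of such a split using caret's createDataPartition; the data frame df and outcome column y are hypothetical placeholders:

```r
library(caret)

set.seed(42)
# df is a hypothetical data frame with outcome column y
inTrain  <- createDataPartition(df$y, p = 0.7, list = FALSE)
training <- df[inTrain, ]   # 70% training set
testing  <- df[-inTrain, ]  # remaining 30% test set
# the validation set should come from new, independently collected data,
# never from df itself
```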
Some people merge the training and test sets, perform cross-validation on the combined data, and then use the validation set for the actual validation. caret provides six different CV methods (plus the option of no CV at all):
Num | Name | Speed | Accuracy | Description |
---|---|---|---|---|
1 | boot632 | fast | best | the .632 bootstrap |
2 | LGOCV | fast | good | leave-group-out cross-validation |
3 | LOOCV | slooow | good | leave-one-out cross-validation |
4 | cv | fast | good | k-fold cross-validation |
5 | repeatedcv | fast | good | repeated k-fold cross-validation |
6 | boot | fast | ok | the bootstrap |
7 | none | fastest | no | no resampling at all |
Speed-wise, using no CV (method none) is of course the fastest; all other methods are good options and should always be enabled. Be aware that, depending on your settings, LOOCV may be 40-fold slower than the other methods.
Examples of all cross-validation methods
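A minimal sketch of how each method from the table is selected via trainControl; the number, repeats and p values below are illustrative, not recommendations:

```r
library(caret)

# one trainControl object per resampling method from the table above
ctrl_boot632 <- trainControl(method = "boot632",    number = 25)
ctrl_lgocv   <- trainControl(method = "LGOCV",      number = 25, p = 0.9)
ctrl_loocv   <- trainControl(method = "LOOCV")
ctrl_cv      <- trainControl(method = "cv",         number = 10)
ctrl_repcv   <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
ctrl_boot    <- trainControl(method = "boot",       number = 25)
ctrl_none    <- trainControl(method = "none")  # fits one model, no resampling

# plug any of them into train(), e.g. 10-fold CV on the built-in iris data
fit <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl_cv)
```

Note that method = "none" requires a single tuning parameter combination (e.g. via tuneGrid), since no resampling is available to pick between candidates.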
Links
- Overfitting examples - some discussions about overfitting
- caret CV examples - simple useful examples to perform caret CVs
- 04_Over_Fitting.R - chapter 4 from the caret book; beware, the example is rather large