caret ml setup
The installation of caret looks simple. However, installing the most common packages and over 200 dependencies takes a while. If you only need a randomForest or knn model with caret, that is fine, because the required packages are usually loaded on demand. However, when code is hosted on other repositories such as BioConductor or R-Forge and requires compilation and additional dependencies such as Python and Java, a full caret installation can become quite a hassle. Beginners can use the simple installer, an R one-liner. Machine learning experts can use the deLuxe caret installer, provided by popular demand, which installs almost all 765 libraries required by caret.
Simple caret installation
# caret simple installation with most methods attached
install.packages("caret", dependencies = c("Imports", "Depends", "Suggests"))
The "simple caret installation" has two caveats. First, on Windows, when using the Microsoft (Revolution Analytics) CRAN mirror, the caret version is usually behind the official CRAN mirror. Second, although the simple installation pulls in over 330 packages, not all true dependencies are covered. Many of the additional packages (about 200), such as Boruta, will ask at run time to be installed, and the user then has to enter 0 or 1. This can become quite annoying during long runs over all 200 methods or after package updates.
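The run-time prompts can be avoided by checking beforehand which packages are missing. The following is a minimal sketch using base R only; the helper name `missing_pkgs` and the example vector `wanted` are made up for illustration and are not part of caret.

```r
# Hypothetical helper (not part of caret): return the names in `pkgs`
# that are not found in any library on .libPaths().
missing_pkgs <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(unique(pkgs), installed)
}

# Example vector; with caret loaded one could instead pass
# unique(unlist(lapply(getModelInfo(), function(x) x$library)))
wanted <- c("stats", "Boruta", "randomForest")
todo <- missing_pkgs(wanted)
print(todo)

# Install only what is missing (commented out so the sketch has no side effects)
# if (length(todo) > 0) install.packages(todo)
```

Running the check before a long batch of models means any missing package shows up once, at the start, rather than as a prompt in the middle of a run.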
Comfort caret installation
# installs most of the 340 caret dependencies, the caret book package
# and seven commonly used libraries (but not all caret model libraries)
mostP <- c("caret", "AppliedPredictiveModeling", "ggplot2",
"data.table", "plyr", "knitr", "shiny", "xts", "lattice")
install.packages(mostP, dependencies = c("Imports", "Depends", "Suggests"))
require(caret); sessionInfo();
The comfort mode installs the libraries from the caret book and the seven most commonly used libraries. It is for those who just want to run a few rf and knn models but are not seriously interested in trying all 200 methods, ensembles, bagging and other techniques.
deLuxe caret installation
# deLuxe setup of caret package with almost all 765 required caret libraries
# https://github.com/tobigithub/caret-machine-learning
# Tobias Kind (2015)
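# 1) load caret packages from BioConductor, answer 'n' for updates
# note: biocLite() was retired with Bioconductor 3.8 (2018); newer R
# versions use BiocManager::install() for the same purpose
source("http://bioconductor.org/biocLite.R")
biocLite()
biocLite(c("arm", "gpls", "logicFS", "vbmp"))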
# 2) installs most of the 340 caret dependencies + seven commonly used ones
mCom <- c("caret", "AppliedPredictiveModeling", "ggplot2",
"data.table", "plyr", "knitr", "shiny", "xts", "lattice")
install.packages(mCom, dependencies = c("Imports", "Depends", "Suggests"))
# 3) load caret and check which additional libraries
# covering over 200 models need to be installed
# use caret getModelInfo() to obtain all related libraries
require(caret); sessionInfo();
cLibs <- unique(unlist(lapply(getModelInfo(), function(x) x$library)))
detach("package:caret", unload=TRUE)
install.packages(cLibs, dependencies = c("Imports", "Depends", "Suggests"))
# 4) load packages from R-Forge
install.packages(c("CHAID"), repos="http://R-Forge.R-project.org")
# 5) Restart R, clean-up mess, and say 'y' when asked
# All packages that are not in CRAN such as SDDA need to be installed by hand
source("http://bioconductor.org/biocLite.R")
biocLite()
biocLite(c("gpls", "logicFS", "vbmp"))
### END
The deLuxe mode is the recommended mode for the caret installation. It covers more than 765 installed packages; however, in good old R manner, things can break at any given time. For example, under Windows there is a DLL load limit, and packages that contain bugs or erroneous code may fail while loading. During installation certain packages may be overwritten, or cannot be overwritten, so that additional errors may occur. The deLuxe mode is not meant for updating packages, which requires comparing installed packages against newly available ones.
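The DLL limit mentioned above can be inspected from within a session. The following is a small sketch using base R only. `R_MAX_NUM_DLLS` is the environment variable that R (3.4.0 and later) reads at startup; whether raising it helps on a given system is an assumption that needs to be tested.

```r
# Count the DLLs / shared objects currently loaded in this session
n_dlls <- length(getLoadedDLLs())
cat("Loaded DLLs:", n_dlls, "\n")

# Show the configured limit, if one was set before R started
# (an empty string means R's built-in default is in effect)
dll_limit <- Sys.getenv("R_MAX_NUM_DLLS")
cat("R_MAX_NUM_DLLS:", if (nzchar(dll_limit)) dll_limit else "(not set)", "\n")
```

If the count approaches the limit, one option is to set `R_MAX_NUM_DLLS` in `.Renviron` and restart R. For the update case, `old.packages()` lists installed packages that have newer versions on the configured repositories, and `update.packages()` installs them.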
Testing if caret works
library(caret)
sessionInfo()
data(BloodBrain); set.seed(123)
fit1 <- train(bbbDescr, logBBB, method = "knn"); fit1
# k-Nearest Neighbors
# 208 samples
# 134 predictors
# and more
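For scripted setups, the same check can be wrapped so that it returns a single TRUE or FALSE instead of printing a model summary. The following is a sketch under the assumption that calling `caret::train()` without attaching the package behaves the same as the interactive test above; the function name `caret_works` is made up for illustration.

```r
# Hypothetical helper: TRUE if caret is installed and a small knn model
# trains on the BloodBrain data, FALSE otherwise.
caret_works <- function() {
  if (!requireNamespace("caret", quietly = TRUE)) return(FALSE)
  tryCatch({
    e <- new.env()
    # load bbbDescr and logBBB into a private environment
    data("BloodBrain", package = "caret", envir = e)
    set.seed(123)
    fit <- caret::train(e$bbbDescr, e$logBBB, method = "knn")
    inherits(fit, "train")
  }, error = function(err) FALSE)
}

result <- caret_works()
print(result)
```

A FALSE result only says that something failed; rerunning the interactive lines above shows the actual error message.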
Using caret after installation
Please note that the libraries only need to be installed once. There is no need to install the 400 dependent libraries again and again. To use caret in a session, load it with:
library(caret)
# or
require(caret)
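The two calls differ in how they fail: `library()` stops with an error if the package is missing, while `require()` returns `FALSE` with a warning. Below is a minimal sketch of that difference, using a deliberately non-existent package name (`notARealPackage123`, made up for illustration and assumed not to be installed).

```r
fake <- "notARealPackage123"   # hypothetical name, assumed not installed

# require() returns FALSE (with a warning) instead of stopping
loaded <- suppressWarnings(require(fake, character.only = TRUE, quietly = TRUE))
print(loaded)

# library() signals an error, which is caught here
result <- tryCatch({
  library(fake, character.only = TRUE)
  "loaded"
}, error = function(e) "error")
print(result)
```

In scripts, `library()` is usually the safer choice, because a missing package stops execution immediately rather than causing a less obvious failure later.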
Additional material:
- [caret on CRAN](https://cran.r-project.org/web/packages/caret/index.html) - download latest binaries of caret
- [R dependencies](http://blog.revolutionanalytics.com/2014/07/dependencies-of-popular-r-packages.html) - dependencies of popular packages
Source code:
- [caret-setup-examples](https://github.com/tobigithub/caret-machine-learning/tree/master/caret-setup)
- [caret updates](https://github.com/topepo/caret/blob/master/release_process/update_pkgs.R)
- [miniCRAN](https://github.com/RevolutionAnalytics/miniCRAN)