Tobias Kind edited this page Nov 5, 2015 · 49 revisions

Installing caret looks quite simple. However, installing the most common packages and over 200 dependencies takes a while. For just running a randomForest or knn model with caret this is fine, because lazy loading will usually load the required packages. However, when code is hosted on other repositories such as BioConductor or R-Forge and requires compilation and additional dependencies such as Python and Java, a full caret installation can become quite a hassle.


Simple caret installation

# caret simple installation with most methods attached
install.packages("caret", dependencies = c("Imports", "Depends", "Suggests"))

The "simple caret installation" has two caveats. First, under Windows, when using the Microsoft CRAN mirror from REVO, the caret version usually lags behind the official CRAN mirror. Second, although the simple installation pulls in over 330 packages, it does not cover all true dependencies. Many of the additional packages behind the 200 methods, such as Boruta, ask to be installed at run time, and the user then has to enter 0 or 1. This can become quite annoying during long runs over all 200 methods or after package updates.
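
The run-time prompts can be avoided by checking up front which packages a planned run still needs. The following is a minimal sketch, not part of caret: `missing_pkgs()` is a hypothetical helper name, and the lookup for the Boruta method assumes caret is already installed.

```r
# Hypothetical helper (not part of caret): return the packages from a
# character vector that are not yet installed on this machine.
missing_pkgs <- function(pkgs) {
  pkgs <- unique(pkgs)
  pkgs[!pkgs %in% rownames(installed.packages())]
}

# Base packages ship with R, so none of these should be reported.
missing_pkgs(c("stats", "utils"))

# With caret installed, look up the libraries behind a selected method
# and list the ones that would otherwise trigger a prompt at run time.
if (requireNamespace("caret", quietly = TRUE)) {
  info <- caret::getModelInfo("Boruta", regex = FALSE)
  libs <- unlist(lapply(info, function(x) x$library))
  print(missing_pkgs(libs))
}
```

Any package names returned can be passed to install.packages() before the long run starts.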


Comfort caret installation

# installs most of the 340 caret dependencies plus the caret book
# package and seven commonly used packages (but not all of them)
mostP <- c("caret", "AppliedPredictiveModeling", "ggplot2",
           "data.table", "plyr", "knitr", "shiny", "xts", "lattice")
install.packages(mostP, dependencies = c("Imports", "Depends", "Suggests"))
# load caret and print the session details to confirm the installation
require(caret); sessionInfo()

The comfort mode installs the libraries from the caret book and the seven most commonly used libraries. It is for those who just want to run a few rf and knn models but are not seriously interested in trying all 200 methods, ensembles, bags and


deLuxe caret installation

# deLuxe setup of caret package with almost all required caret libraries
# https://github.com/tobigithub/caret-machine-learning
# Tobias Kind (2015)
 
# installs most of the 340 caret dependencies + seven commonly used ones
mCom <- c("caret", "AppliedPredictiveModeling", "ggplot2", 
                "data.table", "plyr", "knitr", "shiny", "xts", "lattice")
install.packages(mCom, dependencies = c("Imports", "Depends", "Suggests"))     

# then load caret and check which additional libraries 
# covering over 200 models need to be installed
require(caret); sessionInfo();

# use caret getModelInfo() to obtain all related libraries
cLibs <- unique(unlist(lapply(getModelInfo(), function(x) x$library)))
install.packages(cLibs, dependencies = c("Imports", "Depends", "Suggests"))

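# now install the caret packages hosted on BioConductor
# this is a static list (not good); check the URL below for more info
# https://github.com/topepo/caret/blob/master/release_process/update_pkgs.R
# note: biocLite() was the installer when this page was written; current
# BioConductor releases use the BiocManager package instead
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("arm", "gpls", "logicFS", "vbmp"))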

The deLuxe mode is the recommended mode for the caret installation. It takes care of installing over 400 packages.


Testing if caret works

require(caret)
sessionInfo()

# fit a k-nearest-neighbors model on the BloodBrain example data
data(BloodBrain); set.seed(123)
fit1 <- train(bbbDescr, logBBB, method = "knn"); fit1

# k-Nearest Neighbors 
# 208 samples
# 134 predictors
# and more
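
For scripted setups, the same check can be wrapped so that a failure is reported instead of stopping the script. This is a rough sketch under assumptions: `runs_ok()` is a hypothetical helper, not a caret function, and the 3-fold cross-validation setting is chosen here only to keep the check fast.

```r
# Hypothetical helper: call a function and report whether it completed
# without raising an error.
runs_ok <- function(fun) {
  tryCatch({
    fun()
    TRUE
  }, error = function(e) {
    message("failed: ", conditionMessage(e))
    FALSE
  })
}

# Only attempt the caret check when caret is actually installed.
if (requireNamespace("caret", quietly = TRUE)) {
  library(caret)
  data(BloodBrain); set.seed(123)
  # assumption: 3-fold CV is enough for a quick smoke test
  ctrl <- trainControl(method = "cv", number = 3)
  ok <- runs_ok(function() train(bbbDescr, logBBB, method = "knn",
                                 trControl = ctrl))
  print(ok)
}
```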

Using caret after installation

Note that the libraries are installed only once. There is no need to install the 400 dependent libraries again and again. To use caret afterwards, load it with:

library(caret)
# or
require(caret)
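
To make the install-once behavior explicit in a script, the installation can be guarded by a check. The sketch below is one possible way to do this; `install_if_missing()` is a hypothetical helper name, and the example call uses the base package stats so that nothing is downloaded.

```r
# Hypothetical helper: install a package only when it is not yet available,
# then report (invisibly) whether it can be loaded.
install_if_missing <- function(pkg, ...) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, ...)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# stats ships with R, so this call returns TRUE without installing anything.
print(install_if_missing("stats"))

# For caret the call would look like this (commented out to avoid a download):
# install_if_missing("caret", dependencies = c("Imports", "Depends", "Suggests"))
# library(caret)
```

Extra arguments such as dependencies are passed through to install.packages() unchanged.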

Additional material:

Source code:

