
The installation of caret seems quite simple. However, you will notice that installing the most common packages and over 200 dependencies takes a while. For installing just a randomForest or knn model with caret that is fine; lazy loading will usually pull the needed packages in. However, when code is hosted on other repositories such as BioConductor or R-Forge and requires code compilation and additional dependencies such as Python and Java, a full caret installation can become quite a hassle. For beginner mode just use the simple installer with the R one-liner. Machine learning experts should use the deLuxe caret installer, added based on popular demand, which will install almost all 765 libraries that are required by caret.


Simple caret installation

# caret simple installation with most methods attached
install.packages("caret", dependencies = c("Imports", "Depends", "Suggests"))

Using the "simple caret installation" has two caveats, for example under WINDOWS when using the Microsoft CRAN mirror from REVO the caret version is usually behind the official CRAN mirror. Although the simplified version will install over 330 packages not all true dependencies will be covered. Many additional 200 packages such a Boruta will ask at run-time to be loaded and then the user has to enter 0 or 1. This can become quite annoying during long runs of all 200 methods or after package updates.


Comfort caret installation

# installs most of the 340 caret dependencies plus the caret book package
# and seven commonly used packages, but not all of them
mostP <- c("caret", "AppliedPredictiveModeling", "ggplot2", 
		"data.table", "plyr", "knitr", "shiny", "xts", "lattice")
install.packages(mostP, dependencies = c("Imports", "Depends", "Suggests"))
require(caret); sessionInfo();

The comfort mode installs the libraries from the caret book (AppliedPredictiveModeling) and the seven most commonly used libraries. It is for those who just want to run a few rf and knn models (see the sketch below) but are not seriously interested in trying all 200 methods, ensembles, bags and other approaches.
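For that use case a single model fit is usually all that is needed. A minimal sketch after the comfort install, using the built-in iris data and the "rf" method as arbitrary examples (this assumes the randomForest package came in with the caret Suggests):

library(caret)
set.seed(123)

# one random forest fit with 5-fold cross-validation on the iris data
fitRF <- train(Species ~ ., data = iris, method = "rf",
               trControl = trainControl(method = "cv", number = 5))
fitRF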


deLuxe caret installation

# deLuxe setup of caret package with almost all 765 required caret libraries 
# https://github.com/tobigithub/caret-machine-learning
# Tobias Kind (2015)

# 1) load caret packages from BioConductor, answer 'n' for updates
source("http://bioconductor.org/biocLite.R")
biocLite()
biocLite(c("arm", "gpls", "logicFS", "vbmp"))
 
# 2) installs most of the 340 caret dependencies + seven commonly used ones
mCom <- c("caret", "AppliedPredictiveModeling", "ggplot2", 
                "data.table", "plyr", "knitr", "shiny", "xts", "lattice")
install.packages(mCom, dependencies = c("Imports", "Depends", "Suggests"))     

# 3) load caret and check which additional libraries 
# covering over 200 models need to be installed
# use caret getModelInfo() to obtain all related libraries
require(caret); sessionInfo();
cLibs <- unique(unlist(lapply(getModelInfo(), function(x) x$library)))
detach("package:caret", unload=TRUE)
install.packages(cLibs, dependencies = c("Imports", "Depends", "Suggests"))

# 4) load packages from R-Forge
install.packages(c("CHAID"), repos="http://R-Forge.R-project.org")

# 5) Restart R, clean-up mess, and say 'y' when asked
# All packages that are not on CRAN, such as SDDA, need to be installed by hand
source("http://bioconductor.org/biocLite.R")
biocLite()
biocLite(c("gpls", "logicFS", "vbmp"))
### END

The deLuxe mode is the recommended mode for the caret installation. It takes care of over 765 packages, however, in good old R manner, things can break at any given time. For example, under Windows there is a DLL load limit, and packages that contain bugs or erroneous code may break when loaded. During installation certain packages may be overwritten or cannot be overwritten, so additional errors can occur. The deLuxe mode is not meant for updating packages, which requires comparing installed and new packages; a sketch of such a check is shown below.
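For updates, a practical approach is to compare the libraries the caret models need (the same cLibs vector as in step 3 above) against what is already installed, and install only what is missing. A minimal sketch:

require(caret)
# libraries referenced by all caret models (same as cLibs in step 3)
cLibs <- unique(unlist(lapply(getModelInfo(), function(x) x$library)))

# install only what is still missing instead of re-running the full deLuxe setup
missingLibs <- setdiff(cLibs, rownames(installed.packages()))
missingLibs
if (length(missingLibs) > 0) install.packages(missingLibs)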


Testing if caret works

library(caret)
sessionInfo()

data(BloodBrain); set.seed(123)
fit1 <- train(bbbDescr, logBBB, "knn"); fit1

# k-Nearest Neighbors 
# 208 samples
# 134 predictors
# and more

Package caret use after installation

Please observe that the libraries only have to be installed once. There is no need to install the 400 dependent libraries again and again. Once installed, caret is loaded with:

library(caret)
#or
require(caret)
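If you are unsure whether that one-time installation already happened, a quick check avoids triggering another install. A minimal sketch using base R only:

# is caret already installed, and which version?
"caret" %in% rownames(installed.packages())
packageVersion("caret")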

How to get all caret models for regression?

This is a simple one-liner; the advantage of using modelLookup()$forReg is that whenever caret is updated, all new models are picked up automatically. The disadvantage, of course, is that there is no information about speed, memory requirements, or known errors. So simply passing modNames to an lapply/caret::train loop will probably break or never finish due to multiple errors. Here the distinguished ML user has to do some due diligence and select the best models (a guarded tryCatch loop is sketched after the model list below).

require(caret)
modNames <- unique(modelLookup()[modelLookup()$forReg,c(1)])
length(modNames); modNames;

caret version 6.0-58 from late 2015 covers around 119 regression models. That is quite impressive, not to forget that many of them can be automatically tuned and optimized or merged into ensembles to get better results.

c("ANFIS", "avNNet", "bag", "bagEarth", "bagEarthGCV", "bartMachine", 
"bayesglm", "bdk", "blackboost", "Boruta", "brnn", "BstLm", "bstSm", 
"bstTree", "cforest", "ctree", "ctree2", "cubist", "DENFIS", 
"dnn", "earth", "elm", "enet", "enpls", "enpls.fs", "evtree", 
"extraTrees", "FIR.DM", "foba", "FS.HGD", "gam", "gamboost", 
"gamLoess", "gamSpline", "gaussprLinear", "gaussprPoly", "gaussprRadial", 
"gbm", "gcvEarth", "GFS.FR.MOGUL", "GFS.LT.RS", "GFS.THRIFT", 
"glm", "glmboost", "glmnet", "glmStepAIC", "HYFIS", "icr", "kernelpls", 
"kknn", "knn", "krlsPoly", "krlsRadial", "lars", "lars2", "lasso", 
"leapBackward", "leapForward", "leapSeq", "lm", "lmStepAIC", 
"logicBag", "logreg", "M5", "M5Rules", "mlp", "mlpWeightDecay", 
"neuralnet", "nnet", "nnls", "nodeHarvest", "parRF", "partDSA", 
"pcaNNet", "pcr", "penalized", "pls", "plsRglm", "ppr", "pythonKnnReg", 
"qrf", "qrnn", "ranger", "rbf", "rbfDDA", "relaxo", "rf", "rfRules", 
"ridge", "rknn", "rknnBel", "rlm", "rpart", "rpart2", "rqlasso", 
"rqnc", "RRF", "RRFglobal", "rvmLinear", "rvmPoly", "rvmRadial", 
"SBC", "simpls", "spls", "superpc", "svmBoundrangeString", "svmExpoString", 
"svmLinear", "svmLinear2", "svmPoly", "svmRadial", "svmRadialCost", 
"svmSpectrumString", "treebag", "widekernelpls", "WM", "xgbLinear", 
"xgbTree", "xyf")

How to get all caret models for classification?

The same simple one-liner works here; however, there are models for binary and for multi-class classification, so one more step of fine-tuning needs to be done.

require(caret)
modNames <- unique(modelLookup()[modelLookup()$forClass,c(1)])
length(modNames); modNames;

We can also observe that many models have dual use: they cover both regression and classification (see the sketch after the model list below). Again, models need to be benchmarked for speed, memory, accuracy, specificity and other performance metrics.

c("ada", "AdaBag", "AdaBoost.M1", "amdai", "avNNet", "awnb", 
"awtan", "bag", "bagEarth", "bagEarthGCV", "bagFDA", "bagFDAGCV", 
"bartMachine", "bayesglm", "bdk", "binda", "blackboost", "Boruta", 
"BstLm", "bstSm", "bstTree", "C5.0", "C5.0Cost", "C5.0Rules", 
"C5.0Tree", "cforest", "chaid", "CSimca", "ctree", "ctree2", 
"dnn", "dwdLinear", "dwdPoly", "dwdRadial", "earth", "elm", "evtree", 
"extraTrees", "fda", "FH.GBML", "FRBCS.CHI", "FRBCS.W", "gam", 
"gamboost", "gamLoess", "gamSpline", "gaussprLinear", "gaussprPoly", 
"gaussprRadial", "gbm", "gcvEarth", "GFS.GCCL", "glm", "glmboost", 
"glmnet", "glmStepAIC", "gpls", "hda", "hdda", "J48", "JRip", 
"kernelpls", "kknn", "knn", "lda", "lda2", "Linda", "LMT", "loclda", 
"logicBag", "LogitBoost", "logreg", "lssvmLinear", "lssvmPoly", 
"lssvmRadial", "lvq", "mda", "Mlda", "mlp", "mlpWeightDecay", 
"multinom", "nb", "nbDiscrete", "nbSearch", "nnet", "nodeHarvest", 
"oblique.tree", "OneR", "ORFlog", "ORFpls", "ORFridge", "ORFsvm", 
"ownn", "pam", "parRF", "PART", "partDSA", "pcaNNet", "pda", 
"pda2", "PenalizedLDA", "plr", "pls", "plsRglm", "polr", "protoclass", 
"qda", "QdaCov", "ranger", "rbf", "rbfDDA", "rda", "rf", "rFerns", 
"RFlda", "rfRules", "rknn", "rknnBel", "rmda", "rocc", "rotationForest", 
"rotationForestCp", "rpart", "rpart2", "rpartCost", "RRF", "RRFglobal", 
"rrlda", "RSimca", "sda", "sddaLDA", "sddaQDA", "sdwd", "simpls", 
"SLAVE", "slda", "smda", "snn", "sparseLDA", "spls", "stepLDA", 
"stepQDA", "svmBoundrangeString", "svmExpoString", "svmLinear", 
"svmLinear2", "svmPoly", "svmRadial", "svmRadialCost", "svmRadialWeights", 
"svmSpectrumString", "tan", "tanSearch", "treebag", "vbmpRadial", 
"widekernelpls", "wsrf", "xgbLinear", "xgbTree", "xyf")

Output all caret models into web browser via DT

Now this is a nifty function, because DT is really much better than the archaic R GUI. Of course we can also use RStudio, but the example below puts all caret models and their parameters into a browsable table that we can sort and copy from. Make sure to install "DT" via install.packages("DT").

[Screenshot: all caret models for regression and classification rendered as a DT datatable]

require(caret)
# install.packages("DT")
require(DT)

# modelLookup() returns one row per model/parameter combination
caretModels <- as.data.frame(modelLookup())
class(caretModels)

# number of rows, used to show the whole table on one page
MAX <- nrow(caretModels)

# call web output with correct column names
datatable(caretModels, options = list(
    columnDefs = list(list(className = 'dt-left', targets = c(0, 1, 2, 3, 4, 5, 6))),
    pageLength = MAX,
    order = list(list(0, 'asc'))),
  colnames = c('Num', 'model', 'parameter', 'label', 'forReg', 'forClass', 'probModel'),
  caption = paste('Caret models for regression and classification', Sys.time()),
  class = 'cell-border stripe') %>%
  formatStyle(2,
    background = styleColorBar(1, 'steelblue'),
    backgroundSize = '100% 90%',
    backgroundRepeat = 'no-repeat',
    backgroundPosition = 'center')
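The rendered widget can also be written to a standalone HTML file, for example with htmlwidgets::saveWidget() (htmlwidgets is pulled in as a dependency of DT); a minimal sketch:

# save the browsable table as a standalone HTML file
tbl <- datatable(caretModels)
htmlwidgets::saveWidget(tbl, "caret-models.html")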
 

Additional material:

Source code:

