method qrf train results are slightly random in caret with seed set #27

tobigithub · 2015-12-30T06:27:29Z

The qrf method in caret gives slightly different results (noise) across runs even when the same seed is set. I would expect exactly the same result on every run.

# load caret, DT and mlbench, then the solubility data set
library(caret); library(DT); library(mlbench)
library(AppliedPredictiveModeling)
data(solubility)

# combine predictors and outcome into single data frames (legacy); keep only the first 20 training rows
training_data = data.frame(solTrainX,solTrainY)[1:20,]
testing_data = data.frame(solTestX,solTestY)

# rename the outcome columns to 'y' to match the style below
colnames(training_data)[colnames(training_data) == 'solTrainY'] <- 'y'
colnames(testing_data)[colnames(testing_data) == 'solTestY'] <- 'y'


# all the training data (just named x and y)
y <- training_data$y
x <- training_data[, -ncol(training_data)]

# load doParallel and register a parallel backend with 8 workers
library(doParallel); cl <- makeCluster(8); registerDoParallel(cl)

# RMSE and R2 results should be the same, three times
set.seed(123); result <- train(x,y,"qrf"); getTrainPerf(result)
set.seed(123); result <- train(x,y,"qrf"); getTrainPerf(result)
set.seed(123); result <- train(x,y,"qrf"); getTrainPerf(result)

# stop the parallel processing and register sequential front-end
stopCluster(cl); registerDoSEQ();

Random results (noise?):

> # RMSE and R2 results should be the same, three times
> set.seed(123); result <- train(x,y,"qrf"); getTrainPerf(result)
   TrainRMSE TrainRsquared method
1 0.07520421     0.2510523    qrf
> set.seed(123); result <- train(x,y,"qrf"); getTrainPerf(result)
   TrainRMSE TrainRsquared method
1 0.07571808     0.2133274    qrf
> set.seed(123); result <- train(x,y,"qrf"); getTrainPerf(result)
   TrainRMSE TrainRsquared method
1 0.07585281       0.23388    qrf

Expected behaviour, shown here with "knn" (identical results on every run):

> set.seed(123); result <- train(x,y,"knn"); getTrainPerf(result)
   TrainRMSE TrainRsquared method
1 0.07321177     0.1691287    knn
> set.seed(123); result <- train(x,y,"knn"); getTrainPerf(result)
   TrainRMSE TrainRsquared method
1 0.07321177     0.1691287    knn
> set.seed(123); result <- train(x,y,"knn"); getTrainPerf(result)
   TrainRMSE TrainRsquared method
1 0.07321177     0.1691287    knn

This may be a result of the "randomness" in the forest. Nomen est omen. Maybe it is a feature, not a bug.

tobigithub · 2015-12-30T06:54:04Z

One easy way to run fully reproducible models in parallel mode with the caret package is to pass the seeds argument to trainControl().

See also:
http://stackoverflow.com/questions/13403427/fully-reproducible-parallel-models-using-caret
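
A minimal sketch of that approach, applied to the qrf example from this issue. It assumes caret's defaults of 25 bootstrap resamples and 3 candidate mtry values (tuneLength = 3); the names n_resamples, n_tune, ctrl, fit1 and fit2 and the choice of 2 workers are only illustrative. The seeds list needs one integer vector per resample (at least one integer per tuning candidate) plus a single integer for the final model fit.

```r
library(caret)
library(doParallel)
library(AppliedPredictiveModeling)
data(solubility)

# Same 20 training rows as in the report above
x <- solTrainX[1:20, ]
y <- solTrainY[1:20]

n_resamples <- 25  # assumed: caret's default number of bootstrap resamples
n_tune <- 3        # assumed: default tuneLength = 3 candidate mtry values

# One integer vector per resample, plus one integer for the final model
set.seed(123)
seeds <- vector(mode = "list", length = n_resamples + 1)
for (i in seq_len(n_resamples)) {
  seeds[[i]] <- sample.int(1000, n_tune)
}
seeds[[n_resamples + 1]] <- sample.int(1000, 1)

ctrl <- trainControl(method = "boot", number = n_resamples, seeds = seeds)

cl <- makeCluster(2)
registerDoParallel(cl)

# set.seed() here still matters: it fixes the bootstrap resample indices,
# which train() draws itself when trainControl(index = ...) is not supplied.
set.seed(123); fit1 <- train(x, y, method = "qrf", trControl = ctrl)
set.seed(123); fit2 <- train(x, y, method = "qrf", trControl = ctrl)

stopCluster(cl)
registerDoSEQ()

# Compare the resampled performance of the two runs
all.equal(getTrainPerf(fit1), getTrainPerf(fit2))
```

With the per-resample seeds fixed, each worker sets its own seed before fitting, so the two runs should no longer depend on which worker handles which resample. If the tuning grid is larger than assumed here, the per-resample vectors need to be lengthened to match.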
