I am interested in running permutation testing with the get_feature_importance() function. However, I have noticed that if the outcome (dependent) variable in the test/validation dataset is coded as a factor, the function throws an error. Here is a reproducible example:
library(caret)
library(randomForest)
library(datasets)
library(mikropml)
library(future.apply)
# Utilising the Iris dataset as an example
data<-iris
# Species is the outcome/dependent variable
data$Species <- as.factor(data$Species)
set.seed(222)
# Setting up train-test split and LOOCV as cross-validation method
ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))
train.data <- data[ind==1,]
test.data <- data[ind==2,]
train.control <- trainControl(method = "LOOCV", # LOOCV as CV method
                              savePredictions = TRUE,
                              classProbs = TRUE,
                              summaryFunction = multiClassSummary,
                              verboseIter = TRUE,
                              search = "grid")
# Random Forest Model with Outcome 'Species' as Factors
rf.outcome.factor <- train(Species~., data = train.data, method = 'rf', trControl = train.control)
# Get Feature Importance Permutation Testing
rf.feature.imp <- get_feature_importance(
  rf.outcome.factor, # the model trained above
  test.data,
  outcome_colname = 'Species',
  perf_metric_function = multiClassSummary,
  perf_metric_name = 'AUC',
  class_probs = TRUE,
  method = 'rf',
  seed = 222
)
Running the call above produces the following error:
Error in calc_perf_metrics(test_data, trained_model, outcome_colname, : subscript out of bounds
If I convert the outcome variable 'Species' to character in the test subset, the error does not occur.
# Random Forest with Outcome 'Species' as Character Variable in the test dataset
# Copying the dataframe
test.data.character <- test.data
# Converting Outcome into Character Vector
test.data.character$Species <- as.character(test.data.character$Species)
# Model Training
rf.outcome.character <- train(Species~., data = train.data, method = 'rf', trControl = train.control)
# Get Feature Importance Permutation Testing with Species Outcome as Character Vector in the test subset
rf.feature.imp.character <- get_feature_importance(
  rf.outcome.character,
  test.data.character,
  outcome_colname = 'Species',
  perf_metric_function = multiClassSummary,
  perf_metric_name = 'AUC',
  class_probs = TRUE,
  method = 'rf',
  seed = 222
)
print(rf.feature.imp.character)
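As a sanity check, here is a minimal base-R sketch, assuming the same split and object names as above. It only confirms that the two test sets differ in the type of the outcome column and not in its values; it does not call mikropml and says nothing about why the factor version fails.

```r
# Recreate the same test split as in the example above (base R only)
data <- iris
set.seed(222)
ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))
test.data <- data[ind == 2, ]

# Copy the test set and convert only the outcome column to character
test.data.character <- test.data
test.data.character$Species <- as.character(test.data.character$Species)

# The outcome column differs in type between the two data frames...
print(class(test.data$Species))           # "factor"
print(class(test.data.character$Species)) # "character"

# ...but the underlying labels are the same
same.labels <- identical(as.character(test.data$Species),
                         test.data.character$Species)
print(same.labels)
```

Since the labels match, the only difference between the failing and the working call appears to be the factor versus character coding of 'Species'.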
Am I doing something wrong here, or does get_feature_importance() only accept the outcome as a character variable by design? Looking forward to the clarification!