Transforming the Factor Outcome Variable as Characters #344

RishBez · 2024-07-16T13:08:26Z

I would like to run permutation testing with the get_feature_importance() function. However, I have noticed that if the outcome/dependent variable in the test/validation dataset is coded as a factor, the function fails with an error. Here is a reproducible example:

library(caret)
library(randomForest)
library(datasets)
library(mikropml)
library(future.apply)

# Utilising the Iris dataset as an example
data<-iris
# Species is the outcome/dependent variable
data$Species <- as.factor(data$Species)

set.seed(222)
# Setting up train-test split and LOOCV as cross-validation method
ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))
train.data <- data[ind==1,]
test.data <- data[ind==2,]
train.control <- trainControl(method = "LOOCV", # LOOCV as CV method
                                savePredictions = TRUE,
                                classProbs = TRUE,
                                summaryFunction = multiClassSummary,
                                verboseIter = TRUE,
                                search = "grid")
# Random Forest Model with Outcome 'Species' as Factors

rf.outcome.factor <- train(Species~., data = train.data, method = 'rf', trControl = train.control)

# Get Feature Importance Permutation Testing
rf.feature.imp <- get_feature_importance(
  rf.outcome.factor,
  test.data,
  outcome_colname = 'Species', 
  perf_metric_function = multiClassSummary, 
  perf_metric_name = 'AUC', 
  class_probs = TRUE, 
  method = 'rf', 
  seed = 222
)

Running the call above produces the following error:

Error in calc_perf_metrics(test_data, trained_model, outcome_colname,  : subscript out of bounds
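
For anyone reproducing this, here is a minimal base-R sketch for checking how the outcome column is stored in the test subset. It rebuilds the same split as in the example above and relies only on the built-in iris dataset. It is a diagnostic aid, not a statement about what get_feature_importance() does internally.

```r
# Rebuild the same train-test split as in the example (same seed, same call)
data <- iris
set.seed(222)
ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))
test.data <- data[ind == 2, ]

# Storage class of the outcome column in the test subset
print(class(test.data$Species))

# Factor levels carried by the column (kept even after subsetting rows)
print(levels(test.data$Species))

# Class after the conversion used in the workaround
print(class(as.character(test.data$Species)))
```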

If I convert the outcome variable 'Species' to character in the test subset, the error does not occur.

# Random Forest with Outcome 'Species' as Character Variable in the test dataset

# Copying the dataframe
test.data.character <- test.data

# Converting Outcome into Character Vector
test.data.character$Species <- as.character(test.data.character$Species)

# Model Training
rf.outcome.character <- train(Species~., data = train.data, method = 'rf', trControl = train.control)

# Get Feature Importance Permutation Testing with Species Outcome as Character Vector in the test subset
rf.feature.imp.character <- get_feature_importance(
  rf.outcome.character,
  test.data.character,
  outcome_colname = 'Species', 
  perf_metric_function = multiClassSummary, 
  perf_metric_name = 'AUC', 
  class_probs = TRUE, 
  method = 'rf', 
  seed = 222
)

print(rf.feature.imp.character)

  perf_metric perf_metric_diff     pvalue     lower     upper         feat method perf_metric_name seed
1   0.8800324     0.1199675702 0.00990099 0.8163617 0.9270562 Petal.Length     rf              AUC  222
2   0.8382760     0.1617240008 0.00990099 0.7478376 0.8991241  Petal.Width     rf              AUC  222
3   0.9998587     0.0001413399 0.92079208 0.9974302 1.0000000 Sepal.Length     rf              AUC  222
4   0.9998330     0.0001670380 0.94059406 0.9961453 1.0000000  Sepal.Width     rf              AUC  222

Am I doing something wrong here, or does get_feature_importance() accept the outcome only as a character variable by design? I look forward to your clarification!
