Extracting weights from 100 iterations and from a random forest model #316
-
Hi there, first of all thanks a lot for developing the mikropml package; it has made machine learning with microbiome data much easier for people with limited bioinformatics experience. I have two questions and it would be great if you could help.

Is there a way to extract the weights from 100 iterations so that I can calculate the median and IQR? I managed to extract the weights from one iteration only, using the following code:

feno_results_glmnet <- run_ml(preprocessed_feno_con, method = "glmnet")
model <- feno_results_glmnet %>% pluck("trained_model")
data <- coef(model$finalModel, model$bestTune$lambda) %>%
  as.matrix() %>%
  as_tibble(rownames = "feature")

When I use the code below, I can't get the weights from all 100 iterations:

test_hp <- list(alpha = 0, lambda = c(0.05, 0.1, 1, 2, 3))
get_feno_results_hp <- function(seed) {
  run_ml(preprocessed_feno, method = "glmnet",
         seed = seed,
         hyperparameters = test_hp)
}
iterative_run_ml_100_splits <- future_map(1:100, get_feno_results_hp,
                                          .options = furrr_options(seed = TRUE))

Finally, is it possible to extract the weights from a random forest model, and how?
-
Extracting coefficients from multiple iterations of logistic regression

Here's a full reprex using the built-in otu_mini_bin dataset and only 3 random seeds (so it runs quickly).

library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(future)
library(furrr)
library(mikropml)
plan("multisession", workers = 4)
test_hp <- list(alpha = 0, lambda = c(0.05, 0.1, 1, 2, 3))
get_results <- function(seed) {
  run_ml(
    otu_mini_bin,
    method = "glmnet",
    outcome_colname = "dx",
    seed = seed,
    hyperparameters = test_hp
  )
}
nseeds <- 3
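# .options = furrr_options(seed = TRUE) below gives each parallel
# iteration a reproducible, statistically sound random seed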
iterative_run_ml_100_splits <- future_map(1:nseeds,
                                          get_results,
                                          .options = furrr_options(seed = TRUE))
#> Using 'dx' as the outcome column.
#> Training the model...
#> Loading required package: ggplot2
#> Loading required package: lattice
#>
#> Attaching package: ‘caret’
#> The following object is masked from ‘package:mikropml’:
#>
#> compare_models
#> The following object is masked from ‘package:purrr’:
#>
#> lift
#> Warning in (function (w) : `caret::train()` issued the following warning:
#>
#> simpleWarning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures.
#>
#> This warning usually means that the model didn't converge in some cross-validation folds because it is predicting something close to a constant. As a result, certain performance metrics can't be calculated. This suggests that some of the hyperparameters chosen are doing very poorly.
#> Training complete.
#> Using 'dx' as the outcome column.
#> Training the model...
#> Loading required package: ggplot2
#> Loading required package: lattice
#>
#> Attaching package: ‘caret’
#> The following object is masked from ‘package:mikropml’:
#>
#> compare_models
#> The following object is masked from ‘package:purrr’:
#>
#> lift
#> Training complete.
#> Using 'dx' as the outcome column.
#> Training the model...
#> Loading required package: ggplot2
#> Loading required package: lattice
#>
#> Attaching package: ‘caret’
#> The following object is masked from ‘package:mikropml’:
#>
#> compare_models
#> The following object is masked from ‘package:purrr’:
#>
#> lift
#> Warning in (function (w) : `caret::train()` issued the following warning:
#>
#> simpleWarning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures.
#>
#> This warning usually means that the model didn't converge in some cross-validation folds because it is predicting something close to a constant. As a result, certain performance metrics can't be calculated. This suggests that some of the hyperparameters chosen are doing very poorly.
#> Training complete.
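# for a single run_ml result: pull out the trained glmnet model and return
# its coefficients at the tuned lambda as a tibble, tagged with the seed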
get_coef <- function(ml_result) {
  seed <- ml_result$performance$seed
  model <- ml_result$trained_model
  coef(model$finalModel, model$bestTune$lambda) %>%
    as.matrix() %>%
    as_tibble(rownames = "feature") %>%
    rename(weight = s1) %>%
    mutate(seed = seed)
}
iterative_coefs <- future_map_dfr(iterative_run_ml_100_splits, get_coef)

Created on 2022-11-28 with reprex v2.0.2

Extracting coefficients from random forest

The random forest algorithm is not a regression method and thus does not have coefficients. An alternative method for interpreting random forest models is permutation feature importance, which we implement (see http://www.schlosslab.org/mikropml/articles/introduction.html#finding-feature-importance and http://www.schlosslab.org/mikropml/reference/get_feature_importance.html). A benefit of permutation feature importance is that it works with any ML method.
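If it helps, here's a minimal sketch (my addition, not a full reprex) of what that looks like: run_ml() can compute permutation feature importance directly via its find_feature_importance argument, using the same built-in otu_mini_bin dataset as above.

# minimal sketch: train a random forest and compute permutation feature
# importance; see the linked docs for details on the output columns
rf_result <- run_ml(
  otu_mini_bin,
  method = "rf",
  outcome_colname = "dx",
  find_feature_importance = TRUE,
  seed = 2019
)
# one row per feature, reporting how much performance drops when that
# feature's values are randomly permuted
rf_result$feature_importance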
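And to come back to the first question: once you have iterative_coefs from the reprex above, getting the median and IQR per feature is a standard dplyr summary. A sketch, assuming that tibble:

# sketch: per-feature median and IQR of the glmnet weights across seeds
coef_summary <- iterative_coefs %>%
  group_by(feature) %>%
  summarize(median_weight = median(weight),
            iqr_weight = IQR(weight),
            .groups = "drop")
coef_summary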