Extracting weights from 100 iterations and from a random forest model #316
-
Hi there, first of all thanks a lot for developing the mikropml package; it has made machine learning with microbiome data much easier for people with limited bioinformatics experience. I have two questions and it would be great if you could help.

Is there a way to extract the weights from 100 iterations so that I can calculate the median and IQR? I managed to extract the weights from one iteration only, using the following code:

feno_results_glmnet <- run_ml(preprocessed_feno_con, method = "glmnet")
model <- feno_results_glmnet %>% pluck("trained_model")
data <- coef(model$finalModel, model$bestTune$lambda) %>%
  as.matrix() %>%
  as_tibble(rownames = "feature")

When I use the code below, I can't get the weights from all 100 iterations:

test_hp <- list(alpha = 0, lambda = c(0.05, 0.1, 1, 2, 3))
get_feno_results_hp <- function(seed) {
  run_ml(preprocessed_feno, method = "glmnet",
         seed = seed,
         hyperparameters = test_hp)
}
iterative_run_ml_100_splits <- future_map(1:100, get_feno_results_hp,
                                          .options = furrr_options(seed = TRUE))

Finally, is it possible to extract the weights from a random forest model, and how?
-
Extracting coefficients from multiple iterations of logistic regression

Here's a full reprex using the built-in otu_mini_bin dataset and only 3 random seeds (so it runs quickly).

library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(future)
library(furrr)
library(mikropml)
plan("multisession", workers = 4)
test_hp <- list(alpha = 0, lambda = c(0.05, 0.1, 1, 2, 3))
get_results <- function(seed) {
  run_ml(
    otu_mini_bin,
    method = "glmnet",
    outcome_colname = "dx",
    seed = seed,
    hyperparameters = test_hp
  )
}
nseeds <- 3
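# .options = furrr_options(seed = TRUE) below gives each parallel
# iteration a reproducible, statistically sound random seed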
iterative_run_ml_100_splits <- future_map(1:nseeds,
                                          get_results,
                                          .options = furrr_options(seed = TRUE))
#> Using 'dx' as the outcome column.
#> Training the model...
#> Loading required package: ggplot2
#> Loading required package: lattice
#>
#> Attaching package: ‘caret’
#> The following object is masked from ‘package:mikropml’:
#>
#> compare_models
#> The following object is masked from ‘package:purrr’:
#>
#> lift
#> Warning in (function (w) : `caret::train()` issued the following warning:
#>
#> simpleWarning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures.
#>
#> This warning usually means that the model didn't converge in some cross-validation folds because it is predicting something close to a constant. As a result, certain performance metrics can't be calculated. This suggests that some of the hyperparameters chosen are doing very poorly.
#> Training complete.
#> Using 'dx' as the outcome column.
#> Training the model...
#> Loading required package: ggplot2
#> Loading required package: lattice
#>
#> Attaching package: ‘caret’
#> The following object is masked from ‘package:mikropml’:
#>
#> compare_models
#> The following object is masked from ‘package:purrr’:
#>
#> lift
#> Training complete.
#> Using 'dx' as the outcome column.
#> Training the model...
#> Loading required package: ggplot2
#> Loading required package: lattice
#>
#> Attaching package: ‘caret’
#> The following object is masked from ‘package:mikropml’:
#>
#> compare_models
#> The following object is masked from ‘package:purrr’:
#>
#> lift
#> Warning in (function (w) : `caret::train()` issued the following warning:
#>
#> simpleWarning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures.
#>
#> This warning usually means that the model didn't converge in some cross-validation folds because it is predicting something close to a constant. As a result, certain performance metrics can't be calculated. This suggests that some of the hyperparameters chosen are doing very poorly.
#> Training complete.
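# for a single run_ml result: pull out the trained glmnet model and return
# its coefficients at the tuned lambda as a tibble, tagged with the seed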
get_coef <- function(ml_result) {
  seed <- ml_result$performance$seed
  model <- ml_result$trained_model
  coef(model$finalModel, model$bestTune$lambda) %>%
    as.matrix() %>%
    as_tibble(rownames = "feature") %>%
    rename(weight = s1) %>%
    mutate(seed = seed)
}
iterative_coefs <- future_map_dfr(iterative_run_ml_100_splits, get_coef)

Created on 2022-11-28 with reprex v2.0.2

Extracting coefficients from random forest

The random forest algorithm is not a regression method and thus does not have coefficients. An alternative method for interpreting random forest models is permutation feature importance, which we implement (see http://www.schlosslab.org/mikropml/articles/introduction.html#finding-feature-importance and http://www.schlosslab.org/mikropml/reference/get_feature_importance.html). A benefit of permutation feature importance is that it works with any ML method.
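If it helps, here's a minimal sketch (my addition, not a full reprex) of what that looks like: run_ml() can compute permutation feature importance directly via its find_feature_importance argument, using the same built-in otu_mini_bin dataset as above.

# minimal sketch: train a random forest and compute permutation feature
# importance; see the linked docs for details on the output columns
rf_result <- run_ml(
  otu_mini_bin,
  method = "rf",
  outcome_colname = "dx",
  find_feature_importance = TRUE,
  seed = 2019
)
# one row per feature, reporting how much performance drops when that
# feature's values are randomly permuted
rf_result$feature_importance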
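And to come back to the first question: once you have iterative_coefs from the reprex above, getting the median and IQR per feature is a standard dplyr summary. A sketch, assuming that tibble:

# sketch: per-feature median and IQR of the glmnet weights across seeds
coef_summary <- iterative_coefs %>%
  group_by(feature) %>%
  summarize(median_weight = median(weight),
            iqr_weight = IQR(weight),
            .groups = "drop")
coef_summary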