Ranger produces list of trees #84

Open
ndiquattro opened this issue Aug 23, 2020 · 1 comment
Labels
documentation feature a feature request or enhancement

Comments

@ndiquattro

Hello, thanks for your work on this package; it is very exciting! I was trying to follow the docs on using a ranger RF model, but tidypredict_fit() seems to return a list of trees (one case_when() per tree) rather than a single statement. Is the intent that we execute all the trees on the DB and then calculate the prediction from the results? I don't get that impression from the docs. Thanks!

library(ranger)
library(tidypredict)
library(dplyr, warn.conflicts = FALSE)

test_mod <- ranger(Species ~ ., iris, num.trees = 100)

trees <- tidypredict_fit(test_mod)

# Is list of trees
str(trees, max.level = 1, list.len = 3)
#> List of 100
#>  $ : language case_when(Petal.Width < 0.8 ~ "setosa", Sepal.Length < 5.75 & Petal.Width >=      0.8 ~ "versicolor", Petal.Width| __truncated__ ...
#>  $ : language case_when(Petal.Length < 2.45 ~ "setosa", Petal.Width >= 1.7 & Petal.Length >=      2.45 ~ "virginica", Petal.Len| __truncated__ ...
#>  $ : language case_when(Petal.Width < 0.8 ~ "setosa", Petal.Length < 4.9 & Petal.Width <      1.75 & Petal.Width >= 0.8 ~ "vers| __truncated__ ...
#>   [list output truncated]

# One example
trees[[1]]
#> case_when(Petal.Width < 0.8 ~ "setosa", Sepal.Length < 5.75 & 
#>     Petal.Width >= 0.8 ~ "versicolor", Petal.Width >= 1.75 & 
#>     Sepal.Length >= 5.75 & Petal.Width >= 0.8 ~ "virginica", 
#>     Petal.Length < 4.75 & Sepal.Width < 2.25 & Petal.Width < 
#>         1.75 & Sepal.Length >= 5.75 & Petal.Width >= 0.8 ~ "versicolor", 
#>     Petal.Length >= 4.75 & Sepal.Width < 2.25 & Petal.Width < 
#>         1.75 & Sepal.Length >= 5.75 & Petal.Width >= 0.8 ~ "virginica", 
#>     Petal.Width < 1.55 & Sepal.Width >= 2.25 & Petal.Width < 
#>         1.75 & Sepal.Length >= 5.75 & Petal.Width >= 0.8 ~ "versicolor", 
#>     Petal.Width >= 1.65 & Petal.Width >= 1.55 & Sepal.Width >= 
#>         2.25 & Petal.Width < 1.75 & Sepal.Length >= 5.75 & Petal.Width >= 
#>         0.8 ~ "versicolor", Petal.Length < 5.45 & Petal.Width < 
#>         1.65 & Petal.Width >= 1.55 & Sepal.Width >= 2.25 & Petal.Width < 
#>         1.75 & Sepal.Length >= 5.75 & Petal.Width >= 0.8 ~ "versicolor", 
#>     Petal.Length >= 5.45 & Petal.Width < 1.65 & Petal.Width >= 
#>         1.55 & Sepal.Width >= 2.25 & Petal.Width < 1.75 & Sepal.Length >= 
#>         5.75 & Petal.Width >= 0.8 ~ "virginica")

# The approach suggested in an older issue doesn't work
iris %>%
  tidypredict_to_column(test_mod)
#> Error in tidypredict_to_column(., test_mod): tidypredict_to_column does not support tree based models

Created on 2020-08-23 by the reprex package (v0.3.0)

topepo added the documentation and feature (a feature request or enhancement) labels on Dec 4, 2020
@topepo
Member

topepo commented Dec 4, 2020

I looked at the documentation and agree that it needs to be revised.

I think that the intention was to do some dplyr work to get the predictions in the format that you might want.

Here's some code that uses dplyr, purrr, and tidyr:

library(ranger)
library(tidypredict)
library(dplyr, warn.conflicts = FALSE)

test_mod <- ranger(Species ~ ., iris, num.trees = 100)

trees <- tidypredict_fit(test_mod)

new_samples <- iris[c(1, 51, 101), ]

# Evaluate every tree on the new samples and stack the per-tree votes
votes <- 
 purrr::map_dfr(trees, 
                 ~ tibble(.pred = rlang::eval_tidy(.x, new_samples),
                          .row = seq_len(nrow(new_samples))
                 )
 )

class_pred <-
 votes %>% 
 group_by(.row) %>% 
 count(.pred) %>% 
 slice_max(n, with_ties = FALSE) %>% 
 ungroup() %>% 
 select(-n)

class_pred
#> # A tibble: 3 x 2
#>    .row .pred     
#>   <int> <chr>     
#> 1     1 setosa    
#> 2     2 versicolor
#> 3     3 virginica

class_prob <- 
 votes %>% 
 group_by(.row) %>% 
 count(.pred) %>% 
 mutate(prob = n / length(trees)) %>% 
 ungroup() %>% 
 select(-n) %>% 
 tidyr::pivot_wider(id_cols = ".row", names_from = ".pred", values_from = "prob", values_fill = 0)

class_prob
#> # A tibble: 3 x 4
#>    .row setosa versicolor virginica
#>   <int>  <dbl>      <dbl>     <dbl>
#> 1     1      1       0         0   
#> 2     2      0       0.98      0.02
#> 3     3      0       0         1

Created on 2020-12-04 by the reprex package (v0.3.0)
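
For anyone who wants the voting step as a reusable function, here is a minimal sketch. It is not part of tidypredict: `vote_class()` is a hypothetical helper name, and the three hand-written `case_when()` expressions only stand in for what `tidypredict_fit()` returns for a ranger model, so the example runs without fitting anything. Ties are broken by alphabetical order of the class labels, which is an arbitrary choice.

```r
library(rlang)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not a tidypredict function): majority vote over a
# list of case_when() expressions, one expression per tree.
vote_class <- function(trees, new_data) {
  # Evaluate each tree against new_data; one column of labels per tree
  preds <- vapply(
    trees,
    function(tree) as.character(rlang::eval_tidy(tree, data = new_data)),
    character(nrow(new_data))
  )
  # Force a samples-by-trees matrix, even when new_data has a single row
  preds <- matrix(preds, nrow = nrow(new_data))
  # For each sample, return the label that received the most votes
  apply(preds, 1, function(row) names(which.max(table(row))))
}

# Toy stand-ins for the expressions returned by tidypredict_fit()
toy_trees <- list(
  quote(case_when(x < 1 ~ "a", x >= 1 ~ "b")),
  quote(case_when(x < 2 ~ "a", x >= 2 ~ "b")),
  quote(case_when(x < 3 ~ "a", x >= 3 ~ "b"))
)
toy_data <- data.frame(x = c(0.5, 1.5, 2.5, 3.5))

vote_class(toy_trees, toy_data)
```

With a real model, the same call would presumably be `vote_class(tidypredict_fit(test_mod), new_samples)`. Note that this evaluates the trees locally in R; it does not push the voting into the database.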
