Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLDR Example Fails #204

Open
mahiki opened this issue Jul 28, 2024 · 9 comments
Open

TLDR Example Fails #204

mahiki opened this issue Jul 28, 2024 · 9 comments

Comments

@mahiki
Copy link

mahiki commented Jul 28, 2024

I'm new to all the packages referenced here, sorry I can't diagnose.

Environment:

julia version 1.10.4


cat Project.toml
[deps]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Term = "22787eb5-b846-44ae-b979-8e399b8463ab"
XGBoost = "009559a3-9522-5dbb-924b-0b6ed2b22bb9"

pkg> st
Status ...etc
  [a93c6f00] DataFrames v1.6.1
  [22787eb5] Term v2.0.6
  [009559a3] XGBoost v2.5.1

I tried out the TLDR example in the docs:

using DataFrames, XGBoost, Term

# training set of 100 datapoints of 4 features
(X, y) = (randn(100,4), randn(100))

# create and train a gradient boosted tree model of 5 trees
bst = xgboost((X, y), num_round=5, max_depth=6, objective="reg:squarederror")

# obtain model predictions= predict(bst, X)

df = DataFrame(randn(100,3), [:a, :b, :y])

# can accept tabular data, will keep feature names
bst = xgboost((df[!, [:a, :b]], df.y))

# display importance statistics retaining feature names
importancereport(bst)

╭───────────┬────────────┬──────────┬───────────┬──────────────┬───────────────╮
│  feature  │    gain    │  weight  │   cover   │  total_gain  │  total_cover  │
├───────────┼────────────┼──────────┼───────────┼──────────────┼───────────────┤
│    "b"1.1655382.034.707395.57352846.0     │
├───────────┼────────────┼──────────┼───────────┼──────────────┼───────────────┤
│    "a"0.918156108.026.46399.16082858.0     │
╰───────────┴────────────┴──────────┴───────────┴──────────────┴───────────────╯

So far so good

trees(bst)

#... stack trace in next comment
@mahiki
Copy link
Author

mahiki commented Jul 28, 2024

julia> trees(bst)
10-element Vector{XGBoost.Node}:
 Error showing value of type Vector{XGBoost.Node}:
ERROR: MethodError: no method matching print_tree(::typeof(Term.Trees.print_node), ::typeof(Term.Trees.print_key), ::IOBuffer, ::OrderedCollections.OrderedDict{…}; charset::AbstractTrees.TreeCharSet, printkeys::Bool, prefix::String, title_style::String)

Closest candidates are:
  print_tree(::Function, ::Function, ::IO, ::Any; maxdepth, indicate_truncation, charset, printkeys, depth, prefix, printnode_kw) got unsupported keyword argument "title_style"
   @ AbstractTrees ~/.julia/packages/AbstractTrees/Ftf8W/src/printing.jl:194
  print_tree(::Function, ::IO, ::Any; kw...)
   @ AbstractTrees ~/.julia/packages/AbstractTrees/Ftf8W/src/printing.jl:276
  print_tree(::Any; kw...)
   @ AbstractTrees ~/.julia/packages/AbstractTrees/Ftf8W/src/printing.jl:278
  ...

Stacktrace:
  [1] kwerr(::@NamedTuple{}, ::Function, ::Function, ::Function, ::IOBuffer, ::OrderedCollections.OrderedDict{…})
    @ Base ./error.jl:165
  [2] (::Term.Trees.var"#4#5"{AbstractTrees.TreeCharSet, Bool, typeof(Term.Trees.print_node), typeof(Term.Trees.print_key), String, @Kwargs{title_style::String}})(io::IOBuffer)
    @ Term.Trees ~/.julia/packages/Term/u4VQK/src/trees.jl:147
  [3] sprint(::Function; context::Nothing, sizehint::Int64)
    @ Base ./strings/io.jl:114
  [4] sprint
    @ ./strings/io.jl:107 [inlined]
  [5] Term.Trees.Tree(tree::OrderedCollections.OrderedDict{…}; guides::Symbol, theme::Term.Theme, printkeys::Bool, print_node_function::Function, print_key_function::Function, title::String, prefix::String, kwargs::@Kwargs{})
    @ Term.Trees ~/.julia/packages/Term/u4VQK/src/trees.jl:146
  [6] Term.Trees.Tree(node::XGBoost.Node; title::String, title_style::String, kwargs::@Kwargs{})
    @ XGBoostTermExt ~/.julia/packages/XGBoost/nqMqQ/ext/XGBoostTermExt.jl:71
  [7] Tree
    @ ~/.julia/packages/XGBoost/nqMqQ/ext/XGBoostTermExt.jl:65 [inlined]
  [8] Panel(node::XGBoost.Node; width::Int64, kw::@Kwargs{})
    @ XGBoostTermExt ~/.julia/packages/XGBoost/nqMqQ/ext/XGBoostTermExt.jl:81
  [9] Panel
    @ ~/.julia/packages/XGBoost/nqMqQ/ext/XGBoostTermExt.jl:74 [inlined]
 [10] show
    @ ~/.julia/packages/XGBoost/nqMqQ/ext/XGBoostTermExt.jl:93 [inlined]
 [11] show(io::IOContext{IOBuffer}, m::String, x::XGBoost.Node)
    @ Base.Multimedia ./multimedia.jl:123
 [12] sprint(::Function, ::String, ::Vararg{Any}; context::IOContext{Base.TTY}, sizehint::Int64)
    @ Base ./strings/io.jl:112
 [13] sprint
    @ ./strings/io.jl:107 [inlined]
 [14] print_matrix_row(io::IOContext{Base.TTY}, X::AbstractVecOrMat, A::Vector{Tuple{Int64, Int64}}, i::Int64, cols::Vector{Int64}, sep::String, idxlast::Int64)
    @ Base ./arrayshow.jl:108
 [15] _print_matrix(io::IOContext{…}, X::AbstractVecOrMat, pre::String, sep::String, post::String, hdots::String, vdots::String, ddots::String, hmod::Int64, vmod::Int64, rowsA::UnitRange{…}, colsA::UnitRange{…})
    @ Base ./arrayshow.jl:213
 [16] print_matrix(io::IOContext{Base.TTY}, X::Vector{XGBoost.Node}, pre::String, sep::String, post::String, hdots::String, vdots::String, ddots::String, hmod::Int64, vmod::Int64)
    @ Base ./arrayshow.jl:171
 [17] print_matrix
    @ ./arrayshow.jl:171 [inlined]
 [18] print_array
    @ ./arrayshow.jl:358 [inlined]
 [19] show(io::IOContext{Base.TTY}, ::MIME{Symbol("text/plain")}, X::Vector{XGBoost.Node})
    @ Base ./arrayshow.jl:399
 [20] (::REPL.var"#55#56"{REPL.REPLDisplay{REPL.LineEditREPL}, MIME{Symbol("text/plain")}, Base.RefValue{Any}})(io::Any)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/REPL/src/REPL.jl:273
 [21] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/REPL/src/REPL.jl:569
 [22] display(d::REPL.REPLDisplay, mime::MIME{Symbol("text/plain")}, x::Any)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/REPL/src/REPL.jl:259
 [23] display
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/REPL/src/REPL.jl:278 [inlined]
 [24] display(x::Any)
    @ Base.Multimedia ./multimedia.jl:340
 [25] print_response(errio::IO, response::Any, show_value::Bool, have_color::Bool, specialdisplay::Union{Nothing, AbstractDisplay})
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/REPL/src/REPL.jl:0
 [26] (::REPL.var"#57#58"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/REPL/src/REPL.jl:284
 [27] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/REPL/src/REPL.jl:569
 [28] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/REPL/src/REPL.jl:282
 [29] (::REPL.var"#do_respond#80"{Bool, Bool, REPL.var"#93#103"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/REPL/src/REPL.jl:911
 [30] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [31] invokelatest
    @ ./essentials.jl:889 [inlined]
 [32] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
    @ REPL.LineEdit ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/REPL/src/LineEdit.jl:2656
 [33] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/REPL/src/REPL.jl:1312
 [34] (::REPL.var"#62#68"{REPL.LineEditREPL, REPL.REPLBackendRef})()
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/REPL/src/REPL.jl:386
Some type information was truncated. Use `show(err)` to see complete types.

@mahiki
Copy link
Author

mahiki commented Jul 29, 2024

Looks like failure from Term.jl dependency

Note that tree(bst) from the TLDR succeeds if I don't import Term:

using DataFrames, XGBoost
    # ...etc

# expected
julia> importancereport(bst)
ERROR: MethodError: no method matching importancereport(::Booster)

julia> importance(bst)
    # OrderedCollections.OrderedDict{String, Vector{Float32}} with 2 entries:
    # "a" => [0.709936]
    # "b" => [0.644888]

julia> trees(bst)
    # 10-element Vector{XGBoost.Node}:
    # XGBoost.Node(split_feature=b)
    # XGBoost.Node(split_feature=b)
    # XGBoost.Node(split_feature=b)
    # XGBoost.Node(split_feature=b)
    # XGBoost.Node(split_feature=b)
    # XGBoost.Node(split_feature=b)
    # XGBoost.Node(split_feature=b)
    # XGBoost.Node(split_feature=a)
    # XGBoost.Node(split_feature=b)
    # XGBoost.Node(split_feature=a)

julia> for i in propertynames(ts[1])
           @printf "%-20s    %12s\n" i getproperty(ts[1], i)
       end
id                                 0
depth                              0
split                              b
split_condition           1.22669697
yes                                1
no                                 2
nmissing                           2
gain                      4.72399569
cover                          100.0
leaf                         nothing
children                XGBoost.Node[XGBoost.Node(split_feature=a), XGBoost.Node(split_feature=b)]

@mahiki
Copy link
Author

mahiki commented Jul 29, 2024

It really dinged my confidence when the Hello World for this package failed, and I messed around with versions trying to find an environment where it works -- wasted time.

I suggest either:

  • add a test for the whole TLDR section to catch this kind of failure before release
  • Or, if there isn't bandwidth to manage this level of user experience, let's refactor the TLDR to remove the Term dependency.

Or maybe someone experienced can see an easy fix! Let me know I'm happy to PR either of the above.

@ExpandingMan
Copy link
Collaborator

This is either an error in Term.jl in that it's passing keyword arguments to print_tree that it shouldn't, or an error in XGBoostTermExt in that it is creating a Term.Tree with arguments that can't be passed into print_tree. Either way, the Term tree output doesn't seem to be making a lot of sense now even when this is fixed by removing those arguments.

We probably should remove the term tree visualizations as that doesn't really seem like something we'll want to maintain and it was perhaps a bit overenthusiastic to put it there in the first place.

@mahiki
Copy link
Author

mahiki commented Jul 29, 2024

Makes sense, something fragile about this dependency reminiscent of a 🔥 julia discourse not long ago.

Is tree meant to be a 'show' type of function, ancillary to the model in production?

Visualization features don't need to be in scope, although the explainability of tree models is one of the big benefits. Package users can manage that outside of this package, maybe just a good how-to example in the docs is a good tradeoff.

I'm eager to help on this, esp. having been useless in helping on Parquet2.jl. I expect to be using this package a lot.

Edit: a little guidance on general direction to go and I'd be able to make a PR

@ExpandingMan
Copy link
Collaborator

This wrapper defines a Node object that's supposed to aid users with introspection of the trees, as the library itself provides text output. Visualization of that kind of stuff is admittedly a hard problem. If we cut down on the visualization stuff, users should still be able to find out whatever they need with Node, it just may be a little more work.

If you want to remove all of the Node methods from the Term extension, I personally would be happy with that and would be willing to merge it. I can't speak for anyone else, so if you do make such a PR I'll want to leave it up for a few days to give people a chance to comment.

Other solutions are also welcome, but yeah I think in retrospect adding this kind of fluff to this simple wrapper package was not the greatest idea. Node should definitely stay since introspection of the trees is a pain and the help is badly needed, but trying to get the whole thing to display nicely in the general case is a big complicated problem that we shouldn't try to solve.

There is also the fraught issue of whether it would be a breaking change. I always think display stuff should not be considered part of the API and therefore not breaking, but we don't make it clear, it might have to be, so I guess it just is what it is.

@ablaom
Copy link
Contributor

ablaom commented Jul 30, 2024

I always think display stuff should not be considered part of the API and therefore not breaking

I vote for non-breaking.

@mahiki
Copy link
Author

mahiki commented Jul 30, 2024

Sorry I've been reading through the package and just learned about package extensions.

Tell me if I have this right:

  • Julia REPL uses the show method on the result of every entry.
  • The package extension automatically includes the two extension modules which specializes the show method for Nodes.
  • The show method is failing because of some change where AbstractTrees.print_tree is being called on Term.Trees.print_node instead of a function or ::IO type
    • also print_tree is getting unsupported keyword argument "title_style"

So I'd say

  1. getting a stack trace on the display call in REPL is not great, maybe add error handling and fall back to Base.show?
  2. Maybe this could be fixed with a small tweak to XGBoostTermExt? That would be preferable I think.

First thing I'd like to do is add a hint to the MethodError for instancereport following this example, that would have helped me a lot as a first time user.

@ExpandingMan
Copy link
Collaborator

I think you have the first part of it right, but not the cause of the problem. The method error happens because Term.jl is passing the keyword arguments to its constructors to print_tree which does not in turn accept them. The method error was entirely appropriate, given the circumstances.

Of course it's possible to fix this but I'm not even sure what the intended behavior is, so I'm not sure whether this is a case of Term.jl misusing print_tree or a case of XGBoostTermExt misusing the Term.jl tree constructor. Either way, I'm inclined to agree with what I thought was the sentiment of your earlier post, i.e. displaying these trees nicely is a rather hard problem to solve in general and we shouldn't try to do it here. Looking at this a bit more broadly, we don't even have any real confidence that these trees will display nicely in a wide variety of cases even were it not for this error.

That said, whether you are inclined to fix this issue and restore the original behavior, or just remove the Term display for tree nodes, I'd vote to merge either case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants