Initial support for Catboost #377
Prototype for a working inference engine is available here: https://github.com/hcho3/catboost_python_repro
We would like to add support for Catboost models. Users of Treelite should be able to load Catboost models and run predictions with them.
Overview
Catboost has a custom target encoding method to encode categorical data, and produces special kinds of decision trees called oblivious trees. See the Catboost paper for more details.
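In an oblivious tree, every node at a given depth applies the same test, so a tree of depth d can be stored as one (feature, threshold) pair per level plus 2^d leaf values, and it can always be unrolled into an ordinary binary decision tree. The sketch below illustrates both ideas. It is an assumption-laden illustration, not Catboost's or Treelite's code: the names ObliviousTree, Node, to_regular_tree, and predict_regular are hypothetical, and so is the bit convention (failing the test at level i sets bit i of the leaf index).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ObliviousTree:
    """Hypothetical oblivious tree: all nodes at one depth share the same test."""
    features: List[int]       # feature index tested at each level
    thresholds: List[float]   # threshold used at each level
    leaf_values: List[float]  # 2 ** depth leaves, indexed by test outcomes

    def __post_init__(self) -> None:
        depth = len(self.features)
        if len(self.thresholds) != depth or len(self.leaf_values) != 2 ** depth:
            raise ValueError("need one threshold per level and 2 ** depth leaves")

    def predict(self, x: Sequence[float]) -> float:
        leaf = 0
        for level, (f, t) in enumerate(zip(self.features, self.thresholds)):
            # Assumed convention: failing `x[f] < t` sets bit `level`.
            if not (x[f] < t):
                leaf |= 1 << level
        return self.leaf_values[leaf]


@dataclass
class Node:
    """Node of an ordinary binary decision tree stored as a flat list."""
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional[int] = None     # child index when x[feature] < threshold
    right: Optional[int] = None    # child index otherwise
    value: Optional[float] = None  # set only on leaves


def to_regular_tree(tree: ObliviousTree) -> List[Node]:
    """Unroll the oblivious tree into 2 ** (depth + 1) - 1 ordinary nodes."""
    nodes: List[Node] = []

    def build(level: int, leaf_bits: int) -> int:
        idx = len(nodes)
        nodes.append(Node())
        if level == len(tree.features):
            nodes[idx].value = tree.leaf_values[leaf_bits]
        else:
            # Every node on this level repeats the same (feature, threshold) test.
            nodes[idx].feature = tree.features[level]
            nodes[idx].threshold = tree.thresholds[level]
            nodes[idx].left = build(level + 1, leaf_bits)
            nodes[idx].right = build(level + 1, leaf_bits | (1 << level))
        return idx

    build(0, 0)
    return nodes


def predict_regular(nodes: List[Node], x: Sequence[float]) -> float:
    i = 0
    while nodes[i].value is None:  # internal nodes carry no value
        n = nodes[i]
        i = n.left if x[n.feature] < n.threshold else n.right
    return nodes[i].value


if __name__ == "__main__":
    tree = ObliviousTree(features=[0, 1], thresholds=[0.5, 2.0],
                         leaf_values=[10.0, 20.0, 30.0, 40.0])
    print(tree.predict([0.7, 1.0]))                         # 20.0
    print(predict_regular(to_regular_tree(tree), [0.7, 1.0]))  # 20.0
```

The unrolled tree gives the same predictions, but the test at level i is duplicated 2^i times, so its node count grows to 2^(d+1) - 1. That storage cost is one reason a dedicated oblivious tree representation could be worth adding later.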
In general, a target encoder is a function that takes a categorical input and outputs a numeric value. The function is an "encoding" in the sense that each categorical input is encoded as a real number. The advantage of target encoding is that we can use the simple test of the form
[feature] < [threshold]
exclusively, in all of our decision trees. The challenge is that Catboost uses a custom flavor of target encoding, so the goal is to abstract away as much of that complexity as possible.
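To make the idea concrete, here is a minimal sketch of a generic target encoder: each category is mapped to a smoothed mean of the training target, (sum of targets + prior * weight) / (count + weight). This smoothed mean is a stand-in chosen for illustration and is not Catboost's custom flavor; the function names and the prior/weight parameters are hypothetical. The point it shows is that once a category has become a number, the tree only needs the ordinary [feature] < [threshold] test.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def fit_target_encoder(categories: Sequence[Hashable], targets: Sequence[float],
                       prior: float, weight: float = 1.0) -> Dict[Hashable, float]:
    """Map each category to (sum of its targets + prior * weight) / (count + weight).

    Generic smoothed mean encoding for illustration only; it is not
    Catboost's own target statistic.
    """
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return {cat: (sums[cat] + prior * weight) / (counts[cat] + weight) for cat in counts}


def encode(table: Dict[Hashable, float], cat: Hashable, prior: float) -> float:
    # Categories never seen during training fall back to the prior.
    return table.get(cat, prior)


def goes_left(value: float, threshold: float) -> bool:
    # After encoding, a categorical feature uses the same [feature] < [threshold] test.
    return value < threshold


if __name__ == "__main__":
    table = fit_target_encoder(["red", "blue", "red", "green"],
                               [1.0, 0.0, 1.0, 0.0], prior=0.5)
    print(encode(table, "blue", 0.5))                   # (0 + 0.5) / (1 + 1) = 0.25
    print(goes_left(encode(table, "blue", 0.5), 0.5))   # True
```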
Proposed Design
The Treelite model spec (treelite/include/treelite/tree.h, lines 792 to 796 in commit 4cc4f7e) should be updated to include an optional field that stores the target encoding function. The target encoding component should be a lookup table that maps each possible categorical value to a vector of length 1 or greater.
Catboost uses CityHash to convert string categories into int64, so the target encoding field must allow both int64 and float32 types for the categorical input.
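A minimal sketch of the proposed lookup table, assuming a Python-side mirror of the optional field: TargetEncodingTable, key_type, add, and lookup are hypothetical names, not Treelite API. Keys are either int64 (assumed to be CityHash values already computed by Catboost; the hash itself is not reimplemented here) or float32, and each key maps to a vector of at least one value.

```python
import struct
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Union

Key = Union[int, float]
INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


@dataclass
class TargetEncodingTable:
    """Hypothetical lookup table: categorical value -> vector of length >= 1."""
    key_type: str  # "int64" (e.g. hashed string categories) or "float32"
    table: Dict[Key, List[float]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.key_type not in ("int64", "float32"):
            raise ValueError(f"unsupported key type: {self.key_type!r}")

    def _normalize(self, key: Key) -> Key:
        if self.key_type == "int64":
            if isinstance(key, float) and not key.is_integer():
                raise TypeError(f"non-integral key {key!r} for an int64 table")
            k = int(key)
            if not INT64_MIN <= k <= INT64_MAX:
                raise OverflowError(f"{k} does not fit in int64")
            return k
        # Store and look up float keys at float32 precision so both sides agree.
        return to_float32(float(key))

    def add(self, key: Key, values: Sequence[float]) -> None:
        if len(values) < 1:
            raise ValueError("each category must map to at least one value")
        self.table[self._normalize(key)] = [float(v) for v in values]

    def lookup(self, key: Key) -> List[float]:
        return self.table[self._normalize(key)]
```

Normalizing every float key to float32 on both insertion and lookup keeps a float64 query such as 0.1 from missing an entry that was stored at float32 precision.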
Scope
Catboost allows users to save models in two formats: FlatBuffer and JSON. For the initial version, we'll only support the JSON format.
Initially, we'll convert oblivious trees into regular decision trees. We may add an ObliviousTree class to the Treelite model spec in the future.
In addition, we'll only support the simple_ctr configuration, where the target encoding function takes only a single categorical feature at a time. We won't support the combination_ctr configuration, where multiple categorical features are fed into the target encoder.
TODOs
src/frontend.