
Accessing Considered Splits and Gains During Tree Construction #6667

Open
JohnPaulR opened this issue Oct 8, 2024 · 0 comments
Summary

I would like to request a feature that enables the logging or extraction of all features considered for splits at each node during tree construction, along with their associated gain values (or impurity reductions). The goal is to access not only the best split but also the alternatives that were evaluated in order to identify and potentially trim variables that are essentially duplicative in their contribution to the model.

Motivation

This feature would help modelers better understand how LightGBM is considering features during tree-building. It could be particularly useful for feature engineering and model optimization, as it would allow practitioners to detect features that often compete for splits, meaning they are highly correlated or duplicative in their predictive power. By identifying such variables, it would be possible to simplify models, reduce dimensionality, and improve model interpretability without sacrificing accuracy.

Description

I propose adding functionality to LightGBM that would:

  1. Log or expose all considered features and split thresholds at each node, not just the selected split.
  2. Capture the gain (or impurity reduction) for every candidate split, so users can see which features were close competitors.
  3. Expose this information through a custom callback, an internal API hook, or a configurable parameter that enables detailed split logging during training.

This could be useful for:

  • Model optimization: Trimming redundant variables that offer little marginal value compared to similar features.
  • Feature selection: Understanding which variables frequently compete for splits can aid in feature selection or combination.
  • Model interpretability: Providing insights into the decision-making process of the algorithm, beyond just the final tree structure.

If this feature already exists or can be achieved through custom means (such as callbacks or hooks), please provide guidance on how to implement it.
