Performance regression in fit method with evaluation sets #10793

Open

ldesreumaux opened this issue Sep 1, 2024 · 1 comment

Comments

@ldesreumaux (Contributor)
I have observed a significant performance regression in XGBoost version 1.7 when using the fit method with evaluation sets in the scikit-learn estimators. The issue appears to have been introduced by this commit, which defaults to using QuantileDMatrix for both the training and evaluation sets.

While the optimization of prediction with QuantileDMatrix has been addressed in #9013, there remains a significant performance gap when using QuantileDMatrix for evaluation sets compared to DMatrix.

Here is a sample code to reproduce the issue:

import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
import time

n_samples = 1000000
n_features = 100
seed = 42

np.random.seed(seed)

X = np.random.rand(n_samples, n_features)
y = np.random.randint(0, 2, size=n_samples)

X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=seed)
X_eval1, X_eval2, y_eval1, y_eval2 = train_test_split(X_temp, y_temp, test_size=0.5, random_state=seed)

model = XGBClassifier(
    tree_method='hist',
    max_depth=6,
    n_estimators=500,
    eval_metric='logloss',
    random_state=seed
)

start_time = time.time()

model.fit(X_train, y_train, eval_set=[(X_eval1, y_eval1), (X_eval2, y_eval2)], verbose=True)

end_time = time.time()
execution_time = end_time - start_time

y_pred_eval1 = model.predict(X_eval1)
y_pred_eval2 = model.predict(X_eval2)

accuracy_eval1 = accuracy_score(y_eval1, y_pred_eval1)
accuracy_eval2 = accuracy_score(y_eval2, y_pred_eval2)

print(f"Accuracy on Evaluation Set 1: {accuracy_eval1:.4f}")
print(f"Accuracy on Evaluation Set 2: {accuracy_eval2:.4f}")

print(f"Execution Time: {execution_time:.2f} seconds")

Performance comparison (with current master branch):

  • With QuantileDMatrix: 66.13 seconds
  • With DMatrix: 36.35 seconds
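
For reference, one way to obtain the DMatrix timing above is to bypass the sklearn wrapper and call the native training API with plain DMatrix evaluation sets. The following is a minimal sketch of such a measurement (an assumed setup, not the exact benchmark script); it reuses X_train, y_train, X_eval1, y_eval1, X_eval2, y_eval2 and seed from the reproduction script above.

import time
import xgboost as xgb

# Training data is still quantized; only the evaluation sets differ.
dtrain = xgb.QuantileDMatrix(X_train, label=y_train)
deval1 = xgb.DMatrix(X_eval1, label=y_eval1)
deval2 = xgb.DMatrix(X_eval2, label=y_eval2)

params = {
    "tree_method": "hist",
    "max_depth": 6,
    "objective": "binary:logistic",
    "eval_metric": "logloss",
    "seed": seed,
}

start_time = time.time()
xgb.train(
    params,
    dtrain,
    num_boost_round=500,
    evals=[(deval1, "eval1"), (deval2, "eval2")],
)
print(f"Execution Time (DMatrix eval sets): {time.time() - start_time:.2f} seconds")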

Profiling graphs for the two cases are attached to the original issue (not reproduced here). They clearly show that the performance degradation is linked to the prediction step with QuantileDMatrix for the evaluation sets.

This sample code uses synthetic data, but I have observed the same order of magnitude of performance degradation with a real-world dataset.

If no further optimization is possible, I would suggest changing the default behavior to use a plain DMatrix for the evaluation sets.

@trivialfis (Member)

I agree that the gap is unexpectedly large. QDM was chosen because it reduces memory usage by compressing the data, but there is a cost in data lookup during prediction. I will see what can be done there, perhaps by using in-place predict or by optimizing the value lookup a bit more.
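
For context, "in place predict" refers to predicting directly from the raw array instead of going through a (Quantile)DMatrix. A minimal sketch with the existing Booster API, reusing model and X_eval1 from the reproduction script above (this only illustrates the API, not the proposed change inside fit):

# Predict straight from the numpy array, skipping DMatrix construction.
booster = model.get_booster()
preds = booster.inplace_predict(X_eval1)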
