[core] fix slow predict-caching with many classes #3109

Merged
khotilov merged 4 commits into dmlc:master on Feb 16, 2018


khotilov (Member)

Addresses the O(num_class^2) prediction-cache behavior reported in #1689 and #2926, where prediction-cache updates within CommitModel can come to dominate training time when there are many classes. The cause was that the cache was updated after every single class-group commit within a boosting round. For example, with

```r
library(xgboost)

set.seed(1)
n <- 1e4
num_feat <- 200
num_class <- 150
# 0-based class labels drawn uniformly over num_class classes
y <- apply(rmultinom(n, 1, rep(1, num_class)), 2, function(yy) which(yy != 0)) - 1
dtr <- xgb.DMatrix(matrix(rnorm(n*num_feat), n, num_feat), label = y)
param <- list(objective='multi:softprob', num_class=num_class, debug_verbose=1,
              tree_method='hist', subsample=0.6)
bst <- xgb.train(param, data=dtr, nrounds=1)
rm(bst)
gc()
```

the timings before the fix were as follows (note that no watchlist was used; using one would make caching even slower):

```
[16:21:50] ======== Monitor: GBTree ========
[16:21:50] BoostNewTrees:	 6.436368s
[16:21:50] CommitModel:	 10.707612s
```

and after the fix:

```
[16:22:51] ======== Monitor: GBTree ========
[16:22:51] BoostNewTrees:	 6.497372s
[16:22:51] CommitModel:	 0.109006s
```
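To see why updating the cache after every class-group commit scales quadratically in num_class, here is a toy cost model in R. It is a sketch, not XGBoost's implementation: the prediction cache is modeled as an n x num_class matrix, and it assumes (for illustration only) that each cache update pass visits every (row, class) cell once. The helpers `cache_update_pass` and `one_round` are hypothetical names. Under that assumption, one pass per group costs n * num_class^2 cell visits per round, while a single pass after all groups are committed (one way to avoid the per-group updates) costs n * num_class, and both schedules leave the cache in the same state.

```r
# Toy cost model (hypothetical, not XGBoost's implementation).
# The prediction cache is an n x K matrix of margins, K = num_class.
# Assumption: one cache update pass visits every (row, class) cell once.

# One update pass: add the pending tree outputs and count cell visits.
cache_update_pass <- function(cache, pending) {
  visits <- 0
  for (i in seq_len(nrow(cache))) {
    for (k in seq_len(ncol(cache))) {
      cache[i, k] <- cache[i, k] + pending[i, k]
      visits <- visits + 1
    }
  }
  list(cache = cache, visits = visits)
}

# One boosting round; tree_out[, k] is the output of the new tree for class k.
one_round <- function(tree_out, per_group_update) {
  n <- nrow(tree_out)
  K <- ncol(tree_out)
  cache <- matrix(0, n, K)
  visits <- 0
  if (per_group_update) {
    # Update the cache after each class group is committed: K passes.
    for (k in seq_len(K)) {
      pending <- matrix(0, n, K)
      pending[, k] <- tree_out[, k]
      res <- cache_update_pass(cache, pending)
      cache <- res$cache
      visits <- visits + res$visits
    }
  } else {
    # Commit all K groups first, then update the cache once: 1 pass.
    res <- cache_update_pass(cache, tree_out)
    cache <- res$cache
    visits <- res$visits
  }
  list(cache = cache, visits = visits)
}

set.seed(1)
tree_out <- matrix(rnorm(20 * 10), nrow = 20, ncol = 10)
before <- one_round(tree_out, per_group_update = TRUE)
after <- one_round(tree_out, per_group_update = FALSE)
before$visits                          # 2000 = 20 rows * 10 classes * 10 passes
after$visits                           # 200  = 20 rows * 10 classes * 1 pass
all.equal(before$cache, after$cache)   # TRUE: same cache contents either way
```

With the 150 classes of the benchmark above, this toy model predicts a 150x gap between the two schedules; the measured CommitModel times (10.707612s vs 0.109006s, roughly 98x) are of the same order, though the real per-pass cost includes work the model ignores.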

Some minor changes:


codecov-io commented Feb 12, 2018

Codecov Report

Merging #3109 into master will decrease coverage by <.01%.
The diff coverage is 10.52%.


```diff
@@             Coverage Diff              @@
##             master    #3109      +/-   ##
============================================
- Coverage     43.79%   43.79%   -0.01%     
  Complexity      228      228              
============================================
  Files           159      159              
  Lines         12507    12507              
  Branches        466      466              
============================================
- Hits           5478     5477       -1     
- Misses         6837     6838       +1     
  Partials        192      192
```
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| src/gbm/gbtree.cc | 17.95% <0%> (-0.1%) | 0 <0> (ø) |
| src/objective/regression_obj.cc | 84% <100%> (ø) | 0 <0> (ø) ⬇️ |
| src/predictor/cpu_predictor.cc | 68.71% <50%> (+0.19%) | 0 <0> (ø) ⬇️ |


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 375d753...c414b00.

khotilov merged commit 9ffe859 into dmlc:master on Feb 16, 2018
The lock bot locked this as resolved and limited conversation to collaborators on Jan 18, 2019.