[FEA] Changing `COO` `Index_Type` in UMAP to prevent overflow when running with large datasets #6010
Description
UMAP cannot currently run on large datasets because of an integer overflow. `raft::sparse::COO` defaults to using `int` for its `Index_Type`, which is too narrow for large inputs. When this issue is solved, we need to update `UMAPAlgo::FuzzySimplSet::ML::run()` to take a `COO` with an `Index_Type` other than `int`.
Details
Specifically, `coo_symmetrize` (a raft function called from `UMAPAlgo::FuzzySimplSet::ML::run()`) allocates `nnz * 2` space on device. For a large dataset (e.g. 88M samples with a kNN graph degree of 16), this value is larger than the maximum `int` (88M * 16 * 2 = 2,816,000,000 > INT_MAX = 2,147,483,647).
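The arithmetic can be checked at compile time with a small sketch. `symmetrized_nnz_fits` is a hypothetical helper written for illustration only; it is not part of raft or cuML, and it assumes `nnz = n_samples * n_neighbors` as in the example above.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not a raft/cuML function): returns true when the
// symmetrized buffer size, nnz * 2 with nnz = n_samples * n_neighbors,
// is representable by the index type IndexT.
// The arithmetic is done in 64 bits so the check itself does not overflow
// for the sizes discussed in this issue.
template <typename IndexT>
constexpr bool symmetrized_nnz_fits(std::int64_t n_samples, std::int64_t n_neighbors)
{
  return n_samples * n_neighbors * 2 <=
         static_cast<std::int64_t>(std::numeric_limits<IndexT>::max());
}

// 88M samples with kNN graph degree 16: nnz * 2 = 2,816,000,000.
// A 32-bit index cannot hold this value...
static_assert(!symmetrized_nnz_fits<std::int32_t>(88000000, 16),
              "expected nnz * 2 to exceed the 32-bit range");
// ...while a 64-bit index can.
static_assert(symmetrized_nnz_fits<std::int64_t>(88000000, 16),
              "expected nnz * 2 to fit in 64 bits");
```

Note that `nnz` alone (1,408,000,000) still fits in a 32-bit `int` here; it is the doubling during symmetrization that crosses the limit, so a check on `nnz` by itself would not catch this case.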