fix output bug
pan-x-c committed Nov 18, 2024
1 parent 991e290 commit 58e357f
Showing 1 changed file with 2 additions and 2 deletions.
@@ -302,8 +302,8 @@ def filter_with_union_find(table: pa.Table) -> pa.Table:
             batch_format='pyarrow').groupby(
                 HashKeys.minhash).aggregate(
                     UnionFn(union_find)).materialize()
-        result = dataset_with_id.map_batches(filter_with_union_find,
-                                             batch_format='pyarrow')
+        result = dataset_with_id.map_batches(
+            filter_with_union_find, batch_format='pyarrow').materialize()
         logger.info(f'Keep {result.count()} samples after MinHash dedup.')
         union_find.clean()
         return result