
Commit

Make dataset dump skipping embedding column
binkjakub committed Jun 1, 2024
1 parent 18b2c6a commit ea33387
Showing 1 changed file with 3 additions and 1 deletion.
scripts/dataset/dump_pl_dataset.py: 4 changes (3 additions, 1 deletion)
@@ -50,7 +50,9 @@ def main(
     for offset in trange(start_offset, num_docs, chunk_size, desc="Chunks"):
         docs = list(
             tqdm(
-                collection.find(query, batch_size=batch_size).skip(offset).limit(chunk_size),
+                collection.find(query, {"embedding": 0}, batch_size=batch_size)
+                .skip(offset)
+                .limit(chunk_size),
                 total=chunk_size,
                 leave=False,
                 desc="Documents in chunk",
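The new second positional argument to `find` is a MongoDB projection. `{"embedding": 0}` asks the server to omit the `embedding` field from every returned document while keeping all other fields (including `_id`), so the dump no longer transfers the (presumably large) embedding vectors. Below is a minimal, server-free sketch of that behaviour combined with the script's skip/limit chunking. `apply_exclusion_projection`, `iter_chunks` and the sample documents are hypothetical stand-ins, not part of pymongo or this repository, and the helper only models top-level exclusion (no dotted paths, no inclusion mode).

```python
from typing import Any, Dict, Iterator, List, Mapping

Document = Dict[str, Any]


def apply_exclusion_projection(doc: Mapping[str, Any], projection: Mapping[str, int]) -> Document:
    """Return a copy of `doc` without the top-level fields marked 0 in `projection`.

    Hypothetical stand-in for MongoDB's exclusion projection; it ignores
    dotted paths and inclusion-style projections.
    """
    excluded = {field for field, flag in projection.items() if not flag}
    return {key: value for key, value in doc.items() if key not in excluded}


def iter_chunks(
    docs: List[Document],
    projection: Mapping[str, int],
    start_offset: int,
    chunk_size: int,
) -> Iterator[List[Document]]:
    """Yield projected chunks, mirroring the loop over range(start_offset, num_docs, chunk_size)."""
    num_docs = len(docs)
    for offset in range(start_offset, num_docs, chunk_size):
        # Stand-in for .skip(offset).limit(chunk_size) over an ordered result set.
        page = docs[offset : offset + chunk_size]
        yield [apply_exclusion_projection(d, projection) for d in page]


if __name__ == "__main__":
    # Hypothetical sample documents; "embedding" plays the role of the dropped column.
    fake_docs = [
        {"_id": i, "text": f"judgment {i}", "embedding": [0.1 * i] * 4}
        for i in range(5)
    ]
    chunks = list(iter_chunks(fake_docs, {"embedding": 0}, start_offset=0, chunk_size=2))
    print([len(c) for c in chunks])  # [2, 2, 1]
    print(chunks[0][0])  # {'_id': 0, 'text': 'judgment 0'}
```

As in the original loop, skip/limit paging only produces disjoint, complete chunks if the query returns documents in a stable order. The sketch gets that for free by slicing an ordered list, whereas MongoDB does not guarantee a result order unless a sort is specified.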
