increase max concurrency to 50 (#203)
### Notes

Tested concurrency settings with a 2,500-page PDF.

Processing time at each setting:

15 threads: 29 minutes
50 threads: 11 minutes
150 threads: 5 minutes
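
The timings above imply sublinear scaling. As a rough sketch, the speedup relative to the 15-thread run can be worked out as below; the names `timings_min`, `speedup`, and `scaling_efficiency` are illustrative and do not come from the client code.

```python
# Wall-clock times reported in the notes, in minutes, keyed by concurrency level.
timings_min = {15: 29, 50: 11, 150: 5}

BASELINE_LEVEL = 15  # the previous maximum, used as the reference point


def speedup(level: int) -> float:
    """How many times faster a run is than the 15-thread baseline."""
    return timings_min[BASELINE_LEVEL] / timings_min[level]


def scaling_efficiency(level: int) -> float:
    """Speedup divided by the increase in concurrency (1.0 means linear scaling)."""
    return speedup(level) / (level / BASELINE_LEVEL)


if __name__ == "__main__":
    for level in sorted(timings_min):
        print(
            f"{level:>3} threads: {speedup(level):.2f}x speedup, "
            f"{scaling_efficiency(level):.0%} scaling efficiency"
        )
```

Each step up in concurrency still shortens the run, but by less than the proportional increase in threads.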


Memory usage:

15 threads:
Partition of a set of 530464 objects. Total size = 150187404 bytes.
(~150MB)

50 threads:
Partition of a set of 530606 objects. Total size = 150206646 bytes.
(~150MB)

150 threads:
Partition of a set of 530654 objects. Total size = 150221029 bytes.
(~150MB)
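
To put the three heap totals side by side, the short sketch below computes how much each run adds over the 15-thread measurement. The byte counts are copied from the notes; `heap_bytes` and `extra_bytes` are illustrative names.

```python
# Total heap size reported for each concurrency level, in bytes.
heap_bytes = {15: 150_187_404, 50: 150_206_646, 150: 150_221_029}

BASELINE_LEVEL = 15


def extra_bytes(level: int) -> int:
    """Bytes used beyond the 15-thread measurement."""
    return heap_bytes[level] - heap_bytes[BASELINE_LEVEL]


def extra_percent(level: int) -> float:
    """The same difference as a percentage of the baseline heap size."""
    return 100 * extra_bytes(level) / heap_bytes[BASELINE_LEVEL]


if __name__ == "__main__":
    for level in sorted(heap_bytes):
        print(
            f"{level:>3} threads: {heap_bytes[level] / 1e6:.2f} MB "
            f"(+{extra_bytes(level)} bytes, +{extra_percent(level):.3f}%)"
        )
```

The differences are tens of kilobytes on a heap of roughly 150 MB, so in this measurement memory use is effectively flat across the three settings.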

For now, I propose raising the maximum to 50 (and the default from 8 to 10).
We can raise it further later.
jordan-homan authored Nov 1, 2024
1 parent 99c6385 commit e3f818d
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions src/unstructured_client/_hooks/custom/split_pdf_hook.py
@@ -40,8 +40,8 @@

DEFAULT_STARTING_PAGE_NUMBER = 1
DEFAULT_ALLOW_FAILED = False
-DEFAULT_CONCURRENCY_LEVEL = 8
-MAX_CONCURRENCY_LEVEL = 15
+DEFAULT_CONCURRENCY_LEVEL = 10
+MAX_CONCURRENCY_LEVEL = 50
MIN_PAGES_PER_SPLIT = 2
MAX_PAGES_PER_SPLIT = 20
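
The diff shows only the constants, not how they are consumed. As a minimal sketch of one way such limits could be applied, the code below clamps a requested level to the maximum and uses an `asyncio.Semaphore` to bound in-flight work. The helpers `resolve_concurrency` and `run_bounded` are hypothetical, and the actual hook may treat out-of-range values differently (for example, by warning and falling back to the default).

```python
import asyncio
from typing import Optional

# Values from the diff above.
DEFAULT_CONCURRENCY_LEVEL = 10
MAX_CONCURRENCY_LEVEL = 50


def resolve_concurrency(requested: Optional[int]) -> int:
    """Hypothetical helper: choose an effective concurrency level.

    Assumption: missing or non-positive values fall back to the default,
    and values above the maximum are clamped to it.
    """
    if requested is None or requested < 1:
        return DEFAULT_CONCURRENCY_LEVEL
    return min(requested, MAX_CONCURRENCY_LEVEL)


async def run_bounded(num_tasks: int, limit: int) -> int:
    """Run num_tasks dummy jobs with at most `limit` in flight; return the peak."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # stand-in for one split-PDF request
            in_flight -= 1

    await asyncio.gather(*(worker() for _ in range(num_tasks)))
    return peak


if __name__ == "__main__":
    limit = resolve_concurrency(150)  # clamped to MAX_CONCURRENCY_LEVEL
    peak = asyncio.run(run_bounded(200, limit))
    print(f"limit={limit}, peak in-flight={peak}")
```

Clamping rather than rejecting keeps a caller that asks for 150 working at the capped level, which matches the intent of raising the ceiling gradually.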
