Replies: 5 comments 5 replies
-
You asked several questions, so I will add a comment per question; that way we can discuss each point in its own thread if needed.
This setup is not clear to me. Please share your code.
As mentioned above, I can't understand your setup, but if the input is a set of files that can be processed independently, I see at least two options to maximize throughput locally.
If that is not enough, you can go remote with parallel job instances, one job instance per file, which you can scale across different machines.
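To make the "one job instance per file" idea concrete, here is a minimal sketch. It assumes a `Job` whose step reads a single file identified by an `input.file` job parameter; the class and parameter names are placeholders, and distributing the launches across machines is left to whatever scheduler or orchestrator you use:

```java
// Hypothetical sketch: one job instance per input file, keyed by an
// identifying "input.file" job parameter. The names here are placeholders.
import java.util.List;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class PerFileJobLauncher {

    private final JobLauncher jobLauncher;
    private final Job downloadJob; // a job whose step processes the file given by "input.file"

    public PerFileJobLauncher(JobLauncher jobLauncher, Job downloadJob) {
        this.jobLauncher = jobLauncher;
        this.downloadJob = downloadJob;
    }

    public void launchAll(List<String> fileUrls) throws Exception {
        for (String fileUrl : fileUrls) {
            // The file URL is an identifying parameter, so each file maps to a
            // distinct job instance.
            JobParameters parameters = new JobParametersBuilder()
                    .addString("input.file", fileUrl)
                    .toJobParameters();
            jobLauncher.run(downloadJob, parameters);
        }
    }
}
```

Because the file URL is an identifying parameter, a failed download can be restarted for just that file without touching the others.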
-
How did you come to this conclusion?
-
Threads and executors are expensive to create, so they should be limited. A right-sized thread pool can be shared between multiple steps.
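For illustration, here is a minimal sketch of sharing one pool between two steps, assuming Spring Batch 4.x style builders (the bean names, pool sizes and item types below are placeholders, not a recommendation for your specific workload):

```java
// Hypothetical sketch (Spring Batch 4.x): one right-sized pool shared by two
// multi-threaded steps instead of one executor per step.
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class SharedExecutorConfiguration {

    @Bean
    public ThreadPoolTaskExecutor batchTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(8); // size against the hardware, the I/O profile and the JDBC pool
        executor.setMaxPoolSize(8);
        executor.setThreadNamePrefix("batch-");
        return executor;
    }

    @Bean
    public Step downloadStep(StepBuilderFactory steps, ThreadPoolTaskExecutor batchTaskExecutor,
                             ItemReader<String> reader, ItemWriter<String> writer) {
        return steps.get("downloadStep")
                .<String, String>chunk(100)
                .reader(reader)
                .writer(writer)
                .taskExecutor(batchTaskExecutor) // shared pool
                .build();
    }

    @Bean
    public Step cleanupStep(StepBuilderFactory steps, ThreadPoolTaskExecutor batchTaskExecutor,
                            ItemReader<String> reader, ItemWriter<String> writer) {
        return steps.get("cleanupStep")
                .<String, String>chunk(100)
                .reader(reader)
                .writer(writer)
                .taskExecutor(batchTaskExecutor) // same pool, reused across steps
                .build();
    }
}
```

The important point is that the executor is a singleton bean injected into every step that needs it, rather than each step constructing its own.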
-
Yes! We are planning to redesign the concurrency model to leverage virtual threads. We are going to open a discussion on this topic. Stay tuned!
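In the meantime, for anyone who wants to experiment on a Java 21 runtime, the following is an unofficial, unsupported sketch: it only exposes a virtual-thread executor as a Spring `TaskExecutor`, and makes no claim about how well the rest of the framework behaves on top of it.

```java
// Unofficial experiment, not a supported configuration: wrap a virtual-thread
// executor as a Spring TaskExecutor that could be passed to a step's
// taskExecutor(...) setting. Requires Java 21.
import java.util.concurrent.Executors;
import org.springframework.core.task.TaskExecutor;
import org.springframework.core.task.support.TaskExecutorAdapter;

public class VirtualThreadExecutorFactory {

    public static TaskExecutor virtualThreadTaskExecutor() {
        // Each submitted task gets its own virtual thread; there is no pool to size.
        return new TaskExecutorAdapter(Executors.newVirtualThreadPerTaskExecutor());
    }
}
```

This only swaps the executor; how chunk-level throttling and the JobRepository interactions behave on virtual threads is an open question, which is presumably what the planned redesign will address.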
-
I believe (and hope!) I answered all your questions. Please let me know if you need further details on the matter. Thank you.
-
Hello all,
I have a couple of questions about parallel steps that I have been unable to answer by searching. My current setup: I have written my own HTTP reader that uses the Java 11 HttpClient to download files, and my own file-system writer that uses FileChannels to insert chunks into the file in the proper order (roughly the shape sketched below). Both have been fully tested and work, but what I am currently trying to do is maximize performance, which led me to the following questions.
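To give a rough idea of the shape (this is a simplified sketch with placeholder names, not the exact code; it assumes Spring Batch 4.x item interfaces):

```java
// Simplified sketch of the reader/writer shape described above.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

/** One slice of the remote file: its byte offset plus its payload. */
class FileChunk {
    final long offset;
    final byte[] bytes;

    FileChunk(long offset, byte[] bytes) {
        this.offset = offset;
        this.bytes = bytes;
    }
}

/** Downloads fixed-size byte ranges with the Java 11 HttpClient. */
class HttpRangeReader implements ItemReader<FileChunk> {
    private final HttpClient client = HttpClient.newHttpClient();
    private final URI uri;
    private final long fileSize;
    private final int chunkSize;
    private long position; // next byte to request; a stateful reader like this needs care in a multi-threaded step

    HttpRangeReader(URI uri, long fileSize, int chunkSize) {
        this.uri = uri;
        this.fileSize = fileSize;
        this.chunkSize = chunkSize;
    }

    @Override
    public FileChunk read() throws Exception {
        if (position >= fileSize) {
            return null; // signals end of input to the step
        }
        long start = position;
        long end = Math.min(start + chunkSize, fileSize) - 1;
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Range", "bytes=" + start + "-" + end)
                .build();
        byte[] body = client.send(request, HttpResponse.BodyHandlers.ofByteArray()).body();
        position = end + 1;
        return new FileChunk(start, body);
    }
}

/** Writes each chunk at its original offset, so arrival order does not matter. */
class FileChannelChunkWriter implements ItemWriter<FileChunk> {
    private final Path target;

    FileChannelChunkWriter(Path target) {
        this.target = target;
    }

    @Override
    public void write(List<? extends FileChunk> chunks) throws Exception {
        try (FileChannel channel = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            for (FileChunk chunk : chunks) {
                channel.write(ByteBuffer.wrap(chunk.bytes), chunk.offset);
            }
        }
    }
}
```

The real implementation is more involved, but that is the general shape.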
When using a ThreadPoolTaskExecutor with a step, why does each thread require a connection from the Hikari pool? To me this implies that each thread performs some form of transaction against the JobRepository. I would appreciate an explanation, or a link to the code, that shows why this is necessary for the framework, and whether there is any way for Spring Batch users to customize this behavior.
Should steps share a ThreadPoolTaskExecutor or should each step have its own ThreadPoolTaskExecutor?
My current approach has been to run a large number of steps concurrently (16+), where each step downloads one file and each step has its own thread pool so that every thread downloads a commit-interval's worth of chunks of the file (roughly the wiring sketched below). I really love this mapping because it is extremely simple to implement and my reader and writer are 100% lock free.
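Roughly, the job wiring looks like this (simplified, with placeholder names, assuming Spring Batch 4.x builders):

```java
// Simplified wiring of the mapping described above: one step per file, all
// per-file steps run in parallel via a split flow. Each per-file step is
// itself a multi-threaded chunk step (its own taskExecutor set on the step).
import java.util.List;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

public class ParallelDownloadJobFactory {

    public Job parallelDownloadJob(JobBuilderFactory jobs, List<Step> perFileSteps) {
        // Wrap each single-file step in its own flow.
        Flow[] flows = perFileSteps.stream()
                .map(step -> new FlowBuilder<Flow>("flow-" + step.getName()).start(step).build())
                .toArray(Flow[]::new);

        // Run all per-file flows concurrently.
        Flow splitFlow = new FlowBuilder<Flow>("splitFlow")
                .split(new SimpleAsyncTaskExecutor("file-"))
                .add(flows)
                .build();

        return jobs.get("parallelDownloadJob")
                .start(splitFlow)
                .end()
                .build();
    }
}
```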
With Project Loom coming out in the next LTS, will Spring Batch be able to fully utilize the new kind of threads being released, and what kind of benefits can we hope to expect?
Any advice on how to maximize the gains from parallelism and concurrency would be great. I can share my performance results and source code if needed.