Support listing file sources asynchronously #19256
Closed
The method `list()` from `galaxy.files.sources.BaseFilesSource` lists the directories and files within a file source. An optional keyword argument `recursive` (`False` by default) lets it recursively retrieve directories and files within a specific directory.

This operation is very cheap in terms of CPU but expensive in terms of IO, be it network or filesystem IO. Depending on how the underlying system is built, it may or may not support retrieving directories and files recursively. If it does not, then every time a directory is listed, another request is needed to list each of its subdirectories. This may end up involving hundreds of requests. Done sequentially, this can be extremely slow, especially if each request involves network access.
This PR makes the `list()` method asynchronous, which enables Galaxy to wait for the underlying system to complete the requests concurrently, resulting in a massive speedup. The price to pay is the extra complexity of using the async primitives.

Making `list()` asynchronous implies that every function in the call chain, up to the API endpoints and the test functions, must also become asynchronous. This PR converts those as well.
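To illustrate why awaiting the requests concurrently helps, here is a minimal sketch that compares a sequential recursive listing with a concurrent one. It does not reproduce Galaxy's actual implementation: the in-memory `TREE`, the simulated `LATENCY`, and the functions `list_dir`, `list_sequential`, and `list_concurrent` are all hypothetical names invented for this example.

```python
import asyncio

# Hypothetical directory tree standing in for a remote file source.
TREE = {
    "/": ["/a", "/b", "/c"],
    "/a": ["/a/x", "/a/y"],
    "/b": [],
    "/c": ["/c/z"],
    "/a/x": [],
    "/a/y": [],
    "/c/z": [],
}
LATENCY = 0.05  # assumed cost of one round trip, in seconds


async def list_dir(path):
    """Simulate a single request that lists one directory (non-recursive)."""
    await asyncio.sleep(LATENCY)
    return TREE[path]


async def list_sequential(path):
    """List recursively, waiting for each subdirectory before starting the next."""
    entries = []
    for child in await list_dir(path):
        entries.append(child)
        entries.extend(await list_sequential(child))
    return entries


async def list_concurrent(path):
    """List recursively, awaiting all subdirectories of a level at the same time."""
    children = await list_dir(path)
    nested = await asyncio.gather(*(list_concurrent(c) for c in children))
    entries = []
    for child, sub_entries in zip(children, nested):
        entries.append(child)
        entries.extend(sub_entries)
    return entries


if __name__ == "__main__":
    print(asyncio.run(list_concurrent("/")))
```

Both functions issue the same seven simulated requests and return the same entries in the same order. The sequential version pays the latency seven times, while the concurrent version pays it roughly once per level of the tree, since sibling directories are awaited together via `asyncio.gather`.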
The changes from this PR are meant to address the friction that arises when integrating Galaxy with eLabFTW. Due to how the eLabFTW API is designed, listing the file source with `recursive=True` requires sending one request for each experiment or resource that contains at least one attachment. Given a large enough eLabFTW instance, this easily translates to hundreds of requests, and the whole process takes an eternity when done serially. Making so many requests concurrently is still not ideal, but at least listing ~500 experiments or resources with attachments becomes bearable.

This is the second PR of a series of PRs that integrate eLabFTW with Galaxy via a file source (together they address issue #18665).
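Since firing hundreds of requests at once is described as not ideal, one common way to limit the load is to cap the number of requests in flight with `asyncio.Semaphore`. The sketch below shows that technique under stated assumptions; it is not taken from this PR, and `fetch_attachments`, `list_all`, the file naming, and the cap of 10 are hypothetical placeholders rather than eLabFTW or Galaxy APIs.

```python
import asyncio

MAX_IN_FLIGHT = 10  # assumed cap, chosen only for illustration


async def fetch_attachments(entry_id, stats):
    """Hypothetical stand-in for one request listing an entry's attachments."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # simulated network latency
    stats["in_flight"] -= 1
    return [f"entry-{entry_id}/attachment.txt"]


async def list_all(entry_ids, limit=MAX_IN_FLIGHT):
    """Fetch attachments for every entry, never exceeding `limit` open requests."""
    semaphore = asyncio.Semaphore(limit)
    stats = {"in_flight": 0, "peak": 0}

    async def bounded(entry_id):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await fetch_attachments(entry_id, stats)

    nested = await asyncio.gather(*(bounded(i) for i in entry_ids))
    files = [name for group in nested for name in group]
    return files, stats["peak"]


if __name__ == "__main__":
    files, peak = asyncio.run(list_all(range(500)))
    print(len(files), "files listed; peak concurrent requests:", peak)
```

The trade-off is a somewhat longer total listing time in exchange for a bounded load on the remote server.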
How to test the changes?
License