eLabFTW file source for Galaxy #18665

Open
kysrpex opened this issue Aug 8, 2024 · 2 comments · May be fixed by #19319
@kysrpex
Contributor

kysrpex commented Aug 8, 2024

eLabFTW file source for Galaxy

I am developing an integration of Galaxy with eLabFTW and found a couple of design mismatches between eLabFTW and Galaxy that are forcing me to take non-straightforward design decisions. If I am not careful, my decisions may clash with how Galaxy is intended to work, so I thought it makes sense to open an issue to seek consensus and/or other solutions.

Exporting and importing data to Galaxy

To take data out of Galaxy, there is the option to export a history, either as a direct download link or to a file source. Research data management repositories are included in the latter group.

Exporting Galaxy histories

To import data to Galaxy, there is the upload option. Data from file sources can be accessed using the "Choose remote files" button.

Importing data to Galaxy

Remote files are represented and resolved in Galaxy using a path-like URI. File sources typically define their own URI scheme, for example invenio://zenodo_sandbox/92442/TestProduct.zip. Directory-like objects may be created in the file source using the endpoint /api/remote_files, which accepts JSON of the form {"target": "invenio://zenodo_sandbox/92442", "name": "Testing Publishing"}. File-like objects may be created using /api/histories/{history_id}/write_store, which accepts JSON that includes the target_uri key: {"target_uri": "invenio://zenodo_sandbox/92442/TestProduct.zip", ...}.
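
A minimal sketch of how such a URI decomposes, using only Python's standard library. The two request bodies are copied from the examples above; the helper `split_target_uri` is a hypothetical name introduced for illustration and is not part of Galaxy.

```python
from urllib.parse import urlsplit

# Request body for /api/remote_files (directory-like object), as quoted above.
create_directory = {"target": "invenio://zenodo_sandbox/92442", "name": "Testing Publishing"}

# Request body for /api/histories/{history_id}/write_store; other keys omitted.
write_store = {"target_uri": "invenio://zenodo_sandbox/92442/TestProduct.zip"}


def split_target_uri(target_uri: str) -> tuple[str, str]:
    """Hypothetical helper: split a URI into (parent URI, last path segment)."""
    parent, _, name = target_uri.rstrip("/").rpartition("/")
    return parent, name


# The scheme selects the kind of file source, the rest is a path inside it.
parts = urlsplit(write_store["target_uri"])
parent, file_name = split_target_uri(write_store["target_uri"])
```

Note that the last path segment doubles as the file name here, which is exactly the coupling that becomes a problem further down.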

eLabFTW

eLabFTW revolves around the concepts of experiment and resource. Experiments and resources can contain file attachments. The scope of the integration would be exporting data from and importing data to eLabFTW as file attachments.

eLabFTW can be accessed through a REST API, which is documented here. The sections experiments, items (internal name for resources) and uploads are of special relevance. Each entity (be it an experiment or an item) has an entity id (an integer), and the files attached to an entity, also known as "uploads", have an upload id (also an integer). Entity ids for experiments and items are independent (i.e. an experiment and an item can have the same id). Upload ids are common to experiments and items: an experiment and an item cannot have an attachment with the same id.

eLabFTW's backend assigns new identifiers incrementing the previous identifier of the same type, be it experiment identifiers, item identifiers, or upload identifiers. Experiment, item and upload names are not unique, e.g. two experiments can have the same name.

Integrating Galaxy with eLabFTW

Integrating eLabFTW with Galaxy through a file source involves finding a path-like URI representation for eLabFTW's experiments, items and uploads. A solution that quickly comes to mind is to use paths of the form /entity_type/entity_id/upload_id, where:

  • entity_type is either 'experiments' or 'resources'
  • entity_id is the id (an integer) of an experiment or resource (keep in mind those are independent)
  • upload_id is the id (an integer) of an attachment
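
The path layout above could be parsed and rebuilt roughly as follows. This is only a sketch: the names `ElabftwPath`, `parse_path` and `format_path` are hypothetical, and it assumes ids are written as plain decimal digits.

```python
from dataclasses import dataclass
from typing import Optional

ENTITY_TYPES = ("experiments", "resources")


@dataclass(frozen=True)
class ElabftwPath:
    """A (possibly partial) path: root, entity type, entity, or upload."""
    entity_type: Optional[str] = None
    entity_id: Optional[int] = None
    upload_id: Optional[int] = None


def _parse_id(segment: str) -> int:
    # Assumption: ids appear as ASCII decimal digits only.
    if not (segment.isascii() and segment.isdigit()):
        raise ValueError(f"not an integer id: {segment!r}")
    return int(segment)


def parse_path(path: str) -> ElabftwPath:
    segments = [s for s in path.split("/") if s]
    if len(segments) > 3:
        raise ValueError(f"too many segments: {path!r}")
    if segments and segments[0] not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {segments[0]!r}")
    entity_type = segments[0] if segments else None
    ids = [_parse_id(s) for s in segments[1:]]
    entity_id = ids[0] if len(ids) > 0 else None
    upload_id = ids[1] if len(ids) > 1 else None
    return ElabftwPath(entity_type, entity_id, upload_id)


def format_path(p: ElabftwPath) -> str:
    present = []
    for part in (p.entity_type, p.entity_id, p.upload_id):
        if part is None:
            break  # stop at the first missing level
        present.append(str(part))
    return "/" + "/".join(present)
```

Because entity ids for experiments and resources are independent, the entity type has to be part of the path; the pair (entity_type, entity_id) identifies an entity, while entity_id alone does not.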

Again, keep in mind that experiment, item and upload names are not unique. A solution based on names would not resolve them unambiguously. From the usability point of view, a solution based on ids may however be a problem, because although names and URIs seem to be decoupled when browsing file sources (see screenshot below),

Galaxy client requests made while browsing a file source

they are coupled when files are exported (see histories.export.ts, which gets fileName from user input).

The major issue, though, is that /api/histories/{history_id}/write_store receives a target_uri as input, which means URIs must be known beforehand. But entity ids and upload ids cannot be predicted, because eLabFTW's backend generates them as users create experiments, resources and upload attachments. To make things worse, upload ids are global. This means Galaxy cannot try to guess the next id based on the largest id on the server

  1. because API requests would be scoped to a single user, who can only see the entities they have been granted permission to see,
  2. because it does not scale; when two simultaneous uploads occur, their ids cannot be predicted.
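
Both failure modes can be reproduced with a toy model. The `FakeServer` class below is purely hypothetical and greatly simplified: it hands out global auto-incrementing upload ids and, as a stand-in for eLabFTW's permission system, lets each user see only their own uploads.

```python
class FakeServer:
    """Toy stand-in for a backend with global, auto-incrementing upload ids."""

    def __init__(self) -> None:
        self._next_id = 1
        self.owner_of: dict[int, str] = {}

    def upload(self, user: str) -> int:
        upload_id = self._next_id
        self._next_id += 1
        self.owner_of[upload_id] = user
        return upload_id

    def visible_ids(self, user: str) -> list[int]:
        # Simplification: visibility is modelled as ownership.
        return [i for i, owner in self.owner_of.items() if owner == user]


def guess_next_id(known_ids: list[int]) -> int:
    return max(known_ids, default=0) + 1


server = FakeServer()
server.upload("alice")  # receives id 1
server.upload("bob")    # receives id 2

# Reason 1: alice only sees id 1, so she guesses 2, which bob already holds.
scoped_guess = guess_next_id(server.visible_ids("alice"))
scoped_guess_taken = scoped_guess in server.owner_of

# Reason 2: even with a global view, simultaneous clients guess the same id.
all_ids = list(server.owner_of)
alice_guess = guess_next_id(all_ids)
bob_guess = guess_next_id(all_ids)
alice_actual = server.upload("alice")
bob_actual = server.upload("bob")
```

In this run both clients predict the same id, but the server necessarily assigns two different ones, so at least one target_uri built from a prediction would point at the wrong attachment.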

Action points

I thus see two areas where action is needed:

  1. Fully decoupling path-like URIs from the names displayed to the user and the user's input.
  2. Letting Galaxy create new files on a file source without needing to know the last part of their URI beforehand (or alternatively, breaking some properties of paths, for example that saving a file on path x guarantees that it can be retrieved later using x, but I do not think that's a good approach).
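
One possible shape for the second action point, sketched with an in-memory stand-in. Everything here is an assumption made for illustration: `InMemoryAttachmentStore` and `write_new` are hypothetical names and do not describe Galaxy's file source interface. The idea is that the caller supplies only the parent URI and a display name, and the write operation returns the URI that the backend actually assigned.

```python
from itertools import count


class InMemoryAttachmentStore:
    """Hypothetical store whose backend, not the caller, picks the final id."""

    def __init__(self) -> None:
        self._ids = count(1)  # stands in for backend-generated upload ids
        self._files: dict[str, tuple[str, bytes]] = {}

    def write_new(self, parent_uri: str, display_name: str, content: bytes) -> str:
        # The last URI segment is only known after the backend assigns an id.
        uri = f"{parent_uri.rstrip('/')}/{next(self._ids)}"
        self._files[uri] = (display_name, content)
        return uri

    def read(self, uri: str) -> bytes:
        return self._files[uri][1]

    def display_name(self, uri: str) -> str:
        return self._files[uri][0]


store = InMemoryAttachmentStore()
parent = "elabftw://demo.elabftw.net/experiments/12"
first = store.write_new(parent, "history.zip", b"first export")
second = store.write_new(parent, "history.zip", b"second export")
```

Two files with the same display name end up at distinct URIs, and whatever URI `write_new` returns can be used to read the file back, so the path property mentioned above is preserved while names stay decoupled from URIs.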
@kysrpex
Contributor Author

kysrpex commented Aug 8, 2024

This issue can be assigned to me. Pinging @bernt-matthias, since he was interested in discussing and testing the integration.

@davelopez
Contributor

I need to study the case a bit, but as a first impression, this case clearly will need a new special UI entry here:
image

This UI will have to create the needed entities before the "export", similar to what the RDM file sources are doing. Then, once you have a proper URI that identifies the target entity (something like: elabftw://{elab_url}/entity_type/entity_id/upload_id), perform the upload in the backend. I don't know if that is possible, as I haven't checked the eLabFTW API, but that could be a potential solution.

kysrpex added a commit to kysrpex/galaxyproject-galaxy that referenced this issue Dec 12, 2024
eLabFTW [1] revolves around the concepts of experiment [2] and resource [3]. Experiments and resources can have files attached to them. To get a quick overview, try out the live demo [4]. The scope of this implementation is exporting data from and importing data to eLabFTW as file attachments of already existing experiments and resources. Each user can configure their preferred eLabFTW instance by entering its URL and an API key.

File sources reference files via a URI, while eLabFTW uses auto-incrementing positive integers. For more details read galaxyproject#18665 [5]. This leads to the need to declare a mapping between said identifiers and Galaxy URIs.

Those take the form `elabftw://demo.elabftw.net/entity_type/entity_id/attachment_id`, where:
- `entity_type` is either 'experiments' or 'resources'
- `entity_id` is the id (an integer in string form) of an experiment or resource
- `attachment_id` is the id (an integer in string form) of an attachment

This implementation uses both the `aiohttp` and `requests` libraries to communicate with eLabFTW via its REST API [6]. A significant limitation is that the API has no endpoint that can list attachments for several experiments and/or resources with a single request. When listing the root directory or an entity type _recursively_, a list of entities has to be fetched first; then, to fetch the information on their attachments, a separate request has to be sent _for each one_ of them. The `aiohttp` library makes it bearable to recursively browse instances with up to ~500 experiments or resources with attachments by sending these requests concurrently, but ultimately solving the problem would require changes to the API from the eLabFTW side.
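
The fan-out pattern described above can be sketched with `asyncio` alone. This is not the code from the commit: `fetch_attachments` is a hypothetical stub that stands in for one per-entity HTTP request, and the concurrency limit is an arbitrary assumption.

```python
import asyncio


async def fetch_attachments(entity_id: int) -> list[str]:
    """Stub for one per-entity request; a real version would call the REST API."""
    await asyncio.sleep(0)  # yield control, standing in for network I/O
    return [f"attachment-of-{entity_id}"]


async def list_recursively(entity_ids: list[int], max_concurrency: int = 10) -> dict[int, list[str]]:
    # Bound the number of in-flight requests so the server is not flooded.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(entity_id: int) -> tuple[int, list[str]]:
        async with semaphore:
            return entity_id, await fetch_attachments(entity_id)

    # One request per entity, all scheduled concurrently.
    results = await asyncio.gather(*(bounded(i) for i in entity_ids))
    return dict(results)


listing = asyncio.run(list_recursively([1, 2, 3]))
```

The number of requests still grows linearly with the number of entities; concurrency only hides part of the latency.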

References:
- [1] https://www.elabftw.net/
- [2] https://doc.elabftw.net/user-guide.html#experiments
- [3] https://doc.elabftw.net/user-guide.html#resources
- [4] https://demo.elabftw.net
- [5] galaxyproject#18665
- [6] https://doc.elabftw.net/api/v2
@kysrpex kysrpex linked a pull request Dec 12, 2024 that will close this issue
4 tasks