-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance with large numbers of FileAnnotations #517
Comments
Suggesting that for first pass mitigation @knabar and I look at possibly limiting the number of objects shown. This will require a little bit of refactoring due to the way the queries are performed and client side lexical sorting by |
This has also been reported at #499 via Image.sc |
Thanks, @will-moore. I've done a little bit of homework to hopefully help us make some good decisions here. Multiple parent support was added in ome/openmicroscopy#570 which was a rebase of ome/openmicroscopy#557. The initial implementation for "orphan" annotation lookup was added in ome/openmicroscopy@e4534d3 as part of the refactoring detailed in https://trac.openmicroscopy.org/ome/ticket/3650. The naming is unfortunate as what are looked up are not really orphans but rather annotations that are not linked to the parent object(s). Furthermore, there are various hidden features, such as how objects with an unset namespace are handled, that remain undocumented. It's clear from limited investigation that much of our code relies on these undocumented features. The call stack is loosely:
to
to
As briefly raised by @joshmoore on #499, if we want to handle large numbers of FileAnnotations in a better way, predictable counting and paging are baseline requirements. Achieving this with the current API would be a significant refactor across both In order to tackle performance and safety, there are three main problem areas:
I've been investigating queries to collapse down a bunch of these issues so that we can safely reproduce the sorting and multiple parent features while being able to add counting and paging. After a relational algebra 101 refresher I've come up with the following:
That's for Screen parents We should be able to make a version of the above generic to all the annotation and parent types as well as add counting, order (sorting), as well as offsets and limits (paging). The requirements for the UI are strictly For reference:
|
Compatible example query which performs sorting and extracts the required information:
Edit 1 (2023-12-07): I forgot to add the ownership filter from |
I'm just wondering what the UI will look like after these changes / fixes... If we're going to offer pagination, do we also need a query to get the total number of "orphaned" FileAnnotations, so we know how many pages? Or we could just offer a "Next" button until we get to the end (incomplete page). Again, that would be sub-optimal but OK for occasional use. If users are mainly using this to fix accidentally unlinked FileAnnotations, is there some improvement we could make to the "unlink" workflow to help improve the clarity? Current UI doesn't say much: Final thought - Is there any value is offering the option of choosing genuine orphaned FileAnnotations (not linked to any parent) as well as FileAnnotations not linked to the selected objects? |
My personal goals for initial work are solely to make this dialog safe to open. Assuming we can get a query to mimic the current behaviour, what I have above is about 95% there, the changes would then be:
If more UI rework is desired, it can come after we make this safe IMO.
Agreed wholeheartedly. I expect a great many users have never even considered the fact that you can crosslink file annotations and they completely ignore the bottom part of the current dialog.
Certainly an option.
Agreed. Nor does the current UI, with a display of filename only, give the user enough context to make that decision in a large number of cases. Search or filter would definitely be nicer. The current UI utilizes Django forms and the
That is certainly the case for our users.
It's been over 12 years since we created the dialog and it basically hasn't changed since. I'm quite confident the original intent did not consider programmatic association of hundreds of thousands of file annotations and shared groups of dozens of users (our use case). As for exactly what the "main use-case" is, at least to me it's for linking existing known file annotations with another object not re-linking annotations that were accidentally unlinked. But that's wild speculation, I'm not aware of anyone actually using this feature and if they are we have no real concrete way to assess how or why they are using it. The vast majority of our deployments have a small number of shared read-annotate or read-write groups. In such circumstances this dialog is nearly entirely useless as it becomes quickly polluted with the file annotations of everyone in the group.
Yes, but that's pretty straightforward to do if we have a version of the query above ready.
Of course. Unlink vs. delete is already an unnatural and complicated topic across the entirety of OMERO.
I'm sure there is but I expect that quickly takes us into a discussion not just about linking file annotations but rather file annotation management as a whole which we have essentially no graphical UI for in either OMERO.web or OMERO.insight.
I would say no, they're not but again that is speculation. I don't think any of us have the data to support such an assertion and given what I expect we would all agree on, almost no one uses this feature, that data would be pretty weak anyway. |
Agree with all those points and plan, with 1 exception: I think you could make the cutoff limit greater than 100, and possibly stick with sorting by name too. |
I'm honestly not that opinionated on the maximum we just need to decide on one. The query is going to perform significantly better than it does now because we will not be loading the entire I think showing the most recently added file annotations at the top of the list is far more natural but as I expect we've already established, the use of this feature has got to be low and it's already unusable for so many use cases keeping it lexical is also fine. It is not going to be difficult to change if we decide on something different later on. |
Refactors the stack which produces the list of existing FileAnnotations for selection in the "Choose attachments" dialog.
Refactors the stack which produces the list of existing FileAnnotations for selection in the "Choose attachments" dialog.
Refactors the stack which produces the list of existing FileAnnotations for selection in the "Choose attachments" dialog.
A first pass for discussion during tomorrow's web meeting: |
Thanks Chris. Do you want to open a (draft) PR for that branch, so we can discuss there? I just wrote a little script for generating lots of FileAnnotations if it's useful for testing anywhere. Created a bunch of them on merge-ci for when that PR is open, although I think draft PRs aren't merged by default. |
Sorry - script is at https://github.com/will-moore/python-scripts/blob/main/create_file_annotations.py |
insight uses directly the method from the MetadataService i.e. https://github.com/ome/omero-server/blob/master/src/main/java/ome/logic/MetadataImpl.java#L575 This is the call used for example when expanding the attachment tab |
Insight has a 2 steps process to add attachments
as mentioned earlier, insight does not use use the Java gateway (which has a loadAnnotations) method so that will need to be reviewed since we could have 2 entry points that could create trouble |
Understood. I don't think the server side |
That the method we use to load the annotations owned by a user (attachments tabs for example) |
Understood. We definitely have cases where one user (a service account) would have many, many FileAnnotations. |
@knabar / @will-moore: I expect with #519 having been merged and released in 5.24.0 we can close this issue? |
Also, we should post on https://forum.image.sc/t/unable-to-add-attachment-using-omero-web/84146/18 as well, no? |
Thanks Chris. Posted on image.sc and closing this... |
On a busy long running system, particularly one where image data is being run through one or more analytical pipelines, the number of
FileAnnotations
in a given group can easily number in the 1000s. We have exposure to some systems where it numbers in the 10s of thousands. The current "File Attachment" dialog assumes that it can provide a list of everyFileAnnotation
not currently linked to the selected object(s) for linking; performing quite poorly at 10^3, terribly at 10^4 and causes server errors at 10^5.Not sure if this is also be a problem for OMERO.insight, @jburel + @dominikl.
This SQL query gives a
FileAnnotation
vs. group summary to get a sense of the scale on a particular install:/cc @emilroz, @muhanadz, @erindiel
The text was updated successfully, but these errors were encountered: