Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rest contracts for qaevents, qatopics and qasources endpoints #181

Merged
merged 10 commits into from
Dec 18, 2023

Conversation

LucaGiamminonni
Copy link
Contributor

The features included in this PR are the result of the OpenAIRE Call Innovation funded project "Enrich local data via the OpenAIRE Graph” awarded by 4Science (https://www.openaire.eu/open-call-winner-phase-1-4science). It provides a closer integration between DSpace and two OpenAIRE services, the Notification Broker and the OpenAIRE REST API.
Detailed documentation about the aims of the project, the implementation and the configuration options is available at https://4science.github.io/oaire-eld/#/
This PR regarding only the data correction section. The publication claim will be migrated with other PR.

Copy link
Contributor

@paulo-graca paulo-graca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @LucaGiamminonni for this contribution! I think my main difficulty with this PR is the lack of context. I think it will be useful, not only for me, but to everyone who find this for the first time to add more context. I've some context, I'm familiarized with what this should do from past discussions, but even so, with the lack of context makes me suggest dumm suggestions or ask dumm questions, for instance why it doesn't try to re-use an already existing mechanism at: /api/integration/externalsources I'm sure there is a reason, is there any discussion on this? Could it be referenced here? Is there any wiki page for it?

It's kind of personal preference, but I dislike the use of the prefix "nb" for the endpoints. I would prefer the use of "notification" instead. Becoming something like:

"https://dspace7.4science.cloud/server/api/integration/notificationevents/"

of course, if a collecion of notificationevents makes sense. But as mention, it's a personal preference not a blocking situation. What do you think?

nbevents.md Outdated
@@ -0,0 +1,116 @@
## Main Endpoint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's missing an introductory context for the Notification Broker integration. What are "nb" events?

nbevents.md Outdated
## GET Single suggestion
** GET /api/integration/nbevents/<:nbevent-id>

Return a single suggestion:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to have a more detailed explanation for what it returns. For instance what is a suggestion? What is the context?

nbsources.md Outdated
@@ -0,0 +1,50 @@
## Main Endpoint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's missing an introductory context for the Notification Broker sources. What "nb" stands for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point any references, even for the Notification's model structure?

Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LucaGiamminonni & @abollini : I agree with @paulo-graca that I'm finding this (and the implementation PRs) difficult to review despite having read through the docs at https://4science.github.io/oaire-eld/#/ . Based on those docs, I do understand that this is a very useful feature. But the terminology "notification broker" is very confusing to me (as well is the "nb" prefix used throughout the endpoints, codebase, etc).

These endpoints don't appear to be "notifications" in the same way as Linked Data Notifications (e.g. COAR Notify). Instead, they seem more like the existing "/externalsources" endpoints in that they pull data from an external source API (via a scheduled cron job). However, obviously, these endpoints are more enhanced in that they allow for ways to "enrich" existing objects (similar to ideas in DSpace/DSpace#2834 except that these endpoints enrich an item which is already "in archive"). It seems to overlap with the prior concept of "metadata suggestions" (#83) that came out of that 2834 ticket a while back.

It also appears (by skimming the code) that, while there are some generic interfaces here, it looks like the endpoints are mostly specific to OpenAIRE (at least at this time). Is that accurate? Should this be made even more specific to OpenAIRE by naming it something like /api/integrations/oaire-events (or similar)?

Overall, I like the general ideas here... I'm just struggling to make suggestions because this feature seems similar to ExternalResources and also similar to COAR Notify. Obviously, it'd be nice if we had a general plan for how they do or do not overlap to ensure we aren't duplicating similar concepts/code in different endpoints.

If this is easier to talk through in a meeting, we could find time this week or next to discuss this.

@tdonohue tdonohue changed the title [CST-5249] Rest contracts for nbevents, nbtopics and nvsources endpoints [CST-5249] Rest contracts for nbevents, nbtopics and nbsources endpoints Apr 5, 2022
@abollini
Copy link
Member

abollini commented Apr 7, 2022

hi, as anticipated in the community call we agree that these endpoints are not self-explaining.
Unfortunately, renaming them looks to be more expensive than what we hope but we are willing to do that if required and we can found a general agreement around better names.
To reduce the effort and risk of miss the 7.3 we suggest to proceed in the following way:

  • the angular and REST implementation should be fully reviewed and discussed ignoring the naming issue
  • once the code is ready to go, we will perform the final refactoring to align classes, database, endpoint to the agreed rest endpoint

I will try to explain a bit more the purpose of these endpoints and how they are not overlapping with existing concepts or other initiatives.
These endpoints are completely unrelated with COAR Notify project. The COAR Notify project is about a paradigm to provide interoperability between open system in the scholarly communication space based on Linked Data Notification.

The endpoint was named in this way as they were initially designed around the concepts of the OpenAIRE Notification Broker service but later on, as part of the EOSC supported project ELD Advance they have been generalized so that other provider than OpenAIRE could be plugged in future.

They implements the Data Correction service of the ELD project
https://4science.github.io/oaire-eld/#/data-correction

nbsources are the systems that will inform the repository somehow about improvements to existing data. Out-of-box an openAIRE provider is provided but in future you could have other such sources of suggestion as one base on UnPaywall / CORE that could suggest you to grab open access copies available elsewhere to complete your repository records

nbtopics are the "kind of correction" suggestions that you can get from a specific provider. OpenAIRE is able to suggest you a missing abstract, identifiers , untracked relations among publication and project and more. Other providers could have a completely different list of topics

nbevents are the exact correction suggestions that you have got from the provider. They were named events because this is how they are called in the OpenAIRE Notfication Broker (that is based on a queue/publish/subscribe architecture)

@tdonohue
Copy link
Member

tdonohue commented Apr 7, 2022

@abollini : Thanks for the additional context. However, that brings up a few more questions in my mind:

  • How are existing "externalsources" and "nbsources" different (or similar)? Should we consider a way to eventually combine these concepts, or are they entirely different (i.e. do they solve different use cases somehow)?
  • How are "metadata suggestions" (the concept, as we never have implemented it entirely) and "nbevents" different (or similar)? You mentioned "nbevents" might be more than metadata, but I still wonder if these two concepts should be somehow combined.... maybe all are "update events" (or "enrichments"), some of which could be metadata related and others involve content (or similar). Could "nbevents" be a new version of "metadata suggestions" or again are they meeting entirely different use cases?

My main concern here is around duplication of code / features / use cases. If these are all unique, then we need to better document / understand how they are different, and what use cases are met by each.

SIDENOTE: I noticed in your description you often use the term "correction". It makes me wonder if these are better considered "correctionSources" (nbsources), "correctionTopics" (nbtopics) and "correctionEvents" (nbevents). Not sure that's the proper terminology for each, but it's an early idea.

@abollini
Copy link
Member

abollini commented Apr 7, 2022

Thanks for your feedback @tdonohue and to encourage a broader discussion.
External sources could be related with some correction sources as in the case of openaire Graph that is used by their notification broker service to suggest correction to repositories about the records that they have contributed to the graph or can be queried via API as we actually do in the second service provided by the ELD project (the publication claim, a dedicated PR will come).
It is not hard to think about some scripts that would compare the data in an external source with the data in dspace and would report the differences in form of correction to apply to the dspace items. We actually have plan for that in future and we expect to be able to reuse some of the external source code (i.e. the external source classes will be probably collaborators of the OtherCorrectionSource).
Nevertheless there are also lot of potential correction sources that are not related to external sources such as the previous mentioned example from unpaywall or a ML algorithm that would process the items suggesting to add I keywords or relations using entity extraction or topic classification.
Last but not least these endpoints are intended to deal with archived items, external sources are instead involved in the records creation.

@LucaGiamminonni LucaGiamminonni changed the title [CST-5249] Rest contracts for nbevents, nbtopics and nbsources endpoints Rest contracts for qaevents, qatopics and qasources endpoints Jul 4, 2022
qasources.md Outdated
* 403 Forbidden - if you are not logged in with sufficient permissions, only system administrators can access

## GET Single Source
**GET /api/integration/qasources/<:qasource-id>**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @LucaGiamminonni,

One thing I recall from our meeting was that we would create a generic mechanism for quality assurance.
Does it makes sense to have something like we have in authority or external sources for dealing with multiple sources?
https://github.com/DSpace/RestContract/blob/main/external-authority-sources.md#single-entry

GET /api/integration/qasources/<:source-name>/entryValues/<:entry-id>

in this specific case, the notification broker (nb) source-name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @paulo-graca, currently it is possible to search all topics given a source and all events related to a topic using the endpoints /api/integration/qatopics/search/bySource and /api/integration/qaevents/search/findByTopic. This also reflects the pages that are shown on the angular side: first all the sources, then all the topics of the selected source (or directly the topics of the only source if there is only one) and then all the events of the selected topic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @LucaGiamminonni for your comment. Perhaps we are in different mindset regarding this. I look at this as new generic feature that we will be available in DSpace that handles sources, events, topics for quality assurance, being source agnostic. Apart from the implementation your team already have, I see it to be very similar to external sources and authority and how it should be specified. Perhaps external entities and authority should be differently and could be based also on search/bySource. But for the matter of the scope of this PR, for coherency and consistency, to me, this specification should be very close to what we already have regarding handling different sources.

Copy link
Contributor

@paulo-graca paulo-graca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things I would like to see in this PR:

@paulo-graca
Copy link
Contributor

@tdonohue in today's meeting also agreed with this change.

  • naming, I would prefer to have a qa- instead of a qa prefix, having:

    • /api/integration/qa-events
    • /api/integration/qa-sources
    • /api/integration/qa-topics

@LucaGiamminonni
Copy link
Contributor Author

Hi @paulo-graca, according to endpoint naming conventions, the - character should not be used https://github.com/DSpace/RestContract#on-the-naming-of-endpoints
Infact for example we have bitstreamformats, externalsources, itemrequests endpoints and not bistream-formats, external-sources and item-requests. So I think that the correct formats are qaevents, qasources and qatopics.

@tdonohue
Copy link
Member

tdonohue commented Aug 2, 2022

@LucaGiamminonni : Thanks for pointing that out... I had completely forgotten that policy we set a long time ago. In that case, I'm ok with keeping these as qaevents, qatopics, etc.

@tdonohue
Copy link
Member

tdonohue commented Sep 22, 2022

Per today's meeting, we decided that:

  1. The endpoints should be renamed qualityassuranceevents, qualityassurancesources, qualityassurancetopics
  2. I will create a PR to better document our official naming conventions, so that we can avoid having the same naming discussion in the future. This PR will enhance the current naming conventions here (but will not drastically change them) https://github.com/DSpace/RestContract/blob/main/README.md#on-the-naming-of-endpoints DONE, see Clarify/enhance endpoint naming rules #204

@LucaGiamminonni
Copy link
Contributor Author

@tdonohue @paulo-graca the PR is ready to be review. thanks

Copy link
Contributor

@paulo-graca paulo-graca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @LucaGiamminonni for your patience :) !

Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @LucaGiamminonni , @abollini and @frabacche ! This REST Contract looks correct overall. But, there are a few minor things to cleanup based on the latest code. All of these are very minor and are just correcting the contract to align with the current implementation.

Once these minor changes are made, I'm +1 this Contract

qualityassuranceevents.md Show resolved Hide resolved
qualityassuranceevents.md Outdated Show resolved Hide resolved
qualityassuranceevents.md Outdated Show resolved Hide resolved
qualityassurancetopics.md Outdated Show resolved Hide resolved
qualityassurancetopics.md Outdated Show resolved Hide resolved
Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks @frabacche and @LucaGiamminonni ! This contract now looks good to me. I'll merge it once everything else is ready to go.

@tdonohue tdonohue added integration: OpenAIRE Related to integration with OpenAIRE integration: QA Notifications Related to QA (Quality Assurance) notifications from external sources labels Dec 18, 2023
@tdonohue tdonohue added this to the 8.0 milestone Dec 18, 2023
@tdonohue tdonohue merged commit 6a86b9e into DSpace:main Dec 18, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
high priority integration: OpenAIRE Related to integration with OpenAIRE integration: QA Notifications Related to QA (Quality Assurance) notifications from external sources
Projects
No open projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

5 participants