Connect harvester to NDE Dataset Register #97

Open
ddeboer opened this issue Mar 14, 2022 · 7 comments
Labels: FAIR Datasets

Comments

Contributor

ddeboer commented Mar 14, 2022

The NDE Register will be used for (at the very least) B&G (#96) and KB.

Please find an example query here. Replace the publisher URI in

BIND (<http://data.bibliotheken.nl/id/thes/p075301482> as ?publisher)

with the publisher you want to retrieve datasets for. For a list of publishers, see this query.
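For illustration, a minimal Python sketch of that substitution. The query template below is a hypothetical stand-in (the linked example query is not reproduced in this thread), `dct:publisher` is assumed as the publisher property, and `query_for_publisher` is an illustrative name, not part of the Harvester or the Dataset Register.

```python
import re

# Hypothetical template: a stand-in for the linked example query,
# assuming datasets are typed dcat:Dataset and linked via dct:publisher.
QUERY_TEMPLATE = """\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset WHERE {
  BIND (<%(publisher)s> AS ?publisher)
  ?dataset a dcat:Dataset ;
           dct:publisher ?publisher .
}
"""

# Characters that may not appear inside a SPARQL IRI reference (<...>).
_FORBIDDEN_IN_IRI = re.compile(r'[<>"{}|^`\\\s]')


def query_for_publisher(publisher_uri: str) -> str:
    """Return the query with the BIND clause pointing at publisher_uri."""
    if not publisher_uri or _FORBIDDEN_IN_IRI.search(publisher_uri):
        # Refuse input that could break out of the <...> IRI syntax.
        raise ValueError(f"not usable as a SPARQL IRI: {publisher_uri!r}")
    return QUERY_TEMPLATE % {"publisher": publisher_uri}


if __name__ == "__main__":
    print(query_for_publisher("http://data.bibliotheken.nl/id/thes/p075301482"))
```

Validating the URI before substitution keeps a malformed or hostile value from changing the shape of the query.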

Semantics of the query arguments are described at https://github.com/netwerk-digitaal-erfgoed/dataset-register#dcatdataset and based on the Requirements for Datasets.

You can also have a look at the NDE Dataset Register website for examples.

Contributor Author

ddeboer commented Mar 31, 2022

During today’s tech day, we discussed the idea of having a preparatory SPARQL query that returns a list of provider URIs to use in the regular query. On the side of the NDE Dataset Register, we could add a predicate to datasets that should be included in the CLARIAH Registry. To keep things standardised, NDE would then provide the CLARIAH Harvester with a SPARQL query that selects on that predicate.

This is similar to the <registry url=""> that the Harvester already supports for a URL that provides a list of OAI-PMH endpoints. Perhaps something like <registry query="SELECT ?uri WHERE { ?uri a dcat:Dataset ; <custom:predicate> includeInClariah . }">.
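The two-step idea could be sketched roughly as follows in Python. This is only an illustration under assumptions: `run_query` stands for whatever function the Harvester would use to send a SPARQL query and get back results in the standard SPARQL JSON results format, and `harvest`, `uris_from_results` and `query_for` are hypothetical names.

```python
from typing import Callable, Dict, List


def uris_from_results(results: dict, var: str = "uri") -> List[str]:
    """Collect the distinct URIs bound to `var` in SPARQL JSON results."""
    uris: List[str] = []
    for binding in results["results"]["bindings"]:
        term = binding.get(var)
        # Skip unbound variables, literals and blank nodes.
        if term and term["type"] == "uri" and term["value"] not in uris:
            uris.append(term["value"])
    return uris


def harvest(
    run_query: Callable[[str], dict],
    registry_query: str,
    query_for: Callable[[str], str],
) -> Dict[str, dict]:
    """Run the preparatory query, then the regular query once per URI."""
    selected = uris_from_results(run_query(registry_query))
    return {uri: run_query(query_for(uri)) for uri in selected}
```

Passing `run_query` in as a parameter keeps the sketch independent of any particular SPARQL client and makes it easy to try out with canned results.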

Contributor Author

ddeboer commented May 19, 2022

@menzowindhouwer As discussed, I’ve now changed the example query to a CONSTRUCT, allowing you to get its results as a single RDF graph per dataset rather than (duplicated) SELECT result bindings.
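To illustrate the duplication: when a dataset has several values for more than one property, a SELECT returns one row per combination, whereas a CONSTRUCT returns a set of triples in which each statement appears once. A small Python sketch with made-up data (all URIs, keywords and the `regroup` helper are hypothetical):

```python
from itertools import product

dataset = "https://example.org/dataset/1"
keywords = ["maps", "amsterdam"]
distributions = [
    "https://example.org/dataset/1/csv",
    "https://example.org/dataset/1/ttl",
    "https://example.org/dataset/1/json",
]

# SELECT ?dataset ?keyword ?distribution: one row per combination of values.
select_rows = [
    {"dataset": dataset, "keyword": k, "distribution": d}
    for k, d in product(keywords, distributions)
]

# CONSTRUCT: a set of triples, so every statement occurs exactly once.
construct_triples = {(dataset, "dcat:keyword", k) for k in keywords} | {
    (dataset, "dcat:distribution", d) for d in distributions
}


def regroup(rows):
    """Undo the SELECT duplication by collecting values per dataset."""
    grouped = {}
    for row in rows:
        entry = grouped.setdefault(
            row["dataset"], {"keyword": set(), "distribution": set()}
        )
        entry["keyword"].add(row["keyword"])
        entry["distribution"].add(row["distribution"])
    return grouped


print(len(select_rows))        # 6 rows (2 keywords x 3 distributions)
print(len(construct_triples))  # 5 triples (2 + 3)
```

With SELECT, the consumer needs something like `regroup` to get back to one description per dataset; with CONSTRUCT that step is not needed.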

Contributor Author

ddeboer commented Nov 30, 2023

@menzowindhouwer @vicding-mi Can you elaborate on how you select datasets from the NDE Dataset Register for inclusion in the CLARIAH one? If I remember correctly, you do so on the level of the dataset’s publisher. If so, we want to add more publishers to that list, including https://uba.uva.nl, as requested by @LvanWissen.

Member

LvanWissen commented Nov 30, 2023

On the other hand, I also see that not all datasets published by https://uba.uva.nl/ are relevant for CLARIAH. Some of the datasets in there are created in research projects, such as ECARTICO, OnStage and Cinema Context, and are relevant. The main collection can, for instance, stay only in the NDE register.

A more advanced filter would look not only at the publisher, but possibly also at the creator/contributor (and their ORCID or ROR identifiers).
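As a rough sketch of what such a filter could look like on the harvesting side, in Python. Everything here is an assumption for illustration: the `Dataset` and `Selection` shapes, the function name `is_selected`, and the idea that creators and contributors are available as a set of ORCID or ROR URIs.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Dataset:
    uri: str
    publisher: str
    # ORCID/ROR URIs of creators and contributors (hypothetical shape).
    agents: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Selection:
    # Publishers whose datasets are all included.
    publishers: FrozenSet[str] = frozenset()
    # Creators/contributors whose datasets are included, whoever publishes them.
    agents: FrozenSet[str] = frozenset()


def is_selected(dataset: Dataset, selection: Selection) -> bool:
    """Include a dataset if its publisher or any of its agents is listed."""
    if dataset.publisher in selection.publishers:
        return True
    return bool(dataset.agents & selection.agents)
```

With a selection that lists a research group's identifier but not https://uba.uva.nl as a publisher, project datasets by that group would be included while the main collection would stay out.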

Contributor Author

ddeboer commented Nov 30, 2023

@LvanWissen In that case, please see if netwerk-digitaal-erfgoed/dataset-register#483 would solve your use case.

@LvanWissen
Member

Yes, but that's an 'opt-in' on my side, as it requires adding an extra attribute to the dataset description. I'd rather see an 'opt-out'. In my opinion, filtering should be done on the harvesting party's side.


vicding-mi commented Nov 30, 2023 via email

4 participants