Json harvester #5942
A simple harvester that takes a URL, for now expecting a JSON document, loops over the elements identified by a JSON Pointer, and applies an XSL transformation to convert each one to ISO format. This should allow GeoNetwork to harvest open data portals, most of which provide a search API returning JSON responses.
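To illustrate the "loop over the elements identified by a JSON Pointer" step, here is a minimal sketch of RFC 6901 pointer resolution. It is not the PR's implementation: it assumes the document has already been parsed into plain `Map`/`List` values so that it runs with the standard library only, and the class name, method name and sample data are hypothetical.

```java
import java.util.List;
import java.util.Map;

public class JsonPointerSketch {

    /**
     * Resolves an RFC 6901 JSON Pointer against a tree made of Map (objects),
     * List (arrays) and scalar values. Returns null when a segment is missing.
     */
    public static Object resolve(Object root, String pointer) {
        if (pointer.isEmpty()) {
            return root; // the empty pointer designates the whole document
        }
        if (!pointer.startsWith("/")) {
            throw new IllegalArgumentException("A JSON Pointer must start with '/': " + pointer);
        }
        Object current = root;
        for (String raw : pointer.substring(1).split("/", -1)) {
            // Unescape in the order required by RFC 6901: "~1" first, then "~0".
            String token = raw.replace("~1", "/").replace("~0", "~");
            if (current instanceof Map) {
                current = ((Map<?, ?>) current).get(token);
            } else if (current instanceof List) {
                List<?> list = (List<?>) current;
                int index;
                try {
                    index = Integer.parseInt(token);
                } catch (NumberFormatException e) {
                    return null; // array members can only be addressed by index
                }
                if (index < 0 || index >= list.size()) {
                    return null;
                }
                current = list.get(index);
            } else {
                return null; // scalar or missing value: nothing left to descend into
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical response shaped like a search API result.
        Map<String, Object> document = Map.of(
            "datasets", List.of(
                Map.of("datasetid", "air-quality"),
                Map.of("datasetid", "bus-stops")));

        // "/datasets" plays the role of loopElement, "datasetid" of recordIdPath.
        List<?> records = (List<?>) resolve(document, "/datasets");
        for (Object record : records) {
            System.out.println(((Map<?, ?>) record).get("datasetid"));
        }
    }
}
```

In a real harvester each resolved element would then be handed to the XSL conversion instead of being printed.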
nodes.forEach(record -> {
    Element xml = convertRecordToXml(record);
    uuids.put(record.get(params.recordIdPath).asText(), xml);
@fxprunayre, @josegar74 if the identifier is a URI (as in ESRI DCAT, e.g. "identifier": "https://data-atmo-hdf.opendata.arcgis.com/datasets/bac17d7d05a34242a8b22c535ecdb13d"), the URI is set as the uuid, and opening the metadata page then fails.
As a hack for my test I used:
uuids.put(record.get(params.recordIdPath).asText().split("/datasets/")[1], xml);
but I don't know how to handle this properly in every situation. Would you have any suggestions?
One option is a regexp in the harvester settings to extract the uuid, but that seems a bit tricky for admins.
Thanks
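For discussion, the regexp idea could look roughly like the sketch below. It assumes a hypothetical harvester setting holding a pattern with one capture group; the class and method names are illustrative and not part of this PR. Unlike `split("/datasets/")[1]`, which throws when the separator is absent, it falls back to the raw identifier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecordIdExtractor {

    /**
     * Returns the first capture group of idPattern when the pattern is found
     * in rawId; otherwise returns rawId unchanged.
     */
    public static String extractId(String rawId, String idPattern) {
        if (idPattern == null || idPattern.isEmpty()) {
            return rawId; // no pattern configured: keep the identifier as-is
        }
        Matcher matcher = Pattern.compile(idPattern).matcher(rawId);
        if (matcher.find() && matcher.groupCount() >= 1 && matcher.group(1) != null) {
            return matcher.group(1);
        }
        return rawId; // pattern did not match: keep the identifier as-is
    }

    public static void main(String[] args) {
        // Hypothetical value an admin might enter in the harvester settings.
        String pattern = "/datasets/([^/?#]+)";

        System.out.println(extractId(
            "https://data-atmo-hdf.opendata.arcgis.com/datasets/bac17d7d05a34242a8b22c535ecdb13d",
            pattern));
        System.out.println(extractId("plain-dataset-id", pattern));
    }
}
```

The fallback means a misconfigured pattern degrades to the current behaviour rather than failing the harvest, though it does nothing to make writing the pattern easier for admins.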
@fxprunayre did PR #5736, which probably helps but requires enabling some specific configuration.
If the identifiers have the format http(s)://URL/UUID, one option is, when converting the JSON to ISO19139, to set gmd:fileIdentifier to the UUID part of the JSON identifier element and store the full identifier in the gmd:identifier element. That should not require any hack in the UI code, but I'm not sure it's really "correct" from the metadata content point of view.
Thanks @josegar74, yes, I think that's the way to go.
I would keep the uuid for the uuid and keep the URI for the resourceIdentifier.
But it means that while harvesting, I have to know where the uuid is in the URI to extract it.
For example, https://data-atmo-hdf.opendata.arcgis.com/datasets/bac17d7d05a34242a8b22c535ecdb13d yields bac17d7d05a34242a8b22c535ecdb13d
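One way to keep both values is sketched below, assuming the uuid is always the last path segment of the URI. That holds for the example above but is an assumption in general, and the class and field names are hypothetical rather than taken from the PR.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class IdentifierSplitter {

    /** The short id used as record uuid, plus the full URI when there is one. */
    public static final class Identifiers {
        public final String uuid;
        public final String resourceIdentifier; // null when no URI was split

        Identifiers(String uuid, String resourceIdentifier) {
            this.uuid = uuid;
            this.resourceIdentifier = resourceIdentifier;
        }
    }

    /**
     * Splits an absolute URI into its last path segment (uuid) and the full
     * URI (resource identifier). Anything else is returned as the uuid only.
     */
    public static Identifiers split(String rawId) {
        try {
            URI uri = new URI(rawId);
            String path = uri.getPath();
            if (uri.getScheme() != null && uri.getHost() != null && path != null) {
                String trimmed = path.replaceAll("/+$", ""); // ignore trailing slashes
                String last = trimmed.substring(trimmed.lastIndexOf('/') + 1);
                if (!last.isEmpty()) {
                    return new Identifiers(last, rawId);
                }
            }
        } catch (URISyntaxException e) {
            // Not a valid URI: fall through and use the raw value as the uuid.
        }
        return new Identifiers(rawId, null);
    }

    public static void main(String[] args) {
        Identifiers ids = split(
            "https://data-atmo-hdf.opendata.arcgis.com/datasets/bac17d7d05a34242a8b22c535ecdb13d");
        System.out.println(ids.uuid);
        System.out.println(ids.resourceIdentifier);
    }
}
```

Identifiers that are not absolute URIs, such as an Opendatasoft `datasetid`, pass through unchanged, so the same code path could serve both kinds of source.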
4fc5872 to 332ff96
This commit is a squash of geonetwork/core-geonetwork#5942

A simple harvester which takes a URL expecting for now a JSON document and loop over document identified by a JSONPointer and applying an XSL to convert to ISO format. This should allow GeoNetwork to harvest some of the opendata portal providing all various search API providing JSON response usually.

Harvester / Simple URL / Paging and basic opendatasoft support.
Json harvester: fix merge conflicts
jsonHarvester: handle JSONLD format with @ in tag names
jsonHarvester: add ESRI JSONLD DCAT transformation
hack: to remove, extract uuid from URIs
This commit is a squash of geonetwork/core-geonetwork#5942

A simple harvester which takes a URL expecting for now a JSON document and loop over document identified by a JSONPointer and applying an XSL to convert to ISO format. This should allow GeoNetwork to harvest some of the opendata portal providing all various search API providing JSON response usually.

Harvester / Simple URL / Paging and basic opendatasoft support.
Json harvester: fix merge conflicts
jsonHarvester: handle JSONLD format with @ in tag names
jsonHarvester: add ESRI JSONLD DCAT transformation
hack: to remove, extract uuid from URIs
jsonHarvester: extract uuid from identifier
https://data-atmo-hdf.opendata.arcgis.com/datasets/bac17d7d05a34242a8b22c535ecdb13d will extract bac17d7d05a34242a8b22c535ecdb13d
used by ODS to compute exports links
@fxprunayre does this follow what you initiated? Do you approve?
Not really an expert on harvesters but I think this makes sense as a first iteration. Ideally this harvester should not be used directly as it is quite low level, and the user should be able to choose between ESRI DCAT, OpenDataSoft, CKAN etc.
Thanks @jahow
<xsl:strip-space elements="*"/>

<xsl:template match="/record">
  <xsl:variable name="cataloglang" select="'fr'"></xsl:variable>
I'm not sure a hard-coded French language value is representative of the variety of ESRI users; not everyone using this will want to create metadata records in French. You can check how other harvesters handle a source that provides no language information (e.g. OGCWxS).
Continues the work started by @fxprunayre in #4034, aligned with the latest main branch.

The goal is to be able to harvest the native API endpoints of open data catalogs (CKAN, Opendatasoft, ESRI). The harvester loops over the JSON datasets and maps each object to a metadata record using a dedicated XSL transformation.

Example of configuration:
OPENDATASOFT

| Parameter | Value |
| --- | --- |
| URL | https://metropole-europeenne-de-lille.opendatasoft.com/api/datasets/1.0/search |
| loopElement | /datasets |
| recordIdPath | datasetid |
| toISOConversion | OPENDATASOFT-to-DCAT2 |

ESRI

| Parameter | Value |
| --- | --- |
| URL | https://data-atmo-hdf.opendata.arcgis.com/data.json |
| loopElement | /dataset |
| numberOfRecordPath | /result/count |
| recordIdPath | identifier |
| pageFromParam | start |
| pageSizeParam | rows |
| toISOConversion | ESRIDCAT-to-DCAT2 |
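
The paging parameters in the configuration above (pageFromParam, pageSizeParam, numberOfRecordPath) suggest requesting one page at a time until the reported record count is covered. A minimal sketch of how the page URLs might be built, assuming the "from" parameter is a zero-based record offset and that the total has already been read from the first response. Those semantics, the class and method names, and the example.org endpoint are assumptions, not taken from this PR.

```java
import java.util.ArrayList;
import java.util.List;

public class PagedUrlSketch {

    /**
     * Builds one URL per page by appending the offset and page-size query
     * parameters to baseUrl. Offsets are assumed to be zero-based.
     */
    public static List<String> pageUrls(String baseUrl, String pageFromParam,
                                        String pageSizeParam, int pageSize, int totalRecords) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        // Reuse an existing query string if the base URL already has one.
        String separator = baseUrl.contains("?") ? "&" : "?";
        List<String> urls = new ArrayList<>();
        for (int from = 0; from < totalRecords; from += pageSize) {
            urls.add(baseUrl + separator
                + pageFromParam + "=" + from
                + "&" + pageSizeParam + "=" + pageSize);
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; "start" and "rows" mirror the parameter names above.
        for (String url : pageUrls("https://example.org/api/search", "start", "rows", 10, 25)) {
            System.out.println(url);
        }
    }
}
```

With 25 records and a page size of 10 this yields three URLs, at offsets 0, 10 and 20.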