How should datasets be denoted as high-value datasets in their metadata? #3

Open
hogredan opened this issue Mar 27, 2024 · 7 comments

Comments

@hogredan

According to the HVD Regulation, public sector bodies holding high-value datasets listed in the Annex shall ensure that the datasets are denoted as high-value datasets in their metadata description (Art. 3(5)).

In Germany, it has been decided that for all datasets in the national spatial data infrastructure that fall under the HVD Regulation, the category must be indicated in the ISO metadata as a keyword in combination with a source reference. This is to enable the central process of transforming ISO metadata into DCAT-AP metadata (permanent delivery towards the national Open Data Portal) and fulfil the requirements of DCAT-AP-HVD.

There are currently two options to declare the category in the metadata, either in free text (gco:CharacterString) or as a reference (gmx:Anchor). Both options can be processed by the above-mentioned transformation.

Example of category declaration in free text (gco:CharacterString)

<gmd:descriptiveKeywords>
<gmd:MD_Keywords>
<gmd:keyword>
<gco:CharacterString>Georaum</gco:CharacterString>
</gmd:keyword>
...
<gmd:thesaurusName>
<gmd:CI_Citation>
<gmd:title>
<gco:CharacterString>High-value dataset categories</gco:CharacterString>
</gmd:title>
<gmd:date>
<gmd:CI_Date>
<gmd:date>
<gco:Date>2023-09-27</gco:Date>
</gmd:date>
<gmd:dateType>
<gmd:CI_DateTypeCode codeList="https://standards.iso.org/iso/19139/resources/gmxCodelists.xml#CI_DateTypeCode" codeListValue="publication"/>
</gmd:dateType>
</gmd:CI_Date>
</gmd:date>
...
</gmd:CI_Citation>
</gmd:thesaurusName>
</gmd:MD_Keywords>
</gmd:descriptiveKeywords>

Example of the category declaration as a reference (gmx:Anchor)

<gmd:descriptiveKeywords>
<gmd:MD_Keywords>
<gmd:keyword>
<gmx:Anchor xlink:href="http://data.europa.eu/bna/c_ac64a52d">Georaum</gmx:Anchor>
</gmd:keyword>
...
<gmd:thesaurusName>
<gmd:CI_Citation>
<gmd:title>
<gmx:Anchor xlink:href="http://data.europa.eu/bna/asd487ae75">High-value dataset categories</gmx:Anchor>
</gmd:title>
<gmd:date>
<gmd:CI_Date>
<gmd:date>
<gco:Date>2023-09-27</gco:Date>
</gmd:date>
<gmd:dateType>
<gmd:CI_DateTypeCode codeList="https://standards.iso.org/iso/19139/resources/gmxCodelists.xml#CI_DateTypeCode" codeListValue="publication"/>
</gmd:dateType>
</gmd:CI_Date>
</gmd:date>
...
</gmd:CI_Citation>
</gmd:thesaurusName>
</gmd:MD_Keywords>
</gmd:descriptiveKeywords>
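
For orientation, the result of the transformation towards DCAT-AP-HVD might look roughly like the RDF/XML below. This is only a sketch under assumptions: the dataset URI and title are placeholders, and the property names (`dcatap:hvdCategory` and `dcatap:applicableLegislation` in the `http://data.europa.eu/r5r/` namespace) reflect DCAT-AP-HVD as far as known here and should be checked against the current release of the profile.

```xml
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dcatap="http://data.europa.eu/r5r/">
  <!-- Placeholder dataset URI and title, not taken from any real catalogue -->
  <dcat:Dataset rdf:about="https://example.org/dataset/example-1">
    <dct:title xml:lang="de">Beispieldatensatz</dct:title>
    <!-- Assumed: legal basis expressed as the ELI of Regulation (EU) 2023/138 -->
    <dcatap:applicableLegislation rdf:resource="http://data.europa.eu/eli/reg_impl/2023/138/oj"/>
    <!-- Category URI copied from the gmx:Anchor keyword in the example above -->
    <dcatap:hvdCategory rdf:resource="http://data.europa.eu/bna/c_ac64a52d"/>
  </dcat:Dataset>
</rdf:RDF>
```

With the gmx:Anchor variant the category URI can be copied directly from `xlink:href`. With the free-text variant the transformation has to map the keyword text to the URI itself.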

How do the other Member States implement the denotation of high-value datasets in their ISO metadata?

@laers

laers commented May 13, 2024

I have made a similar proposal for tagging metadata for datasets that are in scope of HVD, but with one additional tag: a tag indicating that the dataset is in scope of HVD, using an anchor to the legislation. Besides this reference, another tag specifies which HVD category the dataset belongs to.

Tag saying that the data set is in scope of HVD (gmx:Anchor xlink:href="https://eur-lex.europa.eu/eli/reg_impl/2023/138/oj"):

<gmd:descriptiveKeywords>
<gmd:MD_Keywords>
<gmd:keyword>
<gmx:Anchor xlink:href="https://eur-lex.europa.eu/eli/dir/2007/2/2019-06-26">INSPIRE</gmx:Anchor>
</gmd:keyword>
<gmd:keyword>
<gmx:Anchor xlink:href="https://eur-lex.europa.eu/eli/reg_impl/2023/138/oj">Høj-værdi datasæt</gmx:Anchor>
</gmd:keyword>
...
<gmd:thesaurusName>
<gmd:CI_Citation>
<gmd:title>
<gmx:Anchor xlink:href="https://registry.geonetwork-opensource.org/theme/eu">EU legislation</gmx:Anchor>
</gmd:title>
...
</gmd:CI_Citation>
</gmd:thesaurusName>
</gmd:MD_Keywords>
</gmd:descriptiveKeywords>

Another tag specifying that the dataset belongs to the Earth observation and environment (da: Jordobservation og miljø) category (gmx:Anchor xlink:href="http://data.europa.eu/bna/c_dd313021"):

<gmd:descriptiveKeywords>
<gmd:MD_Keywords>
<gmd:keyword>
<gmx:Anchor xlink:href="http://data.europa.eu/bna/c_dd313021">Jordobservation og miljø</gmx:Anchor>
</gmd:keyword>
...
<gmd:thesaurusName>
<gmd:CI_Citation>
<gmd:title>
<gmx:Anchor xlink:href="http://data.europa.eu/bna/asd487ae75">High-value dataset categories</gmx:Anchor>
</gmd:title>
...
</gmd:CI_Citation>
</gmd:thesaurusName>
</gmd:MD_Keywords>
</gmd:descriptiveKeywords>

Br Lars

@hallinpihlatie

We're also planning to use Anchors to refer to the HVD categories.

I like Lars's approach for the legislation, but I'd prefer to use a European code list. Could you share yours as a start, Lars?

@laers

laers commented May 14, 2024

Sure. I generated it in GeoNetwork and you are hopefully able to import it.

<rdf:RDF>
	<rdf:Description rdf:about="https://eur-lex.europa.eu/eli/reg_impl/2023/138/oj">
		<skos:prefLabel xml:lang="en">High-value datasets</skos:prefLabel>
		<skos:scopeNote xml:lang="en">COMMISSION IMPLEMENTING REGULATION (EU) 2023/138 of 21 December 2022 laying down a list of specific high-value datasets and the arrangements for their publication and re-use</skos:scopeNote>
		<skos:prefLabel xml:lang="da">Høj-værdi datasæt</skos:prefLabel>
		<skos:scopeNote xml:lang="da">KOMMISSIONENS GENNEMFØRELSESFORORDNING (EU) 2023/138 af 21. december 2022 om en liste over særlige typer datasæt af høj værdi og ordningerne for deres offentliggørelse og videreanvendelse</skos:scopeNote>
		<skos:inScheme rdf:resource="https://registry.geonetwork-opensource.org/theme/eu"/>
	</rdf:Description>
	<rdf:Description rdf:about="https://eur-lex.europa.eu/eli/dir/2007/2/2019-06-26">
		<rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
		<skos:prefLabel xml:lang="en">INSPIRE</skos:prefLabel>
		<skos:scopeNote xml:lang="en">DIRECTIVE 2007/2/EC OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE)</skos:scopeNote>
		<skos:prefLabel xml:lang="da">INSPIRE</skos:prefLabel>
		<skos:scopeNote xml:lang="da">EUROPA-PARLAMENTETS OG RÅDETS DIREKTIV 2007/2/EF af 14. marts 2007 om opbygning af en infrastruktur for geografisk information i Det Europæiske Fællesskab (Inspire)</skos:scopeNote>
		<skos:inScheme rdf:resource="https://registry.geonetwork-opensource.org/theme/eu"/>
	</rdf:Description>
	<rdf:Description rdf:about="https://eur-lex.europa.eu/eli/reg_impl/2023/138/oj">
		<rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
	</rdf:Description>
	<rdf:Description rdf:about="https://registry.geonetwork-opensource.org/theme/eu">
		<rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#ConceptScheme"/>
		<dc:title xml:lang="en">EU legislation</dc:title>
		<dc:title xml:lang="da">EU legislation</dc:title>
		<dc:description xml:lang="en">Code list of EU Directives that a dataset is in scope of.</dc:description>
		<dc:description xml:lang="da">Kodeliste med EU direktiver som et dataprodukt er omfattet af.</dc:description>
		<dc:identifier>eu.rdf</dc:identifier>
		<dc:type>theme</dc:type>
	</rdf:Description>
</rdf:RDF>

@hallinpihlatie

Many thanks. I added Finnish and Swedish translations and was able to import it into GN after adding a few lines, mainly namespaces. Here's the RDF file as a ZIP file for possible re-use.
Legislation_EU.zip
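
For readers without the ZIP file, the namespace declarations in question would presumably sit on the root element, along the lines of the sketch below. This is an assumption based on the prefixes used in the listing above (rdf, skos, dc) and their standard namespace URIs. The actual attached file may declare more or different namespaces.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed declarations: standard URIs for the rdf, skos and dc prefixes -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- The rdf:Description elements from the listing above go here unchanged -->
</rdf:RDF>
```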

@hallinpihlatie

Back to Germany's question. I'm in favour of the Anchor version and it works fine for metadata in a single language. In multilingual metadata I get a mix of Anchor and CharacterString as shown below. Any hints on how to get rid of this mix in GeoNetwork 3.12?

<gmd:MD_Keywords>
<gmd:keyword xsi:type="gmd:PT_FreeText_PropertyType">
<gmx:Anchor xlink:href="http://data.europa.eu/bna/c_ac64a52d">Paikkatiedot</gmx:Anchor>
<gmd:PT_FreeText>
<gmd:textGroup>
<gmd:LocalisedCharacterString locale="#FI">Paikkatiedot</gmd:LocalisedCharacterString>
</gmd:textGroup>
<gmd:textGroup>
<gmd:LocalisedCharacterString locale="#SV">Geospatiala data</gmd:LocalisedCharacterString>
</gmd:textGroup>
<gmd:textGroup>
<gmd:LocalisedCharacterString locale="#EN">Geospatial</gmd:LocalisedCharacterString>
</gmd:textGroup>
</gmd:PT_FreeText>
</gmd:keyword>
<gmd:type>
<gmd:MD_KeywordTypeCode codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#MD_KeywordTypeCode" codeListValue="theme"/>
</gmd:type>
<gmd:thesaurusName>
<gmd:CI_Citation>
<gmd:title xsi:type="gmd:PT_FreeText_PropertyType">
<gmx:Anchor xlink:href="http://data.europa.eu/bna/asd487ae75">High-value dataset categories</gmx:Anchor>
<gmd:PT_FreeText>
<gmd:textGroup>
<gmd:LocalisedCharacterString locale="#FI">High-value dataset categories</gmd:LocalisedCharacterString>
</gmd:textGroup>
</gmd:PT_FreeText>
</gmd:title>
<gmd:date>
<gmd:CI_Date>
<gmd:date>
<gco:Date>2023-09-27</gco:Date>
</gmd:date>
<gmd:dateType>
<gmd:CI_DateTypeCode codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#CI_DateTypeCode" codeListValue="publication"/>
</gmd:dateType>
</gmd:CI_Date>
</gmd:date>

@hallinpihlatie

I just noticed that the High-value dataset categories code list has been updated with the sub-categories in English. What is the recommendation? Should metadata be marked with 1) only the sub-category, 2) only the main category, or 3) both, or 4) is there no recommendation?

@oberseri

Is there any obligation to indicate the category of HVD datasets? I cannot find it in the legal text.
We are thinking of labelling them as HVD only, preferably by referencing a label "HighValueDataset" in our national registry.
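
Such an HVD-only label could be encoded along these lines. This is a sketch only: the registry URI is a made-up placeholder, not an existing national registry entry, and the usual gmd, gmx and xlink namespace declarations are assumed to be present on the enclosing record.

```xml
<gmd:descriptiveKeywords>
  <gmd:MD_Keywords>
    <gmd:keyword>
      <!-- Hypothetical registry URI, used for illustration only -->
      <gmx:Anchor xlink:href="https://registry.example.org/label/HighValueDataset">HighValueDataset</gmx:Anchor>
    </gmd:keyword>
  </gmd:MD_Keywords>
</gmd:descriptiveKeywords>
```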


4 participants