Support UUID with URL special characters. #5736

fxprunayre · 2021-06-09T15:12:42Z

e.g. info:doi:10.24396/ORDAR-56 or http://dada.moo/ORDAR-56

To support UUIDs containing characters such as / or ;, you need
to disable the default Spring HTTP Firewall behavior, which considers those characters unsafe.
Otherwise the error looks like: URL contained a potentially malicious String "%2F"

The client side URL-encodes UUIDs, and Spring will not
decode the path before matching the URL (decoding first would cause issues with request mapping).
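
To illustrate why decoding the path before matching would break request mapping, here is a minimal Python sketch (not GeoNetwork or Spring code). The route shape /srv/api/records/{uuid} and the helper path_segments are assumptions made for the example only. It shows that the encoded UUID stays in a single path segment, while the decoded path gains an extra segment.

```python
from typing import List
from urllib.parse import quote, unquote


def path_segments(path: str) -> List[str]:
    """Split a URL path into its non-empty segments (hypothetical helper)."""
    return [segment for segment in path.split("/") if segment]


uuid = "info:doi:10.24396/ORDAR-56"

# safe="" makes quote() encode "/" as well, so the UUID stays one segment.
encoded_uuid = quote(uuid, safe="")

# Assumed route shape, for illustration only: /srv/api/records/{uuid}
encoded_path = f"/srv/api/records/{encoded_uuid}"
decoded_path = unquote(encoded_path)

print(encoded_uuid)
print(path_segments(encoded_path))  # UUID is the last, single segment
print(path_segments(decoded_path))  # UUID is split across two segments
```

With the path left encoded, a route with one {uuid} placeholder can still match; once decoded, the / inside the UUID looks like an additional path level.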

Use -Dgeonetwork.security.coreconfig=encodeduuid to enable the security configuration for the StrictHttpFirewall and the filterChainProxy (see config-security-core-encodeduuid.xml).

If encodeduuid is enabled, Tomcat also requires -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true
and, if using an Apache reverse proxy, the following configuration:

  AllowEncodedSlashes On
  ProxyPass /geonetwork http://localhost:8080/geonetwork nocanon
  ProxyPassReverse /geonetwork http://localhost:8080/geonetwork

By default, this is not active and has to be enabled if needed.

This also fixes UUIDs containing ".", which some of the API operations did not match.

On the Elasticsearch side, documents can also be accessed using the URL-encoded UUID, e.g. http://localhost:9200/gn-records/_doc/https%3A%2F%2Fdoi.org%2F10.13155%2F77514
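
As a small sketch of how such a document URL can be built, the following Python snippet percent-encodes the UUID before appending it to the _doc path. The function name es_doc_url is hypothetical; the host, index name and UUID are the ones from the example above.

```python
from urllib.parse import quote


def es_doc_url(base_url: str, index: str, doc_id: str) -> str:
    """Build an Elasticsearch _doc URL with a fully percent-encoded id (hypothetical helper)."""
    # safe="" ensures ":" and "/" inside the id are encoded too.
    return f"{base_url.rstrip('/')}/{index}/_doc/{quote(doc_id, safe='')}"


url = es_doc_url("http://localhost:9200", "gn-records", "https://doi.org/10.13155/77514")
print(url)
```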

Related to #3501

josegar74

I tested importing a metadata record with that type of UUID, displaying the metadata detail page, editing it and adding online resources, and exporting to XML and PDF; all of these look fine.

But switching to the advanced view in the metadata detail page doesn't seem to work:

The record with identifier was not found or is not shared with you. Try to sign in if you've an account.

fxprunayre · 2021-06-11T05:50:00Z

But switching to the advanced view in the metadata detail page doesn't seem to work:

Fixed @josegar74

josegar74 · 2021-06-11T09:45:33Z

Doesn't really work for me, the option doesn't fail now, but the page is displaying the default view, not the full view.

In the metadata detail page, I see these 2 failing requests:

I noticed also that searching for info:doi:10.24396/ORDAR-56 shows a popup with this error message: Query returned an error. Check the console for details., the search request returns this error:

{
"servlet":"spring",
"message":"Error is: Bad Request.\nRequest:\n{&quot;from&quot;:0,&quot;size&quot;:30,&quot;sort&quot;:[&quot;_score&quot;],&quot;query&quot;:{&quot;function_score&quot;:{&quot;boost&quot;:&quot;5&quot;,&quot;functions&quot;:[{&quot;filter&quot;:{&quot;exists&quot;:{&quot;field&quot;:&quot;parentUuid&quot;}},&quot;weight&quot;:0.3},{&quot;filter&quot;:{&quot;match&quot;:{&quot;cl_status.key&quot;:&quot;obsolete&quot;}},&quot;weight&quot;:0.3},{&quot;gauss&quot;:{&quot;dateStamp&quot;:{&quot;scale&quot;:&quot;365d&quot;,&quot;offset&quot;:&quot;90d&quot;,&quot;decay&quot;:0.5}}}],&quot;score_mode&quot;:&quot;multiply&quot;,&quot;query&quot;:{&quot;bool&quot;:{&quot;must&quot;:[{&quot;query_string&quot;:{&quot;query&quot;:&quot;(any:(info\\\\:doi\\\\:10.24396/ORDAR\\\\-56) resourceTitleObject.default:(info\\\\:doi\\\\:10.24396/ORDAR\\\\-56)^2)&quot;}},{&quot;terms&quot;:{&quot;isTemplate&quot;:[&quot;n&quot;]}}],&quot;filter&quot;:{&quot;query_string&quot;:{&quot;query&quot;:&quot;* AND (draft:n OR 
draft:e)&quot;}}}}}},&quot;aggregations&quot;:{&quot;cl_hierarchyLevel.key&quot;:{&quot;terms&quot;:{&quot;field&quot;:&quot;cl_hierarchyLevel.key&quot;},&quot;aggs&quot;:{&quot;format&quot;:{&quot;terms&quot;:{&quot;field&quot;:&quot;format&quot;}}}},&quot;cl_spatialRepresentationType.key&quot;:{&quot;terms&quot;:{&quot;field&quot;:&quot;cl_spatialRepresentationType.key&quot;,&quot;size&quot;:10}},&quot;availableInServices&quot;:{&quot;filters&quot;:{&quot;filters&quot;:{&quot;availableInViewService&quot;:{&quot;query_string&quot;:{&quot;query&quot;:&quot;+linkProtocol:/OGC:WMS.*/&quot;}},&quot;availableInDownloadService&quot;:{&quot;query_string&quot;:{&quot;query&quot;:&quot;+linkProtocol:/OGC:WFS.*/&quot;}}}}},&quot;th_gemet_tree.default&quot;:{&quot;terms&quot;:{&quot;field&quot;:&quot;th_gemet_tree.default&quot;,&quot;size&quot;:100,&quot;order&quot;:{&quot;_key&quot;:&quot;asc&quot;},&quot;include&quot;:&quot;[^^]+^?[^^]+&quot;}},&quot;th_httpinspireeceuropaeumetadatacodelistPriorityDataset-PriorityDataset_tree.default&quot;:{&quot;terms&quot;:{&quot;field&quot;:&quot;th_httpinspireeceuropaeumetadatacodelistPriorityDataset-PriorityDataset_tree.default&quot;,&quot;size&quot;:100,&quot;order&quot;:{&quot;_key&quot;:&quot;asc&quot;}}},&quot;tag.default&quot;:{&quot;terms&quot;:{&quot;field&quot;:&quot;tag.default&quot;,&quot;include&quot;:&quot;.*&quot;,&quot;size&quot;:10},&quot;meta&quot;:{&quot;caseInsensitiveInclude&quot;:true}},&quot;th_regions_tree.default&quot;:{&quot;terms&quot;:{&quot;field&quot;:&quot;th_regions_tree.default&quot;,&quot;size&quot;:100,&quot;order&quot;:{&quot;_key&quot;:&quot;asc&quot;}}},&quot;resolutionScaleDenominator&quot;:{&quot;histogram&quot;:{&quot;field&quot;:&quot;resolutionScaleDenominator&quot;,&quot;interval&quot;:10000,&quot;keyed&quot;:true,&quot;min_doc_count&quot;:1},&quot;meta&quot;:{&quot;collapsed&quot;:true}},&quot;creationYearForResource&quot;:{&quot;histogram&quot;:{&quot;field&quot;:&quot;creationYearForResource
&quot;,&quot;interval&quot;:5,&quot;keyed&quot;:true,&quot;min_doc_count&quot;:1},&quot;meta&quot;:{&quot;collapsed&quot;:true}},&quot;OrgForResource&quot;:{&quot;terms&quot;:{&quot;field&quot;:&quot;OrgForResource&quot;,&quot;include&quot;:&quot;.*&quot;,&quot;size&quot;:15},&quot;meta&quot;:{&quot;caseInsensitiveInclude&quot;:true}},&quot;cl_maintenanceAndUpdateFrequency.key&quot;:{&quot;terms&quot;:{&quot;field&quot;:&quot;cl_maintenanceAndUpdateFrequency.key&quot;,&quot;size&quot;:10},&quot;meta&quot;:{&quot;collapsed&quot;:true}}},&quot;_source&quot;:{&quot;includes&quot;:[&quot;uuid&quot;,&quot;id&quot;,&quot;creat*&quot;,&quot;group*&quot;,&quot;logo&quot;,&quot;category&quot;,&quot;topic*&quot;,&quot;inspire*&quot;,&quot;resource*&quot;,&quot;draft&quot;,&quot;overview.*&quot;,&quot;owner*&quot;,&quot;link*&quot;,&quot;image*&quot;,&quot;status*&quot;,&quot;rating&quot;,&quot;tag*&quot;,&quot;geom&quot;,&quot;contact*&quot;,&quot;*Org*&quot;,&quot;hasBoundingPolygon&quot;,&quot;isTemplate&quot;,&quot;valid&quot;,&quot;isHarvested&quot;,&quot;dateStamp&quot;,&quot;documentStandard&quot;,&quot;cl_status*&quot;,&quot;mdStatus*&quot;,&quot;recordLink&quot;,&quot;op*&quot;]},&quot;track_total_hits&quot;:true}\n.\nError:\n{&quot;error&quot;:{&quot;root_cause&quot;:[{&quot;type&quot;:&quot;query_shard_exception&quot;,&quot;reason&quot;:&quot;failed to create query: end-of-string expected at position 9&quot;,&quot;index_uuid&quot;:&quot;TGw-bPQSSZSh1uxDF-0Pow&quot;,&quot;index&quot;:&quot;gn-records&quot;}],&quot;type&quot;:&quot;search_phase_execution_exception&quot;,&quot;reason&quot;:&quot;all shards failed&quot;,&quot;phase&quot;:&quot;query&quot;,&quot;grouped&quot;:true,&quot;failed_shards&quot;:[{&quot;shard&quot;:0,&quot;index&quot;:&quot;gn-records&quot;,&quot;node&quot;:&quot;UrW-0pr0TQqbzuTymcFpXg&quot;,&quot;reason&quot;:{&quot;type&quot;:&quot;query_shard_exception&quot;,&quot;reason&quot;:&quot;failed to create query: end-of-string expected at 
position 9&quot;,&quot;index_uuid&quot;:&quot;TGw-bPQSSZSh1uxDF-0Pow&quot;,&quot;index&quot;:&quot;gn-records&quot;,&quot;caused_by&quot;:{&quot;type&quot;:&quot;illegal_argument_exception&quot;,&quot;reason&quot;:&quot;end-of-string expected at position 9&quot;}}}]},&quot;status&quot;:400}.",
"url":"/geonetwork/srv/api/search/records/_search",
"status":"400"
}
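
In the request above, the : and - characters of the search text are escaped but the / is not. In Elasticsearch query_string syntax a / starts a regular expression, so one plausible (unconfirmed) reading of "end-of-string expected at position 9" is that the text after the / was parsed as a regex. A minimal Python sketch of escaping all reserved query_string characters follows; the function name escape_query_string is hypothetical and this is not the GeoNetwork client code.

```python
import re

# Reserved characters in Elasticsearch query_string syntax:
# + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /
# "<" and ">" cannot be escaped at all, so they are removed instead.
_RESERVED = re.compile(r'(&&|\|\||[+\-=!(){}\[\]^"~*?:\\/])')


def escape_query_string(text: str) -> str:
    """Backslash-escape reserved query_string characters (hypothetical helper)."""
    text = text.replace("<", "").replace(">", "")
    return _RESERVED.sub(r"\\\1", text)


escaped = escape_query_string("info:doi:10.24396/ORDAR-56")
print(escaped)
```

When the escaped text is embedded in a JSON request body, each backslash has to be escaped once more by the JSON serializer.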

fxprunayre · 2021-06-11T10:49:46Z

Doesn't really work for me, the option doesn't fail now, but the page is displaying the default view, not the full view.

I had not pushed the last changes :/ Fixed.

I noticed also that searching for info:doi:10.24396/ORDAR-56 shows a popup with this error message: Query returned an error. Check the console for details., the search request returns this error:

Did you forget to uncomment <property name="firewall" ref="httpFirewall"/>?

josegar74 · 2021-07-02T14:13:54Z

@fxprunayre, the full view works fine now, but the search doesn't. I have uncommented <property name="firewall" ref="httpFirewall"/>.

Code changes I have in config-security-core.xml to check:

fxprunayre · 2023-01-13T14:51:28Z

For supporting encoded / in UUID, on Tomcat it will also require
-Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true

and if using a reverse proxy

        AllowEncodedSlashes On
        ProxyPass /geonetwork http://localhost:8080/geonetwork nocanon
        ProxyPassReverse /geonetwork http://localhost:8080/geonetwork

e.g. info:doi:10.24396/ORDAR-56 or http://dada.moo/ORDAR-56

To support UUIDs containing characters such as / or ;, you need to disable the default Spring HTTP Firewall behaviour, which considers those characters unsafe. The error would look like "URL contained a potentially malicious String "%2F"".

To do this, uncomment the firewall configuration in config-security-core.xml and adjust the StrictHttpFirewall configuration. Also uncomment the firewall property of filterChainProxy.

The client side already URL-encodes UUIDs and, with this, Spring will not decode the path before matching the URL (which would cause issues with request mapping).

By default, this is not active.


The simpleurl harvester can already point to a JSON or XML feed. It can also point to an RDF DCAT feed, which will be loaded using Jena. SPARQL queries are applied to extract the necessary information from the RDF graph.

This work was initially done by the GIM team for Metadata Vlaanderen in a dedicated DCAT-AP harvester (see https://github.com/metadata101/dcat-ap1.1/tree/master/src/main/java/org/fao/geonet/kernel/harvest/harvester/dcatap), but we considered the simpleurl harvester a good candidate for simplification and for providing DCAT feed support directly.

The results can be converted using an XSL conversion. A conversion to ISO19115-3 is provided and custom plugins may provide other conversions. The provided ISO19115-3 conversion supports only Dataset and covers most of the mapping done in OGC API record (see https://github.com/geonetwork/geonetwork-microservices/blob/main/modules/library/common-index-model/src/main/java/org/fao/geonet/index/converter/DcatConverter.java#L188)

Tested with
* http://mow-dataroom.s3-eu-west-1.amazonaws.com/dr_dcat.rdf
* https://apps.titellus.net/geonetwork/api/collections/main/items?q=AlpenKonvention&f=dcat
* https://apps.titellus.net/geonetwork/api/collections/main/items/7bb33d95-7950-499a-9bd8-6f31d58b0b35?f=dcat

Other actions:
- [ ] Add the possibility to hash or not the URI used for the UUID (depends on #5736)
- [ ] UI / Based on the type of harvesting, hide unneeded options, e.g. for a DCAT feed, only the URL is really necessary
- [ ] Paging support for RDF feeds?
- [ ] Conversion / We could move them to the schema so they do not have to be copied into the webapp/xsl/conversion folder. They would be grouped by schema, which could also make the choice easier for end users

Co-authored-by: Mathieu Chaussier <[email protected]>
Co-authored-by: Gustaaf Van de Boel <[email protected]>
Co-authored-by: Stijn Goedertier <[email protected]>


fgravin · 2024-02-09T10:41:02Z

Excellent, thanks @josegar74 for pointing this out!

Could you please tell me the status of this PR?

Is it used in production somewhere?
Is some development still missing?
How much effort would it take to get it merged into main?

Thanks for the work @fxprunayre!

josegar74 · 2024-02-09T13:44:58Z

@fgravin, I guess @fxprunayre can answer that better, but apart from resolving the conflicts, I think it needs more testing.

CLAassistant · 2024-12-08T03:49:55Z

All committers have signed the CLA.

fxprunayre added this to the 4.0.5 milestone Jun 9, 2021

fxprunayre requested a review from josegar74 June 9, 2021 15:12

josegar74 reviewed Jun 10, 2021

fxprunayre modified the milestones: 4.0.5, 4.0.6 Jun 18, 2021

josegar74 mentioned this pull request Sep 7, 2021

Json harvester #5942

Merged

jahow modified the milestones: 4.0.6, 4.0.7 Feb 2, 2022

fxprunayre marked this pull request as draft May 16, 2022 09:40

fxprunayre removed this from the 4.2.0 milestone May 17, 2022

fxprunayre added this to the 4.2.4 milestone Jan 13, 2023

fxprunayre added 4 commits January 17, 2023 09:39

Support UUID with URL special characters / Fix regex to retrieve UUID…

c5c949d

… when using advanced view.

Support UUID with URL special characters / Fix MEF folder names and m…

c68a45c

…issing titles and abtracts in index files.

Support UUID with URL special characters / Adjust URLs with encoded U…

0e9d2a3

…UID.

fxprunayre force-pushed the 405-uuid-url-encoded branch from a936d44 to 0e9d2a3 Compare January 17, 2023 13:25

Support UUID with URL special characters / Env variables to enable it.

db0b794

fxprunayre marked this pull request as ready for review January 17, 2023 14:08

fxprunayre mentioned this pull request Jan 20, 2023

Harvester / URL / Add RDF DCAT harvester. #6771

Merged

fxprunayre modified the milestones: 4.2.4, 4.2.3 Feb 7, 2023

fxprunayre modified the milestones: 4.2.3, 4.2.4 Feb 28, 2023

fxprunayre mentioned this pull request Mar 6, 2023

Harvester / DCAT / Improve RDF doc detection. #6831

Merged

fxprunayre modified the milestones: 4.2.4, 4.2.5 Apr 26, 2023

fxprunayre modified the milestones: 4.2.5, 4.2.6 Jul 5, 2023

fxprunayre modified the milestones: 4.2.6, 4.4.1 Oct 4, 2023

fxprunayre modified the milestones: 4.4.1, 4.4.2 Nov 22, 2023

fxprunayre modified the milestones: 4.4.2, 4.4.3 Jan 23, 2024

josegar74 mentioned this pull request Feb 9, 2024

Metadata identifier templates with slashes "/" #7733

Open

fgravin mentioned this pull request Feb 14, 2024

Support metadata identifiers with URL special characters #7757

Open

10 tasks

fxprunayre modified the milestones: 4.4.3, 4.4.4 Mar 13, 2024

fxprunayre modified the milestones: 4.4.4, 4.4.5 Apr 15, 2024

fxprunayre modified the milestones: 4.4.5, 4.4.6 Jun 4, 2024

fxprunayre modified the milestones: 4.4.6, 4.4.7 Oct 15, 2024
