Support UUID with URL special characters. #5736

Open
wants to merge 5 commits into
base: main

fxprunayre
Member

@fxprunayre fxprunayre commented Jun 9, 2021

Example UUIDs: info:doi:10.24396/ORDAR-56 or http://dada.moo/ORDAR-56

To support UUIDs containing characters such as / or ;, you need
to disable the default Spring HTTP firewall behavior, which treats those characters as unsafe.
Otherwise the error looks like: URL contained a potentially malicious String "%2F"
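
To make the rejection described above concrete, here is a minimal sketch of what a strict firewall check does conceptually. It is a simplified stand-in written for illustration, not Spring's StrictHttpFirewall code; the class name `ToyStrictFirewall`, its two flags, and the example request path are assumptions.

```java
import java.util.List;

/** Simplified stand-in for a strict HTTP firewall check; NOT Spring's implementation. */
class ToyStrictFirewall {
    // Hypothetical flags, named after the idea of allowing encoded slashes and semicolons.
    private final boolean allowEncodedSlash;
    private final boolean allowSemicolon;

    ToyStrictFirewall(boolean allowEncodedSlash, boolean allowSemicolon) {
        this.allowEncodedSlash = allowEncodedSlash;
        this.allowSemicolon = allowSemicolon;
    }

    /** Throws if the raw (still percent-encoded) request URI contains a forbidden token. */
    void check(String rawRequestUri) {
        if (!allowEncodedSlash) {
            rejectIfContains(rawRequestUri, List.of("%2F", "%2f"));
        }
        if (!allowSemicolon) {
            rejectIfContains(rawRequestUri, List.of(";", "%3B", "%3b"));
        }
    }

    private static void rejectIfContains(String uri, List<String> tokens) {
        for (String token : tokens) {
            if (uri.contains(token)) {
                throw new IllegalArgumentException(
                    "URL contained a potentially malicious String \"" + token + "\"");
            }
        }
    }
}
```

With both flags set to false, a request path ending in info%3Adoi%3A10.24396%2FORDAR-56 is rejected because of the %2F; with both flags set to true the same path passes the check.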

The client side URL-encodes UUIDs, and Spring will not
decode the path before matching the URL (decoding first would cause issues with request mapping).
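
The request-mapping issue mentioned above can be sketched with a toy matcher. This is an illustration under stated assumptions, not Spring's path-matching code: the class `PathMatchSketch` and the template /records/{uuid}/formatters/xml are hypothetical.

```java
/** Toy path-template matcher; an illustration only, not Spring's matching logic. */
class PathMatchSketch {
    /**
     * Returns true when every literal segment of the template equals the
     * corresponding path segment; a {name} segment accepts exactly one segment.
     */
    static boolean matches(String template, String path) {
        String[] t = template.split("/");
        String[] p = path.split("/");
        if (t.length != p.length) {
            return false; // a decoded "/" inside the UUID adds segments
        }
        for (int i = 0; i < t.length; i++) {
            boolean isVariable = t[i].startsWith("{") && t[i].endsWith("}");
            if (!isVariable && !t[i].equals(p[i])) {
                return false;
            }
        }
        return true;
    }
}
```

Against the template /records/{uuid}/formatters/xml, the still-encoded path /records/info%3Adoi%3A10.24396%2FORDAR-56/formatters/xml matches, while its decoded form /records/info:doi:10.24396/ORDAR-56/formatters/xml has one segment too many and does not.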

Use -Dgeonetwork.security.coreconfig=encodeduuid to enable the security configuration for the StrictHttpFirewall and the filterChainProxy (see config-security-core-encodeduuid.xml).

If encodeduuid is enabled, Tomcat also requires -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true
and, if using an Apache reverse proxy:

  AllowEncodedSlashes On
  ProxyPass /geonetwork http://localhost:8080/geonetwork nocanon
  ProxyPassReverse /geonetwork http://localhost:8080/geonetwork

By default, this is not active and has to be enabled if needed.

This also fixes UUIDs containing "." not being matched by some of the API operations.

On the Elasticsearch side, documents can also be accessed using the URL-encoded UUID, e.g. http://localhost:9200/gn-records/_doc/https%3A%2F%2Fdoi.org%2F10.13155%2F77514
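
As a sketch of how such a document URL can be built, the snippet below percent-encodes a UUID as a single path segment using only the Java standard library (Java 10 or later for the `Charset` overload). The class and method names are illustrative and are not taken from GeoNetwork.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative helper; names are assumptions, not GeoNetwork code. */
class EncodedUuidUrl {
    /** Percent-encodes a UUID so it can be used as a single URL path segment. */
    static String encodePathSegment(String uuid) {
        // URLEncoder targets form encoding, where a space becomes "+";
        // in a path segment a space has to be "%20" instead.
        return URLEncoder.encode(uuid, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Builds the Elasticsearch document URL for a record UUID. */
    static String elasticsearchDocUrl(String baseUrl, String index, String uuid) {
        return baseUrl + "/" + index + "/_doc/" + encodePathSegment(uuid);
    }
}
```

Calling `elasticsearchDocUrl("http://localhost:9200", "gn-records", "https://doi.org/10.13155/77514")` yields the URL shown above.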

Related to #3501

@fxprunayre fxprunayre added this to the 4.0.5 milestone Jun 9, 2021
@fxprunayre fxprunayre requested a review from josegar74 June 9, 2021 15:12
Member

@josegar74 josegar74 left a comment

Tested importing a metadata record with that type of UUID, displaying the metadata detail page, editing it and adding online resources, and exporting to XML and PDF; all of these look fine.

But switching to the advanced view in the metadata detail page doesn't seem to work:

The record with identifier was not found or is not shared with you. Try to sign in if you've an account.

@fxprunayre
Member Author

But switching to the advanced view in the metadata detail page doesn't seem to work:

Fixed @josegar74

@josegar74
Member

Doesn't really work for me, the option doesn't fail now, but the page is displaying the default view, not the full view.

In the metadata detail page, I see these 2 failing requests:

I noticed also that searching for info:doi:10.24396/ORDAR-56 shows a popup with this error message: Query returned an error. Check the console for details., the search request returns this error:

{
"servlet":"spring",
"message":"Error is: Bad Request.\nRequest:\n{"from":0,"size":30,"sort":["_score"],"query":{"function_score":{"boost":"5","functions":[{"filter":{"exists":{"field":"parentUuid"}},"weight":0.3},{"filter":{"match":{"cl_status.key":"obsolete"}},"weight":0.3},{"gauss":{"dateStamp":{"scale":"365d","offset":"90d","decay":0.5}}}],"score_mode":"multiply","query":{"bool":{"must":[{"query_string":{"query":"(any:(info\\\\:doi\\\\:10.24396/ORDAR\\\\-56) resourceTitleObject.default:(info\\\\:doi\\\\:10.24396/ORDAR\\\\-56)^2)"}},{"terms":{"isTemplate":["n"]}}],"filter":{"query_string":{"query":"* AND (draft:n OR draft:e)"}}}}}},"aggregations":{"cl_hierarchyLevel.key":{"terms":{"field":"cl_hierarchyLevel.key"},"aggs":{"format":{"terms":{"field":"format"}}}},"cl_spatialRepresentationType.key":{"terms":{"field":"cl_spatialRepresentationType.key","size":10}},"availableInServices":{"filters":{"filters":{"availableInViewService":{"query_string":{"query":"+linkProtocol:/OGC:WMS.*/"}},"availableInDownloadService":{"query_string":{"query":"+linkProtocol:/OGC:WFS.*/"}}}}},"th_gemet_tree.default":{"terms":{"field":"th_gemet_tree.default","size":100,"order":{"_key":"asc"},"include":"[^^]+^?[^^]+"}},"th_httpinspireeceuropaeumetadatacodelistPriorityDataset-PriorityDataset_tree.default":{"terms":{"field":"th_httpinspireeceuropaeumetadatacodelistPriorityDataset-PriorityDataset_tree.default","size":100,"order":{"_key":"asc"}}},"tag.default":{"terms":{"field":"tag.default","include":".*","size":10},"meta":{"caseInsensitiveInclude":true}},"th_regions_tree.default":{"terms":{"field":"th_regions_tree.default","size":100,"order":{"_key":"asc"}}},"resolutionScaleDenominator":{"histogram":{"field":"resolutionScaleDenominator","interval":10000,"keyed":true,"min_doc_count":1},"meta":{"collapsed":true}},"creationYearForResource":{"histogram":{"field":"creationYearForResource","interval":5,"keyed":true,"min_doc_count":1},"meta":{"collapsed":true}},"OrgForResource":{"terms":{"field":"OrgForResource","include
":".*","size":15},"meta":{"caseInsensitiveInclude":true}},"cl_maintenanceAndUpdateFrequency.key":{"terms":{"field":"cl_maintenanceAndUpdateFrequency.key","size":10},"meta":{"collapsed":true}}},"_source":{"includes":["uuid","id","creat*","group*","logo","category","topic*","inspire*","resource*","draft","overview.*","owner*","link*","image*","status*","rating","tag*","geom","contact*","*Org*","hasBoundingPolygon","isTemplate","valid","isHarvested","dateStamp","documentStandard","cl_status*","mdStatus*","recordLink","op*"]},"track_total_hits":true}\n.\nError:\n{"error":{"root_cause":[{"type":"query_shard_exception","reason":"failed to create query: end-of-string expected at position 9","index_uuid":"TGw-bPQSSZSh1uxDF-0Pow","index":"gn-records"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"gn-records","node":"UrW-0pr0TQqbzuTymcFpXg","reason":{"type":"query_shard_exception","reason":"failed to create query: end-of-string expected at position 9","index_uuid":"TGw-bPQSSZSh1uxDF-0Pow","index":"gn-records","caused_by":{"type":"illegal_argument_exception","reason":"end-of-string expected at position 9"}}}]},"status":400}.",
"url":"/geonetwork/srv/api/search/records/_search",
"status":"400"
}
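
One possible reading of the error above, not confirmed in this thread: the query string escapes ":" and "-" but leaves "/" unescaped, and in Elasticsearch query_string syntax "/" delimits a regular expression, which would be consistent with a parse failure at position 9 of the text following the first "/". The sketch below escapes the reserved characters, including "/". The class name is hypothetical and this is not the escaping code used by GeoNetwork.

```java
/** Illustrative escaper for Elasticsearch query_string terms; not GeoNetwork code. */
class QueryStringEscaper {
    // Characters with special meaning in query_string syntax that can be
    // escaped with a backslash (note the "/" at the end).
    private static final String RESERVED = "+-=&|!(){}[]^\"~*?:\\/";

    static String escape(String term) {
        StringBuilder out = new StringBuilder(term.length() + 8);
        for (char c : term.toCharArray()) {
            if (c == '<' || c == '>') {
                continue; // these cannot be escaped, so they are dropped
            }
            if (RESERVED.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

For the UUID searched above, `escape("info:doi:10.24396/ORDAR-56")` returns `info\:doi\:10.24396\/ORDAR\-56`.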

@fxprunayre
Member Author

Doesn't really work for me, the option doesn't fail now, but the page is displaying the default view, not the full view.

I had not pushed the last changes :/ Fixed.

I noticed also that searching for info:doi:10.24396/ORDAR-56 shows a popup with this error message: Query returned an error. Check the console for details., the search request returns this error:

Did you forget to uncomment <property name="firewall" ref="httpFirewall"/>?

@fxprunayre fxprunayre modified the milestones: 4.0.5, 4.0.6 Jun 18, 2021
@josegar74
Member

@fxprunayre, the full view works fine now, but the search doesn't. I have uncommented <property name="firewall" ref="httpFirewall"/>.

Code changes I have in config-security-core.xml to check:

Screenshot 2021-07-02 at 16 13 04

@josegar74 josegar74 mentioned this pull request Sep 7, 2021
@jahow jahow modified the milestones: 4.0.6, 4.0.7 Feb 2, 2022
@fxprunayre fxprunayre marked this pull request as draft May 16, 2022 09:40
@fxprunayre fxprunayre removed this from the 4.2.0 milestone May 17, 2022
@fxprunayre fxprunayre added this to the 4.2.4 milestone Jan 13, 2023
@fxprunayre
Member Author

fxprunayre commented Jan 13, 2023

To support an encoded / in UUIDs, Tomcat also requires
-Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true

and, if using a reverse proxy:

        AllowEncodedSlashes On
        ProxyPass /geonetwork http://localhost:8080/geonetwork nocanon
        ProxyPassReverse /geonetwork http://localhost:8080/geonetwork

Example UUIDs: info:doi:10.24396/ORDAR-56 or http://dada.moo/ORDAR-56

To support UUIDs containing characters such as / or ;, you need
to disable the default Spring HTTP firewall behaviour, which treats those characters as unsafe.
The error would look like "URL contained a potentially malicious String "%2F"".
To do so, uncomment the firewall configuration in config-security-core.xml and adjust the StrictHttpFirewall configuration.
Also uncomment the firewall property of filterChainProxy.

The client side already URL-encodes UUIDs, and with this change Spring will not
decode the path before matching the URL (decoding first would cause issues with request mapping).

By default, this is not active.
@fxprunayre fxprunayre force-pushed the 405-uuid-url-encoded branch from a936d44 to 0e9d2a3 Compare January 17, 2023 13:25
@fxprunayre fxprunayre marked this pull request as ready for review January 17, 2023 14:08
fxprunayre added a commit that referenced this pull request Jan 20, 2023
The simpleurl harvester can already point to a JSON or XML feed. It can also
point to an RDF DCAT feed, which will be loaded using Jena. SPARQL queries are applied
to extract the necessary information from the RDF graph.

This work was initially done by the GIM team for Metadata vlaanderen in a dedicated DCAT-AP harvester (see https://github.com/metadata101/dcat-ap1.1/tree/master/src/main/java/org/fao/geonet/kernel/harvest/harvester/dcatap), but we considered that
the simpleurl harvester is a good candidate for simplification and can provide DCAT feed support directly.

Co-authored-by: Mathieu Chaussier <[email protected]>
Co-authored-by: Gustaaf Van de Boel <[email protected]>
Co-authored-by: Stijn Goedertier <[email protected]>

The results can be converted using an XSL conversion. A conversion to ISO19115-3
is provided, and custom plugins may provide other conversions. The provided ISO19115-3 conversion
supports only Dataset and covers most of the mapping done in OGC API Records (see https://github.com/geonetwork/geonetwork-microservices/blob/main/modules/library/common-index-model/src/main/java/org/fao/geonet/index/converter/DcatConverter.java#L188)

Tested with
* http://mow-dataroom.s3-eu-west-1.amazonaws.com/dr_dcat.rdf
* https://apps.titellus.net/geonetwork/api/collections/main/items?q=AlpenKonvention&f=dcat
* https://apps.titellus.net/geonetwork/api/collections/main/items/7bb33d95-7950-499a-9bd8-6f31d58b0b35?f=dcat

Other actions:
- [ ] Add the possibility to hash or not the URI used for the UUID (depends on #5736)
- [ ] UI / Based on the type of harvesting, hide unneeded options, e.g. for a DCAT feed, only the URL is really necessary
- [ ] Paging support for RDF feeds?
- [ ] Conversion / We could move the conversions to the schema so that they do not have to be copied into the webapp/xsl/conversion folder. They would be grouped by schema, which could also make the choice easier for end users
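
The first item in the list above (hashing or not the URI used for the UUID) could look like the following sketch. It is purely hypothetical: the commit message does not say which hash the harvester would use, so the name-based type 3 (MD5) UUID from the Java standard library is only one possible choice, and the class, methods and `hash` switch are invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical helper; the hashing scheme is an assumption, not the harvester's. */
class UriToUuid {
    /** Derives a deterministic, URL-safe UUID (type 3, MD5-based) from a URI. */
    static String hashedUuid(String uri) {
        return UUID.nameUUIDFromBytes(uri.getBytes(StandardCharsets.UTF_8)).toString();
    }

    /** Hypothetical switch: keep the URI itself as the record UUID, or hash it. */
    static String recordUuid(String uri, boolean hash) {
        return hash ? hashedUuid(uri) : uri;
    }
}
```

A hashed identifier contains only hexadecimal digits and hyphens, so it needs no URL encoding, whereas keeping the raw URI relies on the encoded-UUID support discussed in #5736.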
fxprunayre added a commit that referenced this pull request Jan 20, 2023
fxprunayre added a commit that referenced this pull request Jan 23, 2023
fxprunayre added a commit that referenced this pull request Feb 6, 2023
@fxprunayre fxprunayre modified the milestones: 4.2.4, 4.2.3 Feb 7, 2023
juanluisrp pushed a commit to GeoCat/core-geonetwork that referenced this pull request Feb 23, 2023
@fxprunayre fxprunayre modified the milestones: 4.2.3, 4.2.4 Feb 28, 2023
@fxprunayre fxprunayre modified the milestones: 4.2.4, 4.2.5 Apr 26, 2023
@fxprunayre fxprunayre modified the milestones: 4.2.5, 4.2.6 Jul 5, 2023
@fxprunayre fxprunayre modified the milestones: 4.2.6, 4.4.1 Oct 4, 2023
@fxprunayre fxprunayre modified the milestones: 4.4.1, 4.4.2 Nov 22, 2023
@fxprunayre fxprunayre modified the milestones: 4.4.2, 4.4.3 Jan 23, 2024
@fgravin
Member

fgravin commented Feb 9, 2024

Excellent, thanks @josegar74 for pointing this out !

Could you please tell me what the status of this PR is?

  • Is it used in production somewhere?
  • Is some development still missing?
  • What effort would be needed to merge it into main?

Thanks for the work @fxprunayre !

@josegar74
Member

@fgravin, I guess @fxprunayre can answer that better, but apart from resolving the conflicts, I think it needs more testing.

@fxprunayre fxprunayre modified the milestones: 4.4.3, 4.4.4 Mar 13, 2024
@fxprunayre fxprunayre modified the milestones: 4.4.4, 4.4.5 Apr 15, 2024
@fxprunayre fxprunayre modified the milestones: 4.4.5, 4.4.6 Jun 4, 2024
@fxprunayre fxprunayre modified the milestones: 4.4.6, 4.4.7 Oct 15, 2024
@CLAassistant

CLAassistant commented Dec 8, 2024

CLA assistant check
All committers have signed the CLA.
