gateway: run Unicode Normalisation Forms on path gateway inputs #457

Jorropo · 2024-01-11T10:38:09Z

See context here: ipfs/kubo#10286 (comment)
Relevant Unicode spec: https://unicode.org/reports/tr15/

hacdias · 2024-01-11T12:56:03Z

For reference: https://go.dev/blog/normalization

lidel · 2024-01-12T22:50:20Z

Thank you for raising this.
We operate under ecosystem constraints:

- UnixFS specification (Publish UnixFS specifications at specs.ipfs.tech #331) never normalised filenames (opaque strings)
- We can't blindly run normalisation before resolving content path
  - It would break access to data that has filenames in non-normalized notation.
- We also can't make an arbitrary decision to change the filenames while onboarding data.
  - There may be datasets which interlink and use different notation, and forcing normalization during onboarding to IPFS would break links in applications that operate on the data.

What is the problem we are trying to solve?
My understanding of the linked issue is that a user copies a "non-normalised" content path from somewhere and gets a "not found" error because the DAG uses normalised filenames (notation mismatch).
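
To make the notation mismatch concrete, here is a minimal sketch in Python (chosen only because its standard library ships `unicodedata`). The filename `café.txt` is a made-up example, and `same_after_normalisation` is a hypothetical helper, not anything from the gateway code. It shows that two strings which render identically are different code point sequences, which is all that matters when names are opaque strings.

```python
import unicodedata

# "é" written two ways: one precomposed code point (U+00E9)
# versus "e" followed by a combining acute accent (U+0301).
composed = "caf\u00e9.txt"     # what NFC produces
decomposed = "cafe\u0301.txt"  # what NFD produces


def same_after_normalisation(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two names after applying one normalisation form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# An exact (opaque string) comparison treats them as different names.
print(composed == decomposed)                          # False
print(composed.encode("utf-8"))                        # b'caf\xc3\xa9.txt'
print(decomposed.encode("utf-8"))                      # b'cafe\xcc\x81.txt'

# They only compare equal once both sides are normalised to the same form.
print(same_after_normalisation(composed, decomposed))  # True
```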

If so, I think the best we could do UX-wise is to retry on "not found", trying the composed (NFC) and decomposed (NFD) forms (to cover both variants).

This way we don't break datasets where the file already exists, but still fix the HTTP 404 for cases where only a file in a different notation exists.
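
A rough sketch of that retry order, in Python for brevity. Everything here is an assumption made for illustration: `directory` is a plain mapping standing in for a UnixFS directory, the values are placeholder strings rather than real CIDs, and `candidate_names` / `lookup` are hypothetical names, not gateway APIs. The exact name is always tried first, so data that already resolves keeps resolving to the same entry.

```python
import unicodedata
from typing import List, Mapping, Optional


def candidate_names(name: str) -> List[str]:
    """Return the exact name first, then its NFC and NFD forms, without duplicates."""
    candidates = [name]
    for form in ("NFC", "NFD"):
        alternative = unicodedata.normalize(form, name)
        if alternative not in candidates:
            candidates.append(alternative)
    return candidates


def lookup(directory: Mapping[str, str], name: str) -> Optional[str]:
    """Resolve `name`, falling back to other notations only when the exact name is missing."""
    for candidate in candidate_names(name):
        if candidate in directory:
            return directory[candidate]
    return None  # the caller would turn this into HTTP 404


# Only the decomposed (NFD) name exists, but the request uses the composed (NFC) one.
directory = {"cafe\u0301.txt": "placeholder-for-nfd-entry"}
print(lookup(directory, "caf\u00e9.txt"))  # placeholder-for-nfd-entry
```

When both notations exist in the same directory, the exact match wins and no fallback happens, which is the property that avoids breaking existing datasets.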

If this is something we want to do, it should be included in #453 to ensure consistency across web contexts (which we will then reference from https://specs.ipfs.tech/http-gateways/path-gateway/).

But this introduces a magical behavior which hides the underlying problem macOS introduced – see my comment in ipfs/kubo#10286 (comment).

Perhaps it is better to NOT fix reads, and instead give users the ability to force a specific normalization during data onboarding? (like ipfs add --normalize-names none|nfd|nfc suggested in ipfs/kubo#10286 (comment)).
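
For illustration only, the onboarding-side alternative could look roughly like the sketch below. The mode names mirror the `none|nfd|nfc` values of the flag suggested above, which is a proposal and not an existing option; `normalize_name` is a hypothetical helper and does not correspond to any code in Kubo.

```python
import unicodedata


def normalize_name(name: str, mode: str = "none") -> str:
    """Hypothetical helper: apply the normalisation the user asked for to one filename."""
    if mode == "none":
        return name  # keep the filename exactly as given (opaque string)
    if mode in ("nfc", "nfd"):
        return unicodedata.normalize(mode.upper(), name)
    raise ValueError(f"unknown normalisation mode: {mode!r}")


decomposed = "cafe\u0301.txt"
print(normalize_name(decomposed, "none") == decomposed)      # True
print(normalize_name(decomposed, "nfc") == "caf\u00e9.txt")  # True
```

Defaulting to `none` keeps the current behaviour, so a dataset is only rewritten when the user explicitly opts in.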

Jorropo added the need/triage Needs initial labeling and prioritization label Jan 11, 2024

lidel mentioned this issue Jan 12, 2024

Gateway does not run Unicode Normalization Forms leading to seemingly identical paths not resolving when using different non normalized strings ipfs/kubo#10286

Open

3 tasks

hacdias added P3 Low: Not priority right now kind/discussion Topical discussion; usually not changes to codebase and removed need/triage Needs initial labeling and prioritization labels Jan 23, 2024

lidel mentioned this issue Feb 6, 2024

[wip] Web Pathing Specification: initial outline with TODOs #453

Draft
