File-Editor: Can't open files with multi-byte UTF-8 characters #671

Closed
sync-by-unito bot opened this issue Dec 27, 2021 · 6 comments
Comments

sync-by-unito bot commented Dec 27, 2021

I am unable to open files that have funky characters in the file name, whether from the FileExplorer or from this internal file explorer:

image

┆Issue is synchronized with this Asana task by Unito

sync-by-unito bot commented Dec 27, 2021

➤ Jeremy Nicklas commented:

The viewer in FileExplorer works though. So this only seems to affect opening files in FileEditor.

sync-by-unito bot commented Dec 27, 2021

➤ Brian L. McMichael commented:

This one is turning out to be a little sticky.

Using the example: (ノ°Д°)ノ︵ ┻━┻ is actually causing problems on the Rails end.

On the system, ) is actually a three-byte char (U+FF09, fullwidth right parenthesis) represented by %EF%BC%89, but the Ruby renderer is writing it out as a standard close paren ), which doesn't get escaped.

Likewise ︵ (U+FE35) is represented by %EF%B8%B5 and it's being rendered as a standard open paren (.

These are valid characters on the filesystem and I can open the files in the File Explorer viewer, but because Rails (or maybe the browser) is making some odd substitutions as or before the path gets rendered out, the path isn't encoded properly before it gets passed to the API.

I'll need to pinpoint exactly where the substitution is happening and enforce the proper encoding.
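
To make the substitution concrete, here is a minimal sketch using only the Ruby standard library. It assumes the substitution is Unicode NFKC normalization, which is consistent with the byte sequences quoted above but had not yet been confirmed at this point in the thread. The `percent_encode` helper is a hypothetical name introduced for illustration.

```ruby
require "erb"

# The two characters from the example file name.
fullwidth_paren = "\uFF09" # fullwidth right parenthesis
vertical_paren  = "\uFE35" # presentation form for vertical left parenthesis

# Hypothetical helper: percent-encode the UTF-8 bytes of a string.
def percent_encode(str)
  ERB::Util.url_encode(str)
end

# Encoding the raw characters keeps all three bytes of each one.
encoded_fullwidth = percent_encode(fullwidth_paren) # "%EF%BC%89"
encoded_vertical  = percent_encode(vertical_paren)  # "%EF%B8%B5"

# NFKC normalization folds both characters into plain ASCII parentheses,
# which is the kind of substitution described above.
nfkc_fullwidth = fullwidth_paren.unicode_normalize(:nfkc) # ")"
nfkc_vertical  = vertical_paren.unicode_normalize(:nfkc)  # "("

puts encoded_fullwidth, encoded_vertical, nfkc_fullwidth, nfkc_vertical
```

Once a character has been folded to ASCII, the resulting URL no longer names the file that exists on disk.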

sync-by-unito bot commented Dec 27, 2021

➤ Brian L. McMichael commented:

The problem seems to be coming from the OodAppkit files API generator. There may be a setting required for the Addressable gem to handle this appropriately.

irb(main):002:0> p = Pathname.new '(ノ°Д°)ノ︵ ┻━┻'
=> #<Pathname:(ノ°Д°)ノ︵ ┻━┻>
irb(main):003:0> o = OodAppkit.files.api(path: p).to_s
=> "/pun/sys/files/api/v1/fs(%E3%83%8E%C2%B0%D0%94%C2%B0)%E3%83%8E( %E2%94%BB%E2%94%81%E2%94%BB"
irb(main):005:0> p.to_s
=> "(ノ°Д°)ノ︵ ┻━┻"

sync-by-unito bot commented Dec 27, 2021

➤ Brian L. McMichael commented:

Appears to affect Addressable 2.5.1

Addressable::URI.parse('http://www.google.com/(╯°□°)╯︵ ┻━┻').normalize
=> #<Addressable::URI:0x23b0478 URI:http://www.google.com/(%E2%95%AF%C2%B0%E2%96%A1%C2%B0)%E2%95%AF( %E2%94%BB%E2%94%81%E2%94%BB>

sync-by-unito bot commented Dec 27, 2021

➤ Brian L. McMichael commented:

The problem with Addressable seems intentional.

It's doing the right thing actually. IRIs (unicode-friendly URIs) use unicode normalization form KC to limit phishing. NFKC tends to do perceptual codepoint conversions, like converting '?' to '?'. The solution here is not to normalize the URI if this is causing a problem, or to instead normalize components piecemeal. "http://foo.com/blah%ef%bc%9f" and "http://foo.com/blah%3F" are considered equivalent.

sporkmonger/addressable#8 (comment)

nickjer, is the .normalize call in OodAppkit necessary?

irb(main):003:0> p = Pathname.new "http://www.google.com/(╯°□°)╯︵ ┻━┻"
=> #<Pathname:http://www.google.com/(╯°□°)╯︵ ┻━┻>
irb(main):006:0> v = URI.encode p.to_s
=> "http://www.google.com/(%E2%95%AF%C2%B0%E2%96%A1%C2%B0%EF%BC%89%E2%95%AF%EF%B8%B5 %E2%94%BB%E2%94%81%E2%94%BB"
irb(main):008:0> g = Addressable::URI.parse v
=> #<Addressable::URI:0x24c86f8 URI:http://www.google.com/(%E2%95%AF%C2%B0%E2%96%A1%C2%B0%EF%BC%89%E2%95%AF%EF%B8%B5 %E2%94%BB%E2%94%81%E2%94%BB>
irb(main):009:0> g.normalize
=> #<Addressable::URI:0x24b92e8 URI:http://www.google.com/(%E2%95%AF%C2%B0%E2%96%A1%C2%B0)%E2%95%AF( %E2%94%BB%E2%94%81%E2%94%BB>
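
One way to read the "don't normalize" suggestion is to percent-encode each path segment and never run the result through a normalizing step. The sketch below uses only the standard library rather than Addressable or OodAppkit. The `encode_path` helper and the `/home/user` directory are hypothetical names for illustration, and the prefix is copied from the API path shown earlier in this thread.

```ruby
require "erb"
require "uri"

# Hypothetical helper, not part of OodAppkit: percent-encode every segment
# of a file path while leaving the "/" separators untouched.
def encode_path(path)
  path.split("/", -1).map { |segment| ERB::Util.url_encode(segment) }.join("/")
end

# File name containing U+FF09 and U+FE35, as in the example above.
name = "(ノ°Д°\uFF09ノ\uFE35 ┻━┻"
path = "/home/user/#{name}"

# Assumed API prefix, taken from the output earlier in this thread.
url = "/pun/sys/files/api/v1/fs" + encode_path(path)

# Decoding gives back the exact bytes that exist on the filesystem,
# because no Unicode normalization was applied along the way.
round_trip = URI.decode_www_form_component(encode_path(path))

puts url
puts round_trip == path
```

This keeps the fullwidth and vertical parentheses as their three-byte encodings, at the cost of giving up the equivalence checks that normalization provides.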

sync-by-unito bot commented Dec 27, 2021

➤ Jeremy Nicklas commented:

I say we punt on this and bring up a more URL-friendly encoding for file paths that all of our apps implement for ingest.

An example being Base64 (https://ruby-doc.org/stdlib-2.2.0/libdoc/base64/rdoc/Base64.html), in particular the urlsafe option.
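
A minimal sketch of what that could look like, assuming the file path is carried as a single URL-safe Base64 token. How the apps would actually accept such a token is not specified in this thread, and the `/home/user` directory is a hypothetical example.

```ruby
require "base64"

# File path containing U+FF09 and U+FE35, as in the example above.
path = "/home/user/(ノ°Д°\uFF09ノ\uFE35 ┻━┻"

# urlsafe_encode64 uses "-" and "_" instead of "+" and "/", so the token
# contains no characters that a URI normalizer would rewrite.
token = Base64.urlsafe_encode64(path)

# Decoding returns a binary string, so tag it as UTF-8 again before use.
decoded = Base64.urlsafe_decode64(token).force_encoding(Encoding::UTF_8)

puts token
puts decoded == path
```

The token is opaque to humans, but it survives URL handling byte for byte, so Unicode normalization never gets a chance to alter the file name.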

@ghost ghost closed this as completed Dec 30, 2021