Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding/decoding binary stream breaks application #818

Open
Dreamsorcerer opened this issue Oct 16, 2021 · 4 comments · May be fixed by #915
Open

Encoding/decoding binary stream breaks application #818

Dreamsorcerer opened this issue Oct 16, 2021 · 4 comments · May be fixed by #915

Comments

@Dreamsorcerer
Copy link
Contributor

I'm trying to stream data with fabric to another server through an SSH connection. The data is just binary data and should have no meaning or anything applied to it within the application. But, when sending this via fabric, invoke will try to encode it as a str and then decode it again as bytes, which breaks everything and results in the socket closing.

My application runs fine after removing those steps from invoke:

--- a/invoke/runners.py
+++ b/invoke/runners.py
@@ -791,12 +791,6 @@ class Runner(object):
         bytes_ = None
         if ready_for_reading(input_):
             bytes_ = input_.read(bytes_to_read(input_))
-            # Decode if it appears to be binary-type. (From real terminal
-            # streams, usually yes; from file-like objects, often no.)
-            if bytes_ and isinstance(bytes_, six.binary_type):
-                # TODO: will decoding 1 byte at a time break multibyte
-                # character encodings? How to square interactivity with that?
-                bytes_ = self.decode(bytes_)
         return bytes_
 
     def handle_stdin(self, input_, output, echo):
@@ -976,7 +970,7 @@ class Runner(object):
         """
         # Encode always, then request implementing subclass to perform the
         # actual write to subprocess' stdin.
-        self._write_proc_stdin(data.encode(self.encoding))
+        self._write_proc_stdin(data)
 
     def decode(self, data):
         """
@Dreamsorcerer
Copy link
Contributor Author

Dreamsorcerer commented Jan 27, 2023

@bitprophet Do you have any ideas on how to solve this? This is a critical severity issue to me, as it is simply impossible to run the application successfully without patching this library.

I assume the above diff is not going to be accepted as a PR, but I really don't understand why it is encoding stuff. It is receiving arbitrary data it knows nothing about and then assuming it can decode it with the system's default locale, which just seems crazy to me. It's also set to 'replace' which means half the bytes are just removed from the original value, so it just produces garbage.

Maybe an option to avoid encoding/decoding if encoding is set to None (or a sentinel value if None is expected to use the default) would work?

@bitprophet
Copy link
Member

  • I assume this is still a problem on the recently released major revision that does away with the python 2 shims and six? if you haven't tested there, please do!
  • I think there might be other open tickets or PRs about this, probably worth searching around if you didn't already.
  • IIRC the intent here is that the most common use cases involve human-focused text (eg a human typing into a terminal stdin which may need processing/responding to, or creating a StringIO for auto-playback) and in some of those cases we're being handed bytes-that-are-unicode.
  • but I agree that we need to handle the pure binary transit case when it occurs - so there needs to be either a) a reliable method of detecting "bytes that happen to be valid unicode", such as a try/except, or b) a manual toggle for when a run() call is known to be expecting raw bytes / is not concerned with interactive passthrough / sudo password injection / etc.

@Dreamsorcerer
Copy link
Contributor Author

  • I assume this is still a problem on the recently released major revision that does away with the python 2 shims and six? if you haven't tested there, please do!

I can see the code is still there. But, can test it out later to double check.

* I think there might be other open tickets or PRs about this, probably worth searching around if you didn't already.

I didn't see any at the time, but another look suggests maybe #354 refers to the same problem.

* but I agree that we need to handle the pure binary transit case when it occurs - so there needs to be either a) a reliable method of detecting "bytes that happen to be valid unicode", such as a try/except, or b) a manual toggle for when a run() call is known to be expecting raw bytes / is not concerned with interactive passthrough / sudo password injection / etc.

I'm not sure how we'd do it automatically, a try/except could pass on one chunk of data and then fail on the next. I'm also not clear on what use cases are being solved by the encoding, but when using fabric it's clear to me that I'm just sending data through a pipe, so even if it was valid utf-8 there would be no point in decoding it just to encode it again. So, an explicit setting would be fine.
Does a None or sentinel value for the encoding parameter sound like a reasonable approach to toggle this behaviour?

@bitprophet
Copy link
Member

Does a None or sentinel value for the encoding parameter sound like a reasonable approach to toggle this behaviour?

Yes, though we'd need to check to make sure that self.encoding and/or the config pipeline that sets it, doesn't allow a None default (tho yea in that case, a sentinel object() is probably fine.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants