-
-
Notifications
You must be signed in to change notification settings - Fork 747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correctly handle unicode characters in the URL path names and the CLI arguments (sys.argv) #5189
Conversation
characters are handled correctly.
This reverts commit 4e0bb04.
where the argument contains unicode character.
easier testability, etc. Add unit tests for that function.
Keep in mind that API tests are problematic - we don't exercise exactly the same code paths as we do when running outside tests because the tests itself utilize webtest module which has the same bug as webob. This means I needed to patch the code so it utilizes the same version of the functions as we do in prod. Not ideal and also shows why it's importat we also have actual end to end tests for the api.
Alright, I added various unit tests - e08f9f1, 2f06857. Adding unit tests for API turned out to be quite challenging and painful. The root cause is that unit tests for the API layer rely on To get the tests to work, I needed to patch the test code so we use the same version we use for non-test code. This is of course not ideal and also shows that it's important we have actual end to end tests for API because those webtest based one have limitations and don't 100% mimic actual non-test code paths. I will also work on actual end to end test. In the future we should probably also look at upgrading |
Test fails on Ubuntu 16.04 because locale is not set to So going forward we need to decide if we should either update locale on Ubuntu 16.04 end to end test system to use UTF-8 or drop it all together (since as discussed in the past, it's EOL). And to clarify, #5182 likely wont help or make a difference here - this behavior is expected and desired - if locale is not set to utf-8, trying to save data with unicode characters will fail and thrown an exception - as expected and designed (#5182 will only help with the underlying bug which resulted in infinite loop instead of the exception being propagated and logged). |
the request URL contains invalid or incorrectly URL encoded characters. Previously we didn't handle this scenario which means original UnicodeDecodeError error with the stack trace got propagated to the user which is a no go. Related issue - GoogleCloudPlatform/webapp2#152.
While working and testing this fix, I found another edge case we don't correctly handle - 04e58b6. If the request URL / path contains invalid characters (that's of course still possible even with this fix - e.g. invalid escape sequences, etc.), we didn't catch this error and the stack track would be propagated and returned directly to the end user: (virtualenv) vagrant@ubuntu-xenial:~/st2$ curl -X GET -H 'User-Agent: python-requests/2.23.0' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' "http://127.0.0.1:9101/v1/rules/%D0%C2%BD%A8%CE%C4%BC%FE%BC%D0.rar"
Traceback (most recent call last):
File "/home/vagrant/st2/st2common/st2common/router.py", line 246, in match
path = urllib.parse.unquote(req.path)
File "/home/vagrant/st2/virtualenv/lib/python3.6/site-packages/webob/request.py", line 476, in path
bpath = bytes_(self.path_info, self.url_encoding)
File "/home/vagrant/st2/virtualenv/lib/python3.6/site-packages/webob/descriptors.py", line 70, in fget
return req.encget(key, encattr=encattr)
File "/home/vagrant/st2/virtualenv/lib/python3.6/site-packages/webob/request.py", line 165, in encget
return bytes_(val, 'latin-1').decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 10: invalid continuation byte That's of course is not a good thing. I fixed it to return user-friendly error and ensure we don't leak stack info to the end user. |
Regarding Ubuntu 16.04 - the current plan is to drop 16.04 support in ST2 3.5 IF we merge support for 20.04 by then. But by that point Ubuntu 16.04 will be EOL, and I don't think it makes any sense whatsoever to support our software on an OS that is not supported by its upstream. The CentOS story might be similar with the shenanigans happening there. So what exactly our OS support is going to look like is a bit up in the air at this point. This comment here is a little disorganized (sorry), but the main point is that we should definitely add Ubuntu 20.04 support so we can drop Ubuntu 16.04 support. |
tests on Ubuntu 16.04. Ubuntu 16.04 is EOL and it makes no sense to spend a lot of time working on a test workaround since it won't be needed in the near future anyway once we remove support for 16.04.
For now, I decided to exclude example rule fixture which was used by new end to end tests to make sure tests still pass on Ubuntu 16.04 for the time being - c0ed21b. Ssadly there is no other easy option which would not require many changes which will soon be removed anyway when removing support for 16.04. In the future, when we remove support for Ubuntu 16.04, I will add that test fixture back + merge new end to end test. |
I'll go ahead and merge this since this bug is rather annoying and affecting all deployments which utilize unicode characters in resource refs (+ Python 3). |
Description
This pull request fixes a bug in the API and CLI side to make sure we correctly handle scenarios where the resource primary key contains a non-ascii character.
Before this fix, st2api and also the CLI would throw an exception when trying to retrieve a resource with a primary key which contains non-ascii characters. This means users would not be able to perform "get one" API operation on such resources (either using the API or using the CLI).
Resolved #5188.
Background, Context
This issue was found and reported by @blag while testing the new release - #5188.
After digging in, I was able to reproduce the issue, find the root cause and fix it. I also confirmed this issue already existed in the past (I went as far as v3.0.0) so it's not a recent regression.
In fact, there are two issues / bugs as described below.
st2api API bug
webob.compat.url_unquote
function doesn't work correctly under Python 3 - it tries to decode byte string using latin character set which means it will explode when bytestring contains utf-8 values.We only support Python 3 now so we don't need that (broken) compat layer from webob anymore and can just use upstream
urlunquote
which works correctly.After fixing the issue, I confirmed the fix using curl:
So that's all good, but there is actually also a bug in the client code.
st2client CLI bug
Even after fixing the bug in the API, retrieving a resource using the CLI command returned an exception and didn't work.
After digging into the docs and Python docs, I found out that
sys.argv
will contain unicode characters which have been decoded using surrogate escape sequences.This of course won't work because it means we will try to send incorrect URL path with surrogate escape sequences (e.g.
examples.test_rule_utf8_n%ED%B3%83%ED%B2%A1me
) with the API request.In addition to that, various other places in the CLI will break as well since they expect unicode values with unicode characters and not surrogate escape sequences.
First I tried to fix this issue by simply only re-encoding the string in the
HTTPClient
layer. This worked fine for the actual API requests, but as per my comment above, there are also other places which break if value contains surrogate escape sequences which means I moved this re-encoding to as early as possible to ensure rest of the CLI code works with valid unicode values.As an additional safe-guard I added environment variable / feature flag with which this functionality can be disabled, but that really shouldn't be needed since even if re-encoding fails that exception is non-fatal (it's there mostly for testing purposes).
Some more context is also available in my comments in this issue - #5188.
Target release
For now I assigned this issue to the 3.5.0 milestone, but likely we should also do v3.4.2 release with this bug fix.
TODO