Cannot view rule with UTF-8 character in name #5188
Comments
Confirmed: the same thing happens with actions/workflows created with UTF-8 characters in their names. Listing actions includes the UTF-8-named actions, with the UTF-8 characters intact, but attempting to get the details of those actions returns an "Action is not found." error.
IIRC, at some point in the (distant) past, when storing names with unicode characters, we actually stored unicode escape sequences instead of the actual unicode values (I forget whether that was only under Python 2 or whether other environment settings such as the locale affected it). It could be that we now (correctly) try to store the unicode value directly.

@blag Per the log output, it looks like the locale is not set to utf-8 (otherwise the logging exception would not be there). Is this indeed the case? Also, does it affect other distros?
So yeah, I had a quick look (latest master, utf-8 locale) and it seems this will indeed be a pain to handle correctly in all cases. Here are the errors on the server and the client side:

respiter = self.wsgi(environ, resp.start_response)
File "/home/vagrant/st2/st2common/st2common/middleware/instrumentation.py", line 47, in __call__
endpoint, _ = self.router.match(request)
File "/home/vagrant/st2/st2common/st2common/router.py", line 243, in match
path = url_unquote(req.path)
File "/home/vagrant/st2/virtualenv/lib/python3.6/site-packages/webob/request.py", line 476, in path
bpath = bytes_(self.path_info, self.url_encoding)
File "/home/vagrant/st2/virtualenv/lib/python3.6/site-packages/webob/descriptors.py", line 70, in fget
return req.encget(key, encattr=encattr)
File "/home/vagrant/st2/virtualenv/lib/python3.6/site-packages/webob/request.py", line 165, in encget
return bytes_(val, 'latin-1').decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 35: invalid continuation byte
[2021-03-13 10:39:09 +0000] [2827] [DEBUG] worker: SIGWINCH ignored.
[2021-03-13 10:39:09 +0000] [2767] [INFO] Handling signal: winch
Traceback (most recent call last):
File "/home/vagrant/st2/st2client/st2client/commands/resource.py", line 195, in get_resource_by_pk
instance = self.manager.get_by_id(pk, **kwargs)
File "/home/vagrant/st2/st2client/st2client/models/core.py", line 42, in decorate
return func(*args, **kwargs)
File "/home/vagrant/st2/st2client/st2client/models/core.py", line 214, in get_by_id
self.handle_error(response)
File "/home/vagrant/st2/st2client/st2client/models/core.py", line 173, in handle_error
response.raise_for_status()
File "/home/vagrant/st2/virtualenv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error
Unable to retrieve detailed message from the HTTP response. Expecting value: line 1 column 1 (char 0)
for url: http://127.0.0.1:9101/v1/rules/examples.test_rule_utf8_n%ED%B3%83%ED%B2%A1me
ERROR: 'utf-8' codec can't encode characters in position 31-32: surrogates not allowed

Client sends URL path which looks like this

I also tested both scenarios and both result in an error, albeit a different one: in the second scenario the request is processed correctly, but there is an issue in the DB layer in how we translate the key.
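The server-side `UnicodeDecodeError` in the traceback above can be reproduced in isolation. The following is a minimal sketch, not the actual webob code path: it assumes the WSGI server percent-decodes the path and hands it over as a latin-1 decoded string (as PEP 3333 describes), and it mirrors the `bytes_(val, 'latin-1').decode(encoding)` line from the traceback with plain `str`/`bytes` methods. The variable names are made up for illustration.

```python
from urllib.parse import unquote_to_bytes

# Path taken from the failing client request in the traceback above.
bad_path = "/v1/rules/examples.test_rule_utf8_n%ED%B3%83%ED%B2%A1me"
# Path produced by a correctly percent-encoded "á" (UTF-8 bytes C3 A1).
good_path = "/v1/rules/examples.test_rule_utf8_n%C3%A1me"


def wsgi_path_info(path: str) -> str:
    """Assumption: emulate a WSGI server, which percent-decodes the path
    and exposes the raw bytes as a latin-1 decoded native string."""
    return unquote_to_bytes(path).decode("latin-1")


def decode_like_webob(path_info: str) -> str:
    """Mirror of the traceback line: bytes_(val, 'latin-1').decode('utf-8')."""
    return path_info.encode("latin-1").decode("utf-8")


good_decoded = decode_like_webob(wsgi_path_info(good_path))

try:
    decode_like_webob(wsgi_path_info(bad_path))
    decode_error = None
except UnicodeDecodeError as exc:
    # ED B3 83 is the UTF-8-style encoding of a lone surrogate (U+DCC3),
    # which a strict UTF-8 decoder rejects.
    decode_error = exc

print(good_decoded)
print(decode_error)
```

The failing byte offset lines up with the traceback because the `%ED` sequence starts right after the 35 characters of `/v1/rules/examples.test_rule_utf8_n`.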
I checked the database and the values are indeed stored as actual unicode characters (which is what I would expect, and it's the right thing). There is still a question, though, of whether that was the case in the past with Python 2.

{ "_id" : ObjectId("604c962c5779c52cbe31a481"), "tags" : [ ], "uid" : "rule:examples:test_rule_utf8_náme", "metadata_file" : "", "name" : "test_rule_utf8_náme", "ref" : "examples.test_rule_utf8_náme", "description" : "Sample rule firing on action completion.", "pack" : "examples", "type" : { "ref" : "standard", "parameters" : { } }, "trigger" : "core.2df70229-bed3-40cf-8a29-d3553eebe260", "criteria" : { "trigger.channel" : { "pattern" : "slack", "type" : "equals" } }, "action" : { "ref" : "slack.post_message", "parameters" : { "message" : "{{trigger.message}}" } }, "context" : { "user" : "stanley" }, "enabled" : true }

So we will need to answer the following questions and decide how to proceed:
Some related issues:
OK, so this change fixes it for curl - b367ef2.

curl -X GET -H 'User-Agent: python-requests/2.23.0' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' http://127.0.0.1:9101/v1/rules/examples.test_rule_utf8_náme

curl -X GET -H 'User-Agent: python-requests/2.23.0' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' "http://127.0.0.1:9101/v1/rules/examples.test_rule_utf8_n%C3%A1me"

When sending a request with unicode characters in the path, it will get correctly encoded using URI encoding sequences.

This works correctly with curl, but not with our CLI. I'm still digging into why it doesn't work correctly with the client, but it seems that somewhere in our CLI layer the unicode value gets incorrectly decoded to a byte string with unicode escape sequences (instead of being left alone and then passed as a unicode value to the HTTP client, which should URL-encode / quote it correctly).

It appears the issue is that actual

I believe we will need to do something like this in the client code to handle those scenarios correctly. In short, we want an actual unicode string with unicode characters and not a byte string with surrogate escape sequences.

sys_argv_value = sys_argv_value.encode('ascii', 'surrogateescape').decode('utf-8')

Right now the main question is where to do that. I think that doing it in the HTTPClient layer, before passing the URL to requests, is probably fine.

I confirmed the same issue also exists under Python 3 in the older releases.
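To make the proposed re-encoding concrete, here is a small sketch of what it does to the rule name. It assumes the CLI argument reached Python decoded as ASCII with the `surrogateescape` error handler (which is how Python 3.6 builds `sys.argv` when the locale is not UTF-8); that is simulated here rather than read from `sys.argv`. The `surrogatepass` step is only used to reproduce the bytes seen in the failing URL and is not a claim about what st2client or requests do internally. All variable names are illustrative.

```python
from urllib.parse import quote

name = "test_rule_utf8_náme"

# Assumption: simulate a sys.argv entry decoded under a non-UTF-8 (ASCII)
# locale. The UTF-8 bytes C3 A1 become the lone surrogates U+DCC3 U+DCA1.
argv_value = name.encode("utf-8").decode("ascii", "surrogateescape")

# Strict UTF-8 encoding of that value fails, as in the client-side ERROR line.
try:
    argv_value.encode("utf-8")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

# Encoding the surrogates as-is yields the broken path from the failing request.
broken_path = quote(argv_value.encode("utf-8", "surrogatepass"))

# The re-encoding from the comment above restores the real unicode string,
# which then percent-encodes the same way the working curl request does.
fixed_value = argv_value.encode("ascii", "surrogateescape").decode("utf-8")
fixed_path = quote(fixed_value)

print(broken_path)
print(fixed_path)
```

Applying the round trip to a value that contains no surrogates is a no-op for pure ASCII input, but it would raise for a string that already holds real non-ASCII characters, so wherever it ends up it likely needs a guard for the UTF-8 locale case.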
I confirmed that the same issue existed on ST2 v3.4.0, so this is not a regression in 3.4.1. This is probably something that we should circle back to, and write tests for. But for 3.4.1, it's not a problem.
Yep, it's not a regression (so not a blocker for v3.4.1); it has likely been there for a long time (I tested with a very old release under Python 3 and it still appears to be there). Here is also a WIP fix for the client side - 4e0bb04. I confirmed it works end to end, but I will likely end up going with a different approach and re-encoding the sys.argv arg value earlier in the pipeline.
SUMMARY
The ST2 API does not return results when viewing a rule with a UTF-8 character in the name.
I suspect that this will also happen with action names, alias names, and pack names.
STACKSTORM VERSION
st2 3.4.1, on Python 3.6.13
This was during the 3.4.1 release.
OS, environment, install method
Ubuntu 16.04, st2vagrant
Steps to reproduce the problem
Show how to reproduce the problem, using a minimal test-case. Make sure to include any content
(pack content - workflows, actions, etc.) which are needed to reproduce the problem.
test_rule_utf8_náme. Creating the rule should work just fine.
Expected Results
What did you expect to happen when running the steps above?
Creating a rule with a utf-8 character in its name should work (it currently does).
Viewing a rule with a utf-8 character in its name should work (it currently does not).
Actual Results
What happened? What output did you get?
Getting the rule via CLI does not work:
Note that listing rules returns the rule with the correct name:
The log formatting for logging exceptions is a little messed up, so I tweaked it to make the exception traceback a little bit more readable:
Reformatted log
Full log