Salvage #13772 #13935
base: main
Allow converting from UTF-8 to whatever encoding the device supports
Branch updated from f4219e9 to bd5996c, then from bd5996c to d131d1c.
LGTM!
    case ISO_10646_UCS_2:
        return 68;
    }
    // unreachable (TODO assert false?)
That assert would make sense IMO
yeah, still TODO
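The assert discussed above could look roughly like the sketch below. It assumes a scoped enum; `Charset`, its enumerators and `charsetName` are hypothetical stand-ins for the PR's code (which returns numeric values, not names). Because every enumerator is handled and there is no `default:` label, -Wswitch (enabled by -Wall in GCC) can flag a newly added enumerator at compile time, while the trailing assert only fires for an out-of-range value, e.g. one produced by an unchecked cast from an integer.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical charset enum; the real enumerators in the PR may differ.
enum class Charset { US_ASCII, ISO_8859_15, ISO_10646_UCS_2 };

// Map each enumerator to a charset name. No `default:` label, so the
// compiler can warn when a new enumerator is added but not handled here.
std::string_view charsetName(Charset charset) {
    switch (charset) {
    case Charset::US_ASCII:
        return "US-ASCII";
    case Charset::ISO_8859_15:
        return "ISO-8859-15";
    case Charset::ISO_10646_UCS_2:
        return "ISO-10646-UCS-2";
    }
    // Only reachable with an out-of-range value: fail loudly in debug
    // builds, fall back to an empty name when NDEBUG disables the assert.
    assert(false && "unhandled Charset value");
    return {};
}
```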
Return a partially usable result instead of null bytes, since null bytes are likely to terminate the string early or even corrupt the binary message format that the buffer is embedded in.
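To make the commit message's point concrete, here is a minimal stdlib-only sketch. It assumes the receiving code treats the buffer as a NUL-terminated C string; `cStringLength` and the two sample buffers are hypothetical, standing for a five-character message whose third character could not be converted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical encoder outputs for a 5-character message whose 3rd
// character has no representation in the target charset.
const std::string replacedWithNul("Hi\0!!", 5);     // invalid char -> 0x00
const std::string replacedWithQuestion("Hi?!!", 5); // invalid char -> '?'

// Length as seen by any consumer that stops at the first 0x00 byte.
std::size_t cStringLength(const std::string& bytes) {
    return std::strlen(bytes.c_str());
}
```

With 0x00 such a consumer sees only "Hi"; with '?' it sees all five bytes, one of them substituted.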
src/controllers/scripting/legacy/controllerscriptinterfacelegacy.h
 * @param value The string to encode
 * @returns The converted string as an array of bytes. Will return an empty buffer on conversion error or unavailable charset.
 */
function convertCharset(targetCharset: string, value: string): ArrayBuffer
Wait, is this still documented even though it's now an internal API?
Forgot to remove it. Thanks for the reminder.
const char* encoderName = encoderNameArray.constData();
#endif
QStringEncoder fromUtf16 = QStringEncoder(encoderName);
if (!fromUtf16.isValid()) {
Why did you remove the flags here?
See the commit message: writing the replacement char is better than writing null bytes, IMO.
This may be a problem. During the tests I noticed that the replacement char varies between Ubuntu and Fedora (maybe between Qt versions). Replacing invalid chars with 0x00 is the most predictable option.
Are you sure? I think it depends on the encoding. The Qt docs say they use QChar::ReplacementCharacter or a question mark.
This is what I noticed here. The relevant commit is probably this one. That said, I tested your branch and the tests pass here too, so I'm not sure anymore.
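One way to sidestep the platform-dependent replacement character is to make the substitute byte an explicit parameter of the encoder. The stdlib-only sketch below is a hypothetical helper, not the PR's code. It targets ISO-8859-1 (Latin-1), which maps U+0000..U+00FF one-to-one, rather than ISO-8859-15, which differs from Latin-1 at eight positions (0xA4 is the euro sign in Latin-9, for instance). It assumes well-formed UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: encode UTF-8 text as ISO-8859-1 (Latin-1).
// Code points U+0000..U+00FF map to the byte with the same value; every
// other code point becomes `replacement`, chosen explicitly by the caller.
// Assumes well-formed UTF-8; malformed sequences are not detected.
std::string utf8ToLatin1(std::string_view utf8, char replacement) {
    std::string out;
    out.reserve(utf8.size());
    std::size_t i = 0;
    while (i < utf8.size()) {
        const auto lead = static_cast<unsigned char>(utf8[i]);
        char32_t codePoint = 0;
        std::size_t length = 1;
        if (lead < 0x80) {        // 0xxxxxxx: ASCII
            codePoint = lead;
        } else if (lead < 0xE0) { // 110xxxxx: 2-byte sequence
            codePoint = lead & 0x1F;
            length = 2;
        } else if (lead < 0xF0) { // 1110xxxx: 3-byte sequence
            codePoint = lead & 0x0F;
            length = 3;
        } else {                  // 11110xxx: 4-byte sequence
            codePoint = lead & 0x07;
            length = 4;
        }
        // Fold in the 6 payload bits of each continuation byte.
        for (std::size_t k = 1; k < length && i + k < utf8.size(); ++k) {
            codePoint = (codePoint << 6) |
                    (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        out.push_back(codePoint <= 0xFF ? static_cast<char>(codePoint) : replacement);
        i += length;
    }
    return out;
}
```

Passing '\0', '?' or '\x1A' (SUB) then yields the same bytes on every platform, which keeps test expectations such as the SUB-based ones below stable.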
TEST_F(ControllerScriptEngineLegacyTest, convertCharsetCorrectValueStringCharset) {
    const auto result = evaluate("engine.convertCharset('ISO-8859-15', 'Hello!')");

    // ISO-8859-15 encoded 'Hello!'
    EXPECT_EQ(qjsvalue_cast<QByteArray>(result),
            QByteArrayView::fromArray({'\x48', '\x65', '\x6c', '\x6c', '\x6f', '\x21'}));
}

TEST_F(ControllerScriptEngineLegacyTest, convertCharsetUnsupportedChars) {
    auto result = qjsvalue_cast<QByteArray>(
            evaluate("engine.convertCharset('ISO-8859-15', 'مايأ نامز')"));
    char sub = '\x1A'; // ASCII/Latin9 SUB character
    EXPECT_EQ(result,
            QByteArrayView::fromArray(
                    {sub, sub, sub, sub, '\x20', sub, sub, sub, sub}));
}
Those tests aren't relevant anymore since it's an internal API now, are they?
Mostly, yes. I should translate them to their enum equivalents so we can at least be reasonably sure about the output data.
TEST_F(ControllerScriptEngineLegacyTest, convertCharsetUndefinedOnUnknownCharset) {
    const auto result = evaluate("engine.convertCharset('NULL', 'Hello!')");

    EXPECT_EQ(qjsvalue_cast<QByteArray>(result), QByteArrayView(""));
}
Yeah... not quite sure what to do here, honestly. It seems that implicit conversions to enums are happening, which isn't great if it just silently succeeds. I wonder what value plain undefined results in? I'd guess undefined -> 0 -> US_ASCII?
Simplified alternative to #13772 that implements the original scope I proposed.
CC @christophehenry @JoergAtGithub