Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Salvage #13772 #13935

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

Swiftb0y
Copy link
Member

Simplified alternative of #13772 which implements the original scope I proposed.

CC @christophehenry @JoergAtGithub

Allow to convert from UTF-8 to whatever encoding the device supports
@Swiftb0y Swiftb0y force-pushed the review/christophehenry/gh13772-charset-encoding-salvage branch 4 times, most recently from f4219e9 to bd5996c Compare November 25, 2024 14:25
@Swiftb0y Swiftb0y force-pushed the review/christophehenry/gh13772-charset-encoding-salvage branch from bd5996c to d131d1c Compare November 25, 2024 22:04
Copy link
Member

@acolombier acolombier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

case ISO_10646_UCS_2:
return 68;
}
// unreachable (TODO assert false?)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That assert would make sense IMO

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, still TODO

Returning part of a usable result instead of nullbytes since
those likely terminate the string early or even corrupt the
underlying binary message format the buffer is embedded in.
@JoergAtGithub JoergAtGithub dismissed their stale review November 27, 2024 21:50

Blocking issue resolved

* @param value The string to encode
* @returns The converted String as an array of bytes. Will return an empty buffer on conversion error or unavailable charset.
*/
function convertCharset(targetCharset: string, value: string): ArrayBuffer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait this is still documented even though it's now an internal API?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot to remove. thanks for the reminder.

const char* encoderName = encoderNameArray.constData();
#endif
QStringEncoder fromUtf16 = QStringEncoder(encoderName);
if (!fromUtf16.isValid()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you removed the flags here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the commit message, writing the replacement char is better than replacing with null bytes IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a problem. During the tests I noticed that replacement char varies between Ubuntu and Fedora (maybe between Qt versions). Replace invalid chars with \0x00 is the most predictable option.

Copy link
Member Author

@Swiftb0y Swiftb0y Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure? I think it depends on the encoding. The Qt docs say they use QChar::ReplacementCharacter or a question mark.

Copy link
Contributor

@christophehenry christophehenry Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I noticed here. The relevant commit is probably this one. Although I tested your branch and the tests pass here too so I'm not sure anymore.

Comment on lines +679 to +694
TEST_F(ControllerScriptEngineLegacyTest, convertCharsetCorrectValueStringCharset) {
const auto result = evaluate("engine.convertCharset('ISO-8859-15', 'Hello!')");

// ISO-8859-15 ecoded 'Hello!'
EXPECT_EQ(qjsvalue_cast<QByteArray>(result),
QByteArrayView::fromArray({'\x48', '\x65', '\x6c', '\x6c', '\x6f', '\x21'}));
}

TEST_F(ControllerScriptEngineLegacyTest, convertCharsetUnsupportedChars) {
auto result = qjsvalue_cast<QByteArray>(
evaluate("engine.convertCharset('ISO-8859-15', 'مايأ نامز')"));
char sub = '\x1A'; // ASCII/Latin9 SUB character
EXPECT_EQ(result,
QByteArrayView::fromArray(
{sub, sub, sub, sub, '\x20', sub, sub, sub, sub}));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those tests aren't relevant anymore since it's an internal API now, are they?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly yes. I should translate them to their enum equivalent so we can at least be reasonably sure about the output data.

Comment on lines +664 to +668
TEST_F(ControllerScriptEngineLegacyTest, convertCharsetUndefinedOnUnknownCharset) {
const auto result = evaluate("engine.convertCharset('NULL', 'Hello!')");

EXPECT_EQ(qjsvalue_cast<QByteArray>(result), QByteArrayView(""));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah... Not quite sure what to do here honestly. It seems that there are implicit conversions to enums happening, which isn't great if it just silently succeeds. I wonder what value plain undefined results in? I'd guess undefined->0->US_ASCII?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants