Buffer#toString: throw on unsupported encodings #2418

Open: wants to merge 2 commits into v2-maintenance

SheetJSDev (Contributor)

Fixes #2417

New behavior: if the encoding argument is anything other than undefined (i.e. unspecified) or "utf8", throw an error.

Note: null is not a supported encoding in Node.js, so passing null also throws.
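A minimal C sketch of the rule described above, assuming a hypothetical js_arg type that models the JavaScript value passed as the encoding argument (the real change works on Duktape value stack entries, which this sketch does not model). It follows the description literally: only an omitted argument or the exact string "utf8" is accepted, and null, non-strings, and every other name lead to a throw.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the JavaScript value passed as the encoding argument. */
typedef enum { ARG_UNDEFINED, ARG_NULL, ARG_STRING, ARG_OTHER } arg_kind;

typedef struct {
    arg_kind kind;
    const char *str; /* only meaningful when kind == ARG_STRING */
} js_arg;

/* Returns true if toString() may proceed, false where it should throw. */
static bool tostring_encoding_ok(js_arg enc) {
    if (enc.kind == ARG_UNDEFINED) {
        return true; /* argument omitted: default (UTF-8) behavior */
    }
    if (enc.kind == ARG_STRING && strcmp(enc.str, "utf8") == 0) {
        return true; /* the only explicitly accepted encoding name */
    }
    return false; /* null, non-strings, and any other name: throw */
}
```

Treating null as an error rather than as "unspecified" mirrors the Node.js note above.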

src-input/duk_bi_buffer.c (resolved)
src-input/duk_bi_buffer.c (outdated, resolved)
tests/ecmascript/test-bi-nodejs-buffer-tostring.js (outdated, resolved)
SheetJSDev (Contributor, Author)

The core logic has been factored out since both Buffer.isEncoding and Buffer#toString have to check the encoding.
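One way to picture this factoring, as a hypothetical sketch (none of these function names come from the PR): a single predicate owns the list of supported names, and the two callers differ only in how they report failure. isEncoding answers with a boolean, while toString throws, modeled here as a -1 return code. To keep the sketch short, the predicate accepts "utf8" and "utf-8" by exact match and leaves case-insensitive matching out.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical shared predicate: the single place that knows which
 * encoding names are supported (exact match only in this sketch). */
static bool encoding_is_supported(const char *enc) {
    return strcmp(enc, "utf8") == 0 || strcmp(enc, "utf-8") == 0;
}

/* Buffer.isEncoding(enc): NULL models a non-string argument.
 * Reports support as a boolean and never throws. */
static bool buffer_is_encoding(const char *enc) {
    return enc != NULL && encoding_is_supported(enc);
}

/* Buffer#toString(enc): NULL models an omitted argument.
 * Returns 0 on success, -1 where the real binding would throw. */
static int buffer_tostring_check(const char *enc) {
    if (enc == NULL) {
        return 0; /* omitted: default encoding */
    }
    return encoding_is_supported(enc) ? 0 : -1;
}
```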

For case-insensitive matching, raw bit operations are used, since neither the strcmp variants (stricmp / strcasecmp) nor the ctype.h utilities (tolower / toupper) are currently used in Duktape.
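A sketch of a bit-operation approach that avoids ctype.h while staying correct, with hypothetical helper names: the 0x20 bit is set only on 'A'..'Z', and the input is compared by explicit length rather than up to a NUL. Applying | 0x20 to arbitrary bytes is not a safe substitute for tolower(), because it also rewrites punctuation and control characters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Fold 'A'..'Z' to lowercase by setting bit 0x20; leave every other byte as-is. */
static unsigned char ascii_fold(unsigned char c) {
    return (c >= 'A' && c <= 'Z') ? (unsigned char) (c | 0x20) : c;
}

/* Case-insensitive equality of an input with explicit length against a
 * lowercase ASCII literal. Embedded NULs in the input are compared like any
 * other byte, so "utf8\0bar" (8 bytes) does not match "utf8". */
static bool ascii_ieq(const char *p, size_t len, const char *lit) {
    size_t i;
    for (i = 0; i < len; i++) {
        if (lit[i] == '\0' || ascii_fold((unsigned char) p[i]) != (unsigned char) lit[i]) {
            return false; /* literal too short, or byte mismatch */
        }
    }
    return lit[len] == '\0'; /* lengths must match exactly */
}
```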

Both Buffer#toString and Buffer.isEncoding tests have been refreshed.

svaarala (Owner) commented Sep 8, 2021

Thanks for the revisions 👍

for (i = 0; i < DUK_BUFFER_ENCODING_MAX_LEN; ++i) {
    if (encoding[i] == 0) { buf[i] = 0; break; }
    buf[i] = (char) (encoding[i] | 0x20);
}
svaarala (Owner) commented Sep 9, 2021

This loop works incorrectly if i reaches the maximum encoding length without finding a NUL terminator in the input: buf will remain unterminated and it's used for a strcmp() later.

Inputs like utf8\u0000bar are (technically) handled incorrectly, as the loop will terminate on the NUL.

ORing 0x20 into every input char also causes some invalid inputs to be accepted, e.g. "utf\u000d8" is accepted because 0x0d | 0x20 = 0x2d, which is '-'.

So overall I don't think this approach is easy to make work.
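The false match described above can be reproduced with a small stand-alone reconstruction of the fold-then-compare idea. The helper name, the 16-byte bound, and the "utf-8" comparison target are assumptions for illustration, not the PR's exact code. The copy is also forced to be NUL-terminated so that strcmp() is well defined, which sidesteps the first issue rather than reproducing it.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ENC_LEN 16 /* assumed bound for this sketch */

/* Naive fold: ORs 0x20 into every byte, as in the loop under review,
 * but always NUL-terminates buf so the strcmp() below stays well defined. */
static bool naive_matches(const char *enc, const char *target) {
    char buf[MAX_ENC_LEN + 1];
    size_t i;
    for (i = 0; i < MAX_ENC_LEN && enc[i] != '\0'; i++) {
        buf[i] = (char) (enc[i] | 0x20);
    }
    buf[i] = '\0';
    return strcmp(buf, target) == 0;
}
```

Called as naive_matches("utf\r8", "utf-8"), this returns true, because the carriage return (0x0d) is folded into '-' (0x2d).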

svaarala (Owner)

I wonder if it'd produce less code to just short-circuit compare the input, e.g.:

if ((p[0] | 0x20) == 'u' && (p[1] | 0x20) == 't' && (p[2] | 0x20) == 'f') {
    if (p[3] == '8' && encoding_len == 4) {
        /* utf8 */
    } else if (p[3] == '-' && p[4] == '8' && encoding_len == 5) {
        /* utf8 */
    }
}
/* invalid */
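A self-contained version of the short-circuit check, assuming the input arrives as a pointer plus byte length (p and len; the function name is hypothetical). Testing the length before indexing keeps p[3] and p[4] in bounds even for inputs that are not NUL-terminated. ORing 0x20 is safe here, unlike in the loop above, because the result is only compared against letters: the only bytes that OR to 'u', 't', or 'f' are the upper- and lowercase forms of those letters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Accepts "utf8" or "utf-8", case-insensitively, for an input of exactly
 * len bytes. The length is checked before indexing, so no byte past
 * p[len - 1] is read. */
static bool is_utf8_name(const unsigned char *p, size_t len) {
    if (len != 4 && len != 5) {
        return false;
    }
    if ((p[0] | 0x20) != 'u' || (p[1] | 0x20) != 't' || (p[2] | 0x20) != 'f') {
        return false;
    }
    if (len == 4) {
        return p[3] == '8'; /* "utf8" */
    }
    return p[3] == '-' && p[4] == '8'; /* "utf-8" */
}
```

Because the comparison is length-bounded, inputs such as "utf8\0bar" and "utf\r8" are both rejected.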

svaarala (Owner)

I'm merging the clang-format indent changes to master and to v2-maintenance. Once that's done, the easiest approach is probably to clang-format your changed files first, and then rebase against master/v2-maintenance. This should keep the diff minimal and avoid conflicts.

svaarala (Owner)

Ok, clang-format changes are now in master and v2-maintenance. To reindent code:

$ make docker-image-clang-format
$ make clang-format-source
