Buffer#toString throw on unsupported encodings #2418

SheetJSDev · 2021-09-07T20:42:47Z

Fixes #2417

New behavior: if the encoding argument is anything other than undefined (or omitted) or "utf8", throw an error.

Note: null is not a supported encoding in NodeJS

src-input/duk_bi_buffer.c

tests/ecmascript/test-bi-nodejs-buffer-tostring.js

SheetJSDev · 2021-09-08T21:21:23Z

The core logic has been factored out since both Buffer.isEncoding and Buffer#toString have to check the encoding.

For the case-insensitive matching, raw bit-ops are preferred, since the strcmp variants (stricmp / strcasecmp) and the ctype.h utilities (tolower / toupper) are not currently used.
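As a rough illustration of the bit-op idea (not the PR's actual code): in ASCII, an uppercase letter and its lowercase form differ only in bit 0x20, so ORing a byte with 0x20 folds letters to lowercase without ctype.h. The helper name below is hypothetical, and the sketch assumes the expected string consists of lowercase letters only.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Duktape): compare `len` bytes of `input`
 * against `expected`, ignoring ASCII case by ORing each input byte with 0x20.
 * This is only exact when every byte of `expected` is a lowercase ASCII
 * letter; digits and punctuation must be compared without the OR.
 */
static int ascii_letters_equal_nocase(const char *input, const char *expected, size_t len) {
	size_t i;

	for (i = 0; i < len; i++) {
		if ((input[i] | 0x20) != expected[i]) {
			return 0;
		}
	}
	return 1;
}
```

For example, `ascii_letters_equal_nocase("UTF", "utf", 3)` returns 1, while `ascii_letters_equal_nocase("utg", "utf", 3)` returns 0.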

Both Buffer#toString and Buffer.isEncoding tests have been refreshed.

svaarala · 2021-09-08T21:35:31Z

Thanks for the revisions 👍

svaarala · 2021-09-09T20:57:06Z

src-input/duk_bi_buffer.c

+	for (i = 0; i < DUK_BUFFER_ENCODING_MAX_LEN; ++i) {
+		if (encoding[i] == 0) { buf[i] = 0; break; }
+		buf[i] = (char) (encoding[i] | 0x20);
+	}


This loop works incorrectly if i reaches the maximum encoding length without finding a NUL terminator in the input: buf will remain unterminated and it's used for a strcmp() later.

Inputs like "utf8\u0000bar" are (technically) handled incorrectly, as the loop will terminate on the embedded NUL and ignore the rest of the input.

ORing 0x20 on all input chars also results in some incorrect inputs being accepted, e.g. "utf\u000d8" will be accepted because 0x0d | 0x20 = 0x2d which is '-'.
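The last two problems can be reproduced with a simplified model of the loop. This is a sketch, not the PR's code: `ENC_MAX_LEN` and `naive_is_utf8` are hypothetical names, and the buffer is given one extra byte and always terminated so that the demonstration itself stays well defined (which sidesteps the first problem rather than showing it).

```c
#include <string.h>

#define ENC_MAX_LEN 8 /* hypothetical stand-in for DUK_BUFFER_ENCODING_MAX_LEN */

/* Simplified model of the reviewed approach: OR every byte with 0x20,
 * stop at the first NUL, then strcmp() against the accepted names.
 */
static int naive_is_utf8(const char *encoding) {
	char buf[ENC_MAX_LEN + 1];
	size_t i;

	for (i = 0; i < ENC_MAX_LEN; i++) {
		if (encoding[i] == 0) {
			break;
		}
		buf[i] = (char) (encoding[i] | 0x20);
	}
	buf[i] = 0; /* always terminated here, unlike the loop under review */

	return strcmp(buf, "utf8") == 0 || strcmp(buf, "utf-8") == 0;
}
```

With this model, `naive_is_utf8("UTF-8")` returns 1 as intended, but so do `naive_is_utf8("utf\r8")` (the carriage return folds to '-') and `naive_is_utf8("utf8\0bar")` (the scan stops at the embedded NUL).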

So overall I don't think this approach is easy to make work.

I wonder if it'd produce less code to just short circuit compare the input, e.g.:

if ((p[0] | 0x20) == 'u' && (p[1] | 0x20) == 't' && (p[2] | 0x20) == 'f') {
	if (p[3] == '8' && encoding_len == 4) {
		/* utf8 */
	} else if (p[3] == '-' && p[4] == '8' && encoding_len == 5) {
		/* utf8 */
	}
}
/* invalid */
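A self-contained version of this short-circuit comparison might look like the sketch below. The function name and signature are hypothetical (not a Duktape API), and it differs slightly from the snippet above in that the length is checked first, so no byte beyond `len` is ever read. Only the letters are folded with 0x20; '8' and '-' are compared exactly.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the `len`-byte string at `p` is
 * "utf8" or "utf-8" (letters matched case-insensitively), otherwise 0.
 * The explicit length makes embedded NULs part of the comparison.
 */
static int is_utf8_encoding_name(const char *p, size_t len) {
	if (len != 4 && len != 5) {
		return 0;
	}
	if ((p[0] | 0x20) != 'u' || (p[1] | 0x20) != 't' || (p[2] | 0x20) != 'f') {
		return 0;
	}
	if (len == 4) {
		return p[3] == '8';
	}
	return p[3] == '-' && p[4] == '8';
}
```

Under these assumptions, `is_utf8_encoding_name("UTF-8", 5)` returns 1, while `is_utf8_encoding_name("utf\r8", 5)` and `is_utf8_encoding_name("utf8\0bar", 8)` both return 0.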

svaarala · 2021-09-13T19:39:18Z

I'm merging the clang-format indent changes to master and to v2-maintenance. Once that's done, the easiest approach is probably to clang-format your changed files first, and then rebase against master/v2-maintenance. This should keep the diff minimal and avoid conflicts.

svaarala · 2021-09-14T21:46:24Z

Ok, clang-format changes are now in master and v2-maintenance. To reindent code:

$ make docker-image-clang-format
$ make clang-format-source

Buffer#toString throw on unsupported encodings

d2c84fe

SheetJSDev force-pushed the v2-buf-tostr-enc-error branch from 77364be to d2c84fe on September 7, 2021 20:56

SheetJSDev mentioned this pull request Sep 7, 2021

Buffer#toString throw on unsupported encodings #2419

Open

svaarala requested changes Sep 8, 2021

Buffer encoding 'utf-8' and case insensitive match

d760a1f

svaarala requested changes Sep 9, 2021

svaarala force-pushed the v2-maintenance branch from b93df51 to a9b65f7 on February 11, 2022 21:34
