It seems RESP v2 and RESP3 do not clearly distinguish between text strings (strings for short) and binary strings, clobs, blobs (binary for short). I might be missing something; if so, please pardon me.
Other binary serialization formats like MessagePack and CBOR clearly distinguish between strings (UTF-8) and binary data type (a list of any octets).
This is a serious problem and what's preventing me from implementing MessagePack in my ObjC framework. I have to know whether I should create a string object or a data object. Creating a string object for everything will fail if it is not UTF-8 and always creating a data object will be very impractical.
Right now RESP has 4 string kinds: (1) simple string; (2) bulk string; (3) verbatim string; (4) streaming string. However, none of them makes it clear whether the string can safely be treated as text, say a UTF-8 string, or whether it should be left as binary.
The verbatim string has an encoding field, which can be specified as txt:, but the specification does not clearly state what txt: means. Does it mean it is UTF-8 text?
In most programming languages keys in built-in map types are strings, so one could try to parse the RESP Map type keys as strings, but, again, it is not guaranteed that those will be valid strings, and the text encoding is unknown.
Error messages in all programming languages are text strings, but the RESP Error types (simple and bulk) are not guaranteed to be valid text, and the text encoding is not specified.
Text strings, such as the keys and values returned by the INFO command, are returned as the "bulk string" $ type, which the documentation defines as a binary string, i.e. a binary blob. It is not text: (1) it could potentially contain octets which would be illegal in text; (2) it is not clear which text encoding should be used.
CLUSTER MYID returns a binary string (a Bulk string, $), but the actual data is a text string holding a hex sequence.
This forces me to have a special opt-in parsing mode for RESP responses, where every Bulk string $ is first parsed as a text string; if UTF-8 validation of that string fails, the parser bails and returns the Bulk string as an array of bytes instead. This is very inefficient and hacky.
This also adds the complication that, when parsing a Bulk string, the parser can return 3 different types: (1) a binary blob, e.g. Uint8Array; (2) a native string (if it is valid UTF-8); (3) a null value (if it is the null Bulk string $-1\r\n).
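To make the workaround concrete, here is a minimal Python sketch of that opt-in mode. It is only an illustration, not code from my framework: the function name parse_bulk_string is hypothetical, it assumes the whole reply is already in memory, and Python's bytes stands in for Uint8Array.

```python
from typing import Union


def parse_bulk_string(buf: bytes) -> Union[str, bytes, None]:
    """Parse one RESP bulk string ($<len>\\r\\n<data>\\r\\n) from the start of buf.

    Hypothetical helper: returns None for the null bulk string, a str when
    the payload happens to be valid UTF-8, and raw bytes otherwise.
    """
    if not buf.startswith(b"$"):
        raise ValueError("not a bulk string")
    header_end = buf.index(b"\r\n")
    length = int(buf[1:header_end])
    if length == -1:
        return None  # null bulk string: $-1\r\n
    start = header_end + 2
    payload = buf[start:start + length]
    terminator = buf[start + length:start + length + 2]
    if len(payload) != length or terminator != b"\r\n":
        raise ValueError("truncated bulk string")
    try:
        # Guess: treat the payload as text if it validates as UTF-8.
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        # Validation failed, so fall back to the raw bytes.
        return payload
```

Every payload pays for a UTF-8 validation pass, and the caller still has to handle three possible result types, because the wire format gives no hint about which one to expect.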
Proposal
Add the ability to clearly discriminate between binary data and UTF-8 text.
A new type could be added which is always a valid UTF-8 string.
Alternatively, the Verbatim strings could specify txt: and bin: encoding formats, where txt: would be explicitly reserved for UTF-8 strings and bin: for binary data.
Clearly state in the specification that the Simple string format is UTF-8 text (or ASCII or Latin1).
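As a rough sketch of how the txt:/bin: variant could look on the client side (in Python, for illustration only): the bin: format and the guarantee that txt: means UTF-8 are the proposal here, not something the current specification states, and parse_verbatim_string is a hypothetical name.

```python
from typing import Union


def parse_verbatim_string(buf: bytes) -> Union[str, bytes]:
    """Parse one RESP3 verbatim string (=<len>\\r\\n<fmt>:<data>\\r\\n).

    Assumes the PROPOSED semantics: "txt" is always UTF-8 text and
    "bin" (hypothetical, not in the current spec) is opaque binary.
    """
    if not buf.startswith(b"="):
        raise ValueError("not a verbatim string")
    header_end = buf.index(b"\r\n")
    length = int(buf[1:header_end])  # counts the 4-byte "fmt:" prefix too
    start = header_end + 2
    body = buf[start:start + length]
    if len(body) != length or length < 4 or body[3:4] != b":":
        raise ValueError("malformed verbatim string")
    fmt, payload = body[:3], body[4:]
    if fmt == b"txt":
        # Under the proposal, invalid UTF-8 here would be a protocol error.
        return payload.decode("utf-8")
    if fmt == b"bin":
        return payload
    raise ValueError("format not handled in this sketch: %r" % fmt)
```

With an explicit tag the parser never has to guess: the result type follows from the three-byte format field alone.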
Currently, strings that contain newlines have to be encoded as blob strings, so parsers/clients that care about correctness have to return a buffer of bytes, rather than a string. This seems like a huge problem to me. The two are completely different, semantically, and the blobness will percolate up to the user, who will have to decipher "( \865\176 \860\662 \865\176)\n", instead of
RESP seems to be running into the same problem that MessagePack v1 had: there was no clear distinction between strings and binary data.
Alternatively, add a UTF-8 text string tag.