Zish is a data serialization format designed to be an improvement on JSON. It adds timestamp, bytes and decimal types, as well as multi-line strings.
- File extension: .zish
- MIME type: application/x.zish
Here’s an example of Zish:
    /* This is a comment */

    /* Curly brackets delimit a map */
    {
      "title": "A Hero of Our Time",  /* A key / value pair of strings */
      "description": null,
      "key": 'a3NoaGdybA==',  /* Single quotes delimit base64 encoded binary */
      "number_of_novellas": 5,
      "price": 7.99,  /* A decimal number */
      "read_date": 2017-07-16T14:05:00Z,
      "tags": [  /* Square brackets delimit a list */
        "russian",
        "novel",
        "19th century",
      ],
      "would_recommend": true,
    }
Zish is a valid stream of Unicode code points encoded in UTF-8. The formal specification of Zish is an ANTLR grammar. Zish has the following data types:
Timestamps follow the RFC 3339 format, e.g. 2017-08-09T10:40:09Z and 2017-08-09T10:40:09.037Z.
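As a rough illustration of reading such timestamps, here is a minimal Python sketch using only the standard library. The helper name `parse_timestamp` is hypothetical and is not part of any Zish implementation; it only handles the shapes shown in the examples above.

```python
from datetime import datetime, timezone


def parse_timestamp(text: str) -> datetime:
    """Parse an RFC 3339 timestamp ending in 'Z' or a numeric offset."""
    # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


whole = parse_timestamp("2017-08-09T10:40:09Z")
fractional = parse_timestamp("2017-08-09T10:40:09.037Z")
print(whole.isoformat(), fractional.isoformat())
```

Both results are timezone-aware, so they can be compared with each other directly.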
Strings are sequences of Unicode characters delimited by the double-quote character ", and they can contain line breaks. To write a string that contains a ", escape it with a backslash: \".
There are other escapes allowed in strings. Here’s the full list:
| Escape | Description | ASCII Hex Value |
|---|---|---|
| \a | Alert (beep sound) | 07 |
| \b | Backspace | 08 |
| \t | Horizontal tab | 09 |
| \n | Linefeed | 0A |
| \v | Vertical tab | 0B |
| \f | Formfeed | 0C |
| \r | Carriage return | 0D |
| \" | Double quotation mark | 22 |
| \\ | Backslash | 5C |
| \ followed by a newline | A backslash before a newline sequence is a way to remove a newline. | |
| | A unicode code point. | |
| | A unicode code point. | |
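The single-character escapes in the table can be decoded with a small lookup table. The Python sketch below assumes the conventional C-style letters for the listed hex values (for example `\a` for 07 and `\n` for 0A); that mapping is an assumption based on the descriptions, not something taken from the grammar. The function name `unescape` is hypothetical, and the Unicode code point escapes are deliberately left unhandled.

```python
# Assumed mapping: conventional C-style letters for the hex values in the table.
SIMPLE_ESCAPES = {
    "a": "\x07",   # alert
    "b": "\x08",   # backspace
    "t": "\x09",   # horizontal tab
    "n": "\x0a",   # linefeed
    "v": "\x0b",   # vertical tab
    "f": "\x0c",   # formfeed
    "r": "\x0d",   # carriage return
    '"': "\x22",   # double quotation mark
    "\\": "\x5c",  # backslash
}


def unescape(body: str) -> str:
    """Decode the single-character escapes in the body of a string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of string")
        nxt = body[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt == "\r" and body[i + 2:i + 3] == "\n":
            i += 3  # backslash before CR+LF: drop the whole newline sequence
        elif nxt in "\n\r":
            i += 2  # backslash before LF or CR: drop the newline
        else:
            # Unicode code point escapes are not covered by this sketch.
            raise ValueError(f"unsupported escape: \\{nxt}")
    return "".join(out)


print(unescape(r'say \"hi\"'))
```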
Binary data is delimited by the single-quote character and represented in base64 encoding as specified by RFC 3548, e.g. 'a3NoaGdybA=='.
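To make the bytes literal concrete, the following Python sketch decodes the example above with the standard `base64` module. The helper `decode_bytes_literal` is a hypothetical name used only for illustration.

```python
import base64


def decode_bytes_literal(literal: str) -> bytes:
    """Decode a single-quoted base64 literal into raw bytes."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("a bytes literal must be wrapped in single quotes")
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently skipping them.
    return base64.b64decode(literal[1:-1], validate=True)


data = decode_bytes_literal("'a3NoaGdybA=='")
print(data)  # b'kshhgrl'
```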
Integers can’t begin with a +, and integers with more than one digit can’t begin with a zero. Examples of valid integers are:
- 0
- -0
- 389
- -589
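The integer rules above can be expressed as a regular expression. This Python sketch is one reading of the prose, not the pattern from the ANTLR grammar, and `is_integer` is a hypothetical helper.

```python
import re

# Optional minus sign, then either a lone zero or a non-zero digit
# followed by any number of digits. A leading '+' is never accepted.
INTEGER = re.compile(r"-?(0|[1-9][0-9]*)")


def is_integer(text: str) -> bool:
    """Return True if text is a valid integer under the rules above."""
    return INTEGER.fullmatch(text) is not None


for candidate in ["0", "-0", "389", "-589", "+5", "007"]:
    print(candidate, is_integer(candidate))
```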
A decimal is a representation of a real number in base 10. The exponent starts with an e or E. Examples are:
- 0.993
- 1.78e-1

Decimals can also have the special values NaN and Infinity (optionally prefixed with + or -).
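Python's standard `decimal` module is a convenient way to see what decimal values like these look like in an application. The sketch below is only an illustration of base-10 arithmetic; it does not claim to show how any particular Zish parser represents decimals.

```python
from decimal import Decimal

# The example literals from above, held exactly in base 10.
a = Decimal("0.993")
b = Decimal("1.78e-1")

# The special values are accepted by Decimal as well.
not_a_number = Decimal("NaN")
negative_infinity = Decimal("-Infinity")

# Decimal arithmetic keeps base-10 fractions exact, unlike binary floats.
exact = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
inexact = 0.1 + 0.2 == 0.3

print(b, exact, inexact)
```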
A list is an ordered sequence of values, where the same value may occur more than once. Lists begin with a [ and end with a ], and the values are separated by commas (,). An example is:
[ 56, "pod", 0, ]
Trailing commas are optional. An element of a list can be any Zish type including a list or map.
A map is an unordered collection of key / value pairs. Duplicate keys aren’t allowed (for keys that are strings, the test for uniqueness is done without any normalization of the strings). Maps start with a { and end with a }. The pairs are separated by commas (,) and each key is separated from its value by a colon (:). Trailing commas are optional. Keys can be any type except for list, map or null, and values can be of any type. An example of a map is:
{ "hello": 90, true: "larch", 5: [ null, ], }
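The note about normalization matters in practice: two strings that render identically can still be distinct keys. The Python sketch below shows a uniqueness check that compares keys exactly as given. The function `check_unique_keys` is hypothetical, and treating a boolean key as distinct from an integer key is an assumption of this sketch rather than something stated above.

```python
import unicodedata


def check_unique_keys(keys):
    """Raise ValueError if a key occurs twice; keys are compared as-is."""
    seen = set()
    for key in keys:
        # Assumption: include the type, so Python's True == 1 equality
        # does not merge a boolean key with an integer key.
        marker = (type(key), key)
        if marker in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen.add(marker)


composed = "caf\u00e9"      # 'é' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# The two strings are equal only after normalization...
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

# ...so without normalization they count as two different keys.
check_unique_keys([composed, decomposed])
check_unique_keys(["hello", True, 5])
print(same_after_nfc, composed == decomposed)
```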
Comments begin with /* and end with */.
Comments are treated as whitespace rather than values, so they’re ignored by the parser and not passed through to the application.
In XML, comments are passed through to the application, which is thought to lead to an abuse of comments because it’s unclear whether they’re part of the content or not. JSON avoids this problem by not allowing comments. Zish steers a middle path here by allowing comments, but ignoring them at the parsing stage.
To represent real numbers, JSON uses binary floating point numbers, but Zish uses decimal floating point numbers. Zish also has the following data types that JSON doesn’t have:
- Timestamp
- Bytes (a sequence of bytes)
Trailing commas in lists and maps are allowed in Zish, but they aren’t in JSON.
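The JSON side of this difference is easy to check with Python's standard `json` module, which rejects the list from the earlier example once a trailing comma is present. This sketch only demonstrates JSON's behaviour.

```python
import json

with_trailing_comma = '[56, "pod", 0,]'
without_trailing_comma = '[56, "pod", 0]'

try:
    json.loads(with_trailing_comma)
    trailing_comma_accepted = True
except json.JSONDecodeError:
    trailing_comma_accepted = False

parsed = json.loads(without_trailing_comma)
print(trailing_comma_accepted, parsed)
```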
JSON has an 'object' type whereas Zish has a 'map' type. They both represent an unordered collection of name / value pairs, but they have two differences:
- In JSON the 'name' part of the name / value pair can only be a string, but in Zish the 'name' part can be a value of any Zish type except list, map or null.
- In Zish, duplicate names aren’t allowed, but in JSON they are.
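The duplicate-name difference can be seen with Python's standard `json` module, which silently keeps the last value for a repeated name. The sketch below also shows how a stricter, Zish-like check could be layered on top using the `object_pairs_hook` parameter. The function `reject_duplicates` is a hypothetical helper.

```python
import json


def reject_duplicates(pairs):
    """Build a dict from name / value pairs, refusing repeated names."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError(f"duplicate name: {name!r}")
        result[name] = value
    return result


text = '{"a": 1, "a": 2}'

# Default behaviour: the later value wins and no error is raised.
lenient = json.loads(text)

# Strict behaviour: the hook sees every pair and can reject repeats.
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

print(lenient, strict_error)
```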
End of line (EOL) character sequences are a common source of problems in data serialization formats. One problem is that different operating systems have different conventions for what combination of characters constitutes an EOL. Unix-based systems use LF, but Windows uses CR+LF. So if, for example, a file is created on a Debian machine and then opened on a Windows machine, all the text can run together without any line breaks.
JSON gets round this by saying that within strings, literal line breaks aren’t allowed, and you have to use an escaped line break \n instead.
Zish takes the view that Unicode has solved the EOL problem. Since Zish is a sequence of Unicode characters, it follows that Zish should respect the Unicode definition of EOLs (ie. LF, CR, CR+LF and others). So regardless of the operating system, Zish is first and foremost a Unicode sequence.
This allows multi-line strings to be written more clearly in Zish.
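To illustrate what treating several sequences as line breaks looks like, the Python sketch below uses `str.splitlines`, which recognises LF, CR, CR+LF and a number of other Unicode line boundaries. It shows Python's behaviour only; exactly which sequences Zish accepts is defined by its grammar, and `str.splitlines` also splits on a few characters (such as vertical tab and form feed) that may not be relevant here.

```python
# One string mixing several end-of-line conventions:
# LF, CR+LF, a bare CR, and U+2028 LINE SEPARATOR.
text = "first\nsecond\r\nthird\rfourth\u2028fifth"

lines = text.splitlines()
print(lines)  # ['first', 'second', 'third', 'fourth', 'fifth']
```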
Zish is influenced by the text representation of Amazon Ion, but there are several differences between them:
- Ion doesn’t have a map type, instead it has a struct type which allows duplicate keys.
- Ion has data types such as ‘symbol’, s-expressions, and ‘keyword’ which Zish doesn’t have.
- There are three text types in Ion, but Zish just has one.
- There are two binary data types in Ion, but Zish just has one.
- Ion has a binary as well as text representation.
Zish is close in spirit to edn but again there are differences:
- Edn is extensible, i.e. it has a mechanism for user defined types.
- Edn has types such as ‘character’, ‘symbol’ and ‘vector’ which Zish doesn’t have.
If you’re working on an implementation of Zish, raise an issue on GitHub and we’ll add a link. It doesn’t need to be a complete implementation; a work-in-progress is fine.