-
Notifications
You must be signed in to change notification settings - Fork 15
Encoding and Parsing Bit Strings
Lisp-Binary provides two constructs for encoding and decoding bit-strings. For byte-aligned bit strings (the very common case where several values of a few bits each are packed into a larger integer), you can simply define DEFBINARY struct slots as if they could be read one bit at a time, and let Lisp-Binary generate shifting-and-masking code.
For non-byte-aligned bit strings (for the rare case where the data format doesn't care about the fact that computers deal with data 8 bits at a time), Lisp-Binary provides a bit-stream abstraction that emulates a computer that can read and write one bit at a time.
DEFBINARY structs can eliminate the need to write tedious shift-and-mask bit-twiddling code for extracting bits out of larger integers (or for putting them into larger integers). For instance, suppose your data is represented as eight 3-bit values, for a total of 24 bits:
field name | one | two | three | four | five | six | seven | eight|
+-----+-----+-------+------+------+------+-------+------+
start/end | 0..2| 3..5| 6..8 | 9..11|12..14|15..17|18..20 |21..24|
In the above chart, 0
is the least significant bit, and 24
is the most significant, in a 3-byte
sequence that is meant to be read into a single integer and then the individual 3-bit fields would
be shifted and masked out.
This data can be represented by the following DEFBINARY struct:
CL-USER> (lisp-binary:defbinary eight-three (:byte-order :little-endian)
(one 0 :type (unsigned-byte 3))
(two 0 :type (unsigned-byte 3))
(three 0 :type (unsigned-byte 3))
(four 0 :type (unsigned-byte 3))
(five 0 :type (unsigned-byte 3))
(six 0 :type (unsigned-byte 3))
(seven 0 :type (unsigned-byte 3))
(eight 0 :type (unsigned-byte 3)))
EIGHT-THREE
CL-USER> (flexi-streams:with-output-to-sequence (out :element-type '(unsigned-byte 8))
(lisp-binary:write-binary
(make-eight-three :one 0
:two 1
:three 2
:four 3
:five 4
:six 5
:seven 6
:eight 7) out))
#(136 198 250)
(loop for n across #(136 198 250)
do (format t "~8,'0b " n))
10001000 11000110 11111010
This shows that in :little-endian
byte order, the first field goes in the least significant bit of the
least significant byte, and the last field goes in the most significant bit of the most significant byte. So
it matches the specification because the one
field is represented by the low-order bits and the eight
field is represented by the high-order bits.
So when writing a DEFBINARY struct that splits bits out of little-endian integers, define the fields in the low-order bits first.
:big-endian
byte order behaves differently.
;; FIXME: There is a bug in this definition. It is discussed below.
(defbinary eight-three (:byte-order :big-endian)
(one 0 :type (unsigned-byte 3))
(two 0 :type (unsigned-byte 3))
(three 0 :type (unsigned-byte 3))
(four 0 :type (unsigned-byte 3))
(five 0 :type (unsigned-byte 3))
(six 0 :type (unsigned-byte 3))
(seven 0 :type (unsigned-byte 3))
(eight 0 :type (unsigned-byte 3)))
Because the total number of bits is divisible by 8, Lisp-Binary will generate reader code that reads whole bytes and then parses them with shifting and masking— the same kind of code you might write manually. The writer code shifts and ORs bits together.
To see how the above encodes, you can use FLEXI-STREAMS (on which Lisp-Binary depends) to encode it into an array:
CL-USER> (flexi-streams:with-output-to-sequence (out :element-type '(unsigned-byte 8))
(lisp-binary:write-binary
(make-eight-three :one 0
:two 1
:three 2
:four 3
:five 4
:six 5
:seven 6
:eight 7) out))
#(5 57 119)
To make it easier to see the exact bit layout, print it in binary:
CL-USER> (loop for n across *
do (format t "~8,'0b " n))
00000101 00111001 01110111
NIL
Here we see that the fields are in the wrong order compared to the specification. Bits 0..2 are
the low-order bits of the 24-bit field, but the value with all zeroes is in the high-order position.
Therefore, when you write DEFBINARY structs for fields such as these in :big-endian
structs, you should write the high-order
fields first:
;; Corrected version of EIGHT-THREE
(defbinary eight-three (:byte-order :big-endian)
(eight 0 :type (unsigned-byte 3))
(seven 0 :type (unsigned-byte 3))
(six 0 :type (unsigned-byte 3))
(five 0 :type (unsigned-byte 3))
(four 0 :type (unsigned-byte 3))
(three 0 :type (unsigned-byte 3))
(two 0 :type (unsigned-byte 3))
(one 0 :type (unsigned-byte 3)))
The bit pattern generated this time is:
11111010 11000110 10001000
...which matches the specification.
If the total number of bits in your DEFBINARY struct is a multiple of 8, then Lisp-Binary will generate shifting-and-masking code and do its I/O in whole bytes.
But it's also legal to define DEFBINARY structs with bit counts that are not a multiple of 8. For instance:
CL-USER> (defbinary not-byte-aligned (:byte-order :big-endian)
(some-bits 0 :type (unsigned-byte 4)))
NOT-BYTE-ALIGNED
While this is supported, it doesn't "just work" like byte-aligned structs do. It is necessary to wrap the stream in a special Gray stream that supports bit-by-bit I/O:
CL-USER> (flexi-streams:with-input-from-sequence (in #(#b10101111))
(with-wrapped-in-bit-stream (wrapped in :byte-order :big-endian)
(read-binary 'not-byte-aligned wrapped)))
#S(NOT-BYTE-ALIGNED :SOME-BITS 10)
1/2 ;; number of bytes read
The with-wrapped-in-bit-stream
form requires a :byte-order
parameter because the shifting and masking is happening
in the stream instead of the code generated by the DEFBINARY macro.
Common Lisp doesn't provide a way for standard functions such as CL:READ-SEQUENCE
to do sub-byte I/O. Therefore, the
relevant functionality for doing this is found mostly in the Lisp-Binary:READ-BYTES
and Lisp-Binary:WRITE-BYTES
methods. The bit stream object gives us a place to keep the "unread" bits that are parts of bytes that in fact have
been read from the underlying real stream.
The following code demonstrates reading one bit at a time using READ-BYTES
:
CL-USER> (flexi-streams:with-input-from-sequence (in #(#b10101111))
(with-wrapped-in-bit-stream (wrapped in :byte-order :big-endian)
(loop repeat 4 collect (read-bytes 1/8 wrapped))))
(#(1) #(0) #(1) #(0))
The less-primitive READ-INTEGER
and WRITE-INTEGER
methods combine reading and writing with decoding and encoding:
CL-USER> (flexi-streams:with-input-from-sequence (in #(#b10101111))
(with-wrapped-in-bit-stream (wrapped in :byte-order :big-endian)
(loop repeat 2 collect (read-integer 1/4 wrapped))))
(2 2)
The eight-three
struct could be defined in this alternative manner:
;; Corrected version of EIGHT-THREE
(defbinary eight-three (:byte-order :big-endian)
((eight seven six five four three two one)
0 :type (bit-field :raw-type (unsigned-byte 24)
:member-types ((unsigned-byte 3)
(unsigned-byte 3)
(unsigned-byte 3)
(unsigned-byte 3)
(unsigned-byte 3)
(unsigned-byte 3)
(unsigned-byte 3)
(unsigned-byte 3)))))
The syntax presented at the beginning of this page is syntactic sugar for the bit-field
type. The DEFBINARY macro groups sub-byte fields into bit-field
members, generates a bit-field
type descriptor, and then generates the reader and writer code.
Lisp-Binary relies on a heuristic when determining whether an array requires bit-by-bit I/O to handle. If the array's element type has a number of bits that are divisible by 8, then the array's length is assumed to be divisible by eight, while if the element type has a number of bits that are not divisible by 8, then we assume that the array's total length is not divisible by 8. Lisp-Binary doesn't attempt to do grouping with arrays.
There are also rules for other things whose sizes might be expressed in non-whole-byte values. For example, the count integer
of a counted-string
and the length of the terminator in a terminated-string
can be non-integer numbers of bytes long.
If a field is represented by another DEFBINARY struct, that struct's definition is consulted to decide if a bit-stream is required
to read this one. The logic is significantly dumber than the logic that groups unsigned-byte
and signed-byte
fields together, so don't expect grouping to happen if, for example, you have two DEFBINARY structs that are 12 bits each. You'll need a bit-stream to read or write that.