Skip to content

Encoding and Parsing Bit Strings

j3pic edited this page Mar 7, 2024 · 6 revisions

Lisp-Binary provides two constructs for encoding and decoding bit-strings. For byte-aligned bit strings (the very common case where several values of a few bits each are packed into a larger integer), you can simply define DEFBINARY struct slots as if they could be read one bit at a time, and let Lisp-Binary generate shifting-and-masking code.

For non-byte-aligned bit strings (for the rare case where the data format doesn't care about the fact that computers deal with data 8 bits at a time), Lisp-Binary provides a bit-stream abstraction that emulates a computer that can read and write one bit at a time.

Byte-aligned operation

DEFBINARY structs can eliminate the need to write tedious shift-and-mask bit-twiddling code for extracting bits out of larger integers (or for putting them into larger integers). For instance, suppose your data is represented as eight 3-bit values, for a total of 24 bits:

field name | one | two | three | four | five | six  | seven | eight|
           +-----+-----+-------+------+------+------+-------+------+
start/end  | 0..2| 3..5|  6..8 | 9..11|12..14|15..17|18..20 |21..24| 

In the above chart, 0 is the least significant bit, and 24 is the most significant, in a 3-byte sequence that is meant to be read into a single integer and then the individual 3-bit fields would be shifted and masked out.

This data can be represented by the following DEFBINARY struct:

CL-USER> (lisp-binary:defbinary eight-three (:byte-order :little-endian)
  (one 0 :type (unsigned-byte 3))
  (two 0 :type (unsigned-byte 3))
  (three 0 :type (unsigned-byte 3))
  (four 0 :type (unsigned-byte 3))
  (five 0 :type (unsigned-byte 3))
  (six 0 :type (unsigned-byte 3))
  (seven 0 :type (unsigned-byte 3))
  (eight 0 :type (unsigned-byte 3)))
EIGHT-THREE
CL-USER> (flexi-streams:with-output-to-sequence (out :element-type '(unsigned-byte 8))
	   (lisp-binary:write-binary 
	    (make-eight-three :one 0
			      :two 1
			      :three 2
			      :four 3
			      :five 4
			      :six 5
			      :seven 6
			      :eight 7) out))
			      
#(136 198 250)

(loop for n across #(136 198 250)
	      do (format t "~8,'0b " n))
10001000 11000110 11111010 

This shows that in :little-endian byte order, the first field goes in the least significant bit of the least significant byte, and the last field goes in the most significant bit of the most significant byte. So it matches the specification because the one field is represented by the low-order bits and the eight field is represented by the high-order bits.

So when writing a DEFBINARY struct that splits bits out of little-endian integers, define the fields in the low-order bits first.

:big-endian byte order behaves differently.

;; FIXME: There is a bug in this definition. It is discussed below.
(defbinary eight-three (:byte-order :big-endian)
  (one 0 :type (unsigned-byte 3))
  (two 0 :type (unsigned-byte 3))
  (three 0 :type (unsigned-byte 3))
  (four 0 :type (unsigned-byte 3))
  (five 0 :type (unsigned-byte 3))
  (six 0 :type (unsigned-byte 3))
  (seven 0 :type (unsigned-byte 3))
  (eight 0 :type (unsigned-byte 3)))

Because the total number of bits is divisible by 8, Lisp-Binary will generate reader code that reads whole bytes and then parses them with shifting and masking— the same kind of code you might write manually. The writer code shifts and ORs bits together.

To see how the above encodes, you can use FLEXI-STREAMS (on which Lisp-Binary depends) to encode it into an array:

CL-USER> (flexi-streams:with-output-to-sequence (out :element-type '(unsigned-byte 8))
	   (lisp-binary:write-binary 
	    (make-eight-three :one 0
			      :two 1
			      :three 2
			      :four 3
			      :five 4
			      :six 5
			      :seven 6
			      :eight 7) out))
			      
#(5 57 119)

To make it easier to see the exact bit layout, print it in binary:

CL-USER> (loop for n across *
	      do (format t "~8,'0b " n))
00000101 00111001 01110111 
NIL

Here we see that the fields are in the wrong order compared to the specification. Bits 0..2 are the low-order bits of the 24-bit field, but the value with all zeroes is in the high-order position. Therefore, when you write DEFBINARY structs for fields such as these in :big-endian structs, you should write the high-order fields first:

;; Corrected version of EIGHT-THREE
(defbinary eight-three (:byte-order :big-endian)
  (eight 0 :type (unsigned-byte 3))
  (seven 0 :type (unsigned-byte 3))
  (six 0 :type (unsigned-byte 3))
  (five 0 :type (unsigned-byte 3))
  (four 0 :type (unsigned-byte 3))
  (three 0 :type (unsigned-byte 3))
  (two 0 :type (unsigned-byte 3))
  (one 0 :type (unsigned-byte 3)))

The bit pattern generated this time is:

11111010 11000110 10001000 

...which matches the specification.

Non-byte-aligned operation

If the total number of bits in your DEFBINARY struct is a multiple of 8, then Lisp-Binary will generate shifting-and-masking code and do its I/O in whole bytes.

But it's also legal to define DEFBINARY structs with bit counts that are not a multiple of 8. For instance:

CL-USER> (defbinary not-byte-aligned (:byte-order :big-endian)
            (some-bits 0 :type (unsigned-byte 4)))
NOT-BYTE-ALIGNED

While this is supported, it doesn't "just work" like byte-aligned structs do. It is necessary to wrap the stream in a special Gray stream that supports bit-by-bit I/O:

CL-USER> (flexi-streams:with-input-from-sequence (in #(#b10101111))
	   (with-wrapped-in-bit-stream (wrapped in :byte-order :big-endian)
	     (read-binary 'not-byte-aligned wrapped)))
#S(NOT-BYTE-ALIGNED :SOME-BITS 10)
1/2 ;; number of bytes read

The with-wrapped-in-bit-stream form requires a :byte-order parameter because the shifting and masking is happening in the stream instead of the code generated by the DEFBINARY macro.

Common Lisp doesn't provide a way for standard functions such as CL:READ-SEQUENCE to do sub-byte I/O. Therefore, the relevant functionality for doing this is found mostly in the Lisp-Binary:READ-BYTES and Lisp-Binary:WRITE-BYTES methods. The bit stream object gives us a place to keep the "unread" bits that are parts of bytes that in fact have been read from the underlying real stream.

The following code demonstrates reading one bit at a time using READ-BYTES:

CL-USER> (flexi-streams:with-input-from-sequence (in #(#b10101111))
	   (with-wrapped-in-bit-stream (wrapped in :byte-order :big-endian)
	     (loop repeat 4 collect (read-bytes 1/8 wrapped))))
(#(1) #(0) #(1) #(0))

The less-primitive READ-INTEGER and WRITE-INTEGER methods combine reading and writing with decoding and encoding:

CL-USER> (flexi-streams:with-input-from-sequence (in #(#b10101111))
	   (with-wrapped-in-bit-stream (wrapped in :byte-order :big-endian)
	     (loop repeat 2 collect (read-integer 1/4 wrapped))))
(2 2)

The more primitive BIT-FIELD type

The eight-three struct could be defined in this alternative manner:

;; Corrected version of EIGHT-THREE
(defbinary eight-three (:byte-order :big-endian)
  ((eight seven six five four three two one)
   0 :type (bit-field :raw-type (unsigned-byte 24)
                      :member-types ((unsigned-byte 3)
                                     (unsigned-byte 3)
                                     (unsigned-byte 3)
                                     (unsigned-byte 3)
                                     (unsigned-byte 3)
                                     (unsigned-byte 3)
                                     (unsigned-byte 3)
                                     (unsigned-byte 3)))))

The syntax presented at the beginning of this page is syntactic sugar for the bit-field type. The DEFBINARY macro groups sub-byte fields into bit-field members, generates a bit-field type descriptor, and then generates the reader and writer code.

The byte-alignedness of arrays

Lisp-Binary relies on a heuristic when determining whether an array requires bit-by-bit I/O to handle. If the array's element type has a number of bits that are divisible by 8, then the array's length is assumed to be divisible by eight, while if the element type has a number of bits that are not divisible by 8, then we assume that the array's total length is not divisible by 8. Lisp-Binary doesn't attempt to do grouping with arrays.

There are also rules for other things whose sizes might be expressed in non-whole-byte values. For example, the count integer of a counted-string and the length of the terminator in a terminated-string can be non-integer numbers of bytes long.

If a field is represented by another DEFBINARY struct, that struct's definition is consulted to decide if a bit-stream is required to read this one. The logic is significantly dumber than the logic that groups unsigned-byte and signed-byte fields together, so don't expect grouping to happen if, for example, you have two DEFBINARY structs that are 12 bits each. You'll need a bit-stream to read or write that.