Vest is a research project aiming for high-assurance and performant parsing and serialization of binary data formats in Verus. It features a library of formally verified binary parsers, serializers, and their combinators, as well as a domain-specific language (DSL) for expressing binary formats described in RFCs or other specifications.
Parsing and serialization of binary data
In the context of binary formats, parsing refers to the process of interpreting raw byte sequences as structured data, while serialization refers to the reverse process of encoding structured data as raw byte sequences. Binary formats are essential in domains like network protocols, file systems, and embedded systems, where data is often transmitted or stored in a compact binary form.
Formally verified parsing and serialization
Binary formats are notoriously difficult to parse and serialize correctly, due to the complexity of the formats and the potential for errors in the implementation. Vest aims to address this problem by formally verifying the correctness and security of the parsing and serialization code using the Rust type system and Verus, a deductive verification tool for Rust.
We don't use unsafe
so the Rust type system provides us with strong guarantees about the memory safety of the code. We use Verus to verify the more nuanced properties of the code, such as the top-level round-trip properties of the parsing and serialization functions.
- For every binary sequence
b
, ifparse(b)
succeeds, producing a result(n, m)
, thenserialize(m)
should reproduce the original inputb
, truncated ton
bytes. - For every structured data
m
, ifserialize(m)
succeeds, producing a binary sequenceb
, thenparse(b)
should successfully consuming the entire inputb
and produce the original structured datam
.
These round-trip properties ensure that the parsing and serialization functions are mutual inverses and hence immune to parser malleability attacks (EverParse) and format confusion attacks (Comparse).
Parser and serializer combinators
It's certainly possible to implement and verify parsers and serializers for single protocol formats or file formats manually, but this approach is tedious, and not reusable. Binary formats often share common patterns, such as fixed-size fields, variable-size fields, a sequence of fields, a tagged union of fields, etc. Vest provides a set of parser and serializer combinators that can be used to build complex parsers and serializers from simple ones, where the properties of the combinators are verified once and for all.
Vest DSL (implemented separately in the vest-dsl
crate) provides a domain-specific language (DSL) for expressing binary formats in a concise and readable way. The DSL is designed to be close to the syntax of Rust data type declarations, with added expressivity like type refinements, internal dependencies within formats, and external dependencies among different formats, enabling the user to define a variety of binary formats found in RFCs or other external specifications. The DSL is type checked and translated into a set of combinators defined and verified in the vest
crate. It's recommended to use the Vest DSL to define binary formats to avoid the boilerplate of manually constructing combinators, but it's also possible to use the combinators directly.
A .vest
file contains a set of format definitions, each of which defines a binary format using the Vest DSL and will be translated into a corresponding Rust data type and a pair of parsing and serialization functions. As a classic example, consider the following .vest
file defining a TLV data format:
// TLV format
tlv_msg = {
@t: msg_type,
@l: u16,
v: [u8; @l] >>= choose (@t) {
MsgType1 => type1_msg,
MsgType2 => type2_msg,
MsgType3 => type3_msg,
},
}
msg_type = enum {
MsgType1 = 0x01,
MsgType2 = 0x02,
MsgType3 = 0x03,
}
type1_msg = ...
type2_msg = ...
type3_msg = ...
The .vest
file defines a tlv_msg
format, which consists of a message type t
, a length l
, and a value v
(the @
prefix means that those are dependent fields and can be referenced later). The value v
is a byte sequence of length l
, and the message type t
determines how the value is parsed. msg_type
defines an enumeration of message types, and the choose
combinator is used to select the appropriate message format based on the formerly parsed message type t
(must be an enum
). The type1_msg
, type2_msg
, and type3_msg
formats define the specific message formats for each message type. Roughly, this .vest
file would correspond to the following Rust data types and functions:
struct TlvMsg {
t: MsgType,
l: u16,
v: TlvMsgV,
}
type MsgType = u8;
enum TlvMsgV {
MsgType1(Type1Msg),
MsgType2(Type2Msg),
MsgType3(Type3Msg),
}
struct Type1Msg { ... }
struct Type2Msg { ... }
struct Type3Msg { ... }
fn tlv_msg() -> TlvMsgCombinator {
Mapped { inner: Depend((U8, U16), |(t, l)| tlv_msg_v(t, l)), mapper: TlvMsgMapper }
}
fn tlv_msg_v(t: MsgType, l: u16) -> TlvMsgVCombinator {
AndThen(
Bytes(l as usize),
Mapped {
inner: OrdChoice(
OrdChoice(
Cond { lhs: t, rhs: 1, inner: type1_msg() },
Cond { lhs: t, rhs: 2, inner: type2_msg() },
),
Cond { lhs: t, rhs: 3, inner: type3_msg() },
),
mapper: TlvMsgVMapper,
},
)
}
fn type1_msg() -> Type1MsgCombinator { ... }
fn type2_msg() -> Type2MsgCombinator { ... }
fn type3_msg() -> Type3MsgCombinator { ... }
fn parse_tlv_msg(i: &[u8]) -> (o: Result<(usize, TlvMsg), ()>) {
tlv_msg().parse(i)
}
fn serialize_tlv_msg(v: TlvMsg, data: &mut [u8], pos: usize) -> (o: Result<usize, ()>) {
tlv_msg().serialize(v, data, pos)
}
// spec and proof code (no manual verification needed)
The following table briefly summarizes the correspondence between the Vest DSL format definitions and the generated Rust data types and combinators:
Vest DSL | Rust Data Type | Rust Combinator |
---|---|---|
msg_name = u8 |
type MsgName = u8 |
U8 |
msg_name = u16 |
type MsgName = u16 |
U16 |
msg_name = u32 |
type MsgName = u32 |
U32 |
msg_name = u64 |
type MsgName = u64 |
U64 |
msg_name = u8 | {32} |
type MsgName = u8 |
Refined { inner: U8, predicate: U8Is32 } |
msg_name = enum { A = 1, B = 2, ... } |
type MsgName = u8 |
U8 |
msg_name = enum { A = 3100, B = 3101, ... } |
type MsgName = u16 |
U16 |
msg_name = [u8; 16] |
type MsgName = &[u8] |
BytesN::<16> |
msg_name(@l) = [u8; @l] |
type MsgName = &[u8] |
Bytes(l as usize) |
msg_name(@l) = [u8; @l] >>= msg_a |
type MsgName = MsgA |
Bytes(l as usize).and_then(msg_a()) |
msg_name = { a: msg_a, b: msg_b, ... } |
struct MsgName { a: MsaA, b: MsaB, ... } |
Mapped { inner: ((msg_a(), msg_b(), ...), ...), mapper: MsgNameMapper } |
msg_name = { a: msg_a, b: Tail } |
struct MsgName { a: MsaA, b: &[u8] } |
Mapped { inner: (msg_a(),Tail), mapper: MsgNameMapper } |
msg_name = { @l: u16, b: [u8; @l] } |
struct MsgName { l: u16, b: &[u8] } |
Mapped { inner: Depend(U16, |l: u16| Bytes(l as usize)), mapper: MsgNameMapper } |
msg_name(@t: msg_type) = choose (@t) { A => msg_a, B => msg_b, ... } |
enum MsgName { A(MsgA), B(MsgB), ... } |
Mapped { inner: OrdChoice(OrdChoice(msg_a(), msg_b()), ...), mapper: MsgNameMapper } |
The xxxMapper
s are automatically generated by the Vest DSL compiler and are used to convert between the structural representation of the format (nested products or nested sums) and the nominal Rust data types (struct
s and enum
s).
To enable syntax highlighting for .vest
files in vim/neovim, paste the vest.vim file in your ~/.vim/syntax/
or ~/.config/nvim/syntax/
directory and add the following line to your ~/.vimrc
or ~/.config/nvim/init.vim
file:
au BufNewFile,BufRead *.vest setfiletype vest
In case the user wants more control over the parsing and serialization process, they can use the combinators directly.
use vest::regular::bytes::Bytes;
let pair_of_bytes = (Bytes(1), Bytes(2));
let input = &[0x10; 10];
let (consumed, (a, b)) = pair_of_bytes.parse(input)?;
let mut output = vec![0x00; 40];
let written = pair_of_bytes.serialize((a, b), &mut output, 0)?;
proof { pair_of_bytes.theorem_parse_serialize_roundtrip(input@); }
assert(written == consumed);
assert(&output[..written]@, &input[..written]@);
use vest::regular::uints::U8;
use vest::regular::refined::{Refined, Pred};
pub struct EvenU8;
impl Pred for EvenU8 {
type Input<'a> = u8;
fn apply(&self, i: &Self::Input<'_>) -> bool {
*i % 2 == 0
}
}
let even_u8 = Refined { inner: U8, predicate: EvenU8 };
let mut output = vec![0x00; 40];
let ten = 10u8;
let written = even_u8.serialize(ten, &mut output, 0)?;
let (consumed, parsed) = even_u8.parse(output.as_slice())?;
proof { even_u8.theorem_serialize_parse_roundtrip(ten@); }
assert(written == consumed);
assert(parsed@, ten@);
Make sure you have Rust and Verus properly installed. Then, clone the repository and run:
- To verify and compile the entire
vest
crate:
cd vest
make
- To verify the examples:
cd vest-examples
make
- To use the Vest DSL:
cd vest-dsl
cargo run path/to/your/file.vest
- Or you can build the
vest-dsl
crate and use the binary directly:
cd vest-dsl
cargo build --release
./target/release/vest-dsl --help
Usage: vest-dsl [OPTIONS] <VEST_FILE>
Arguments:
<VEST_FILE> Name or directory of the vest file
Options:
-o, --output <OUTPUT> Name of the output verus file
-h, --help Print help
-V, --version Print version
Vest is still in the early stages of development, and we welcome contributions from the community to either the core library or the DSL. We are also looking for feedback on the design, usability, and performance of the tool. If you are interested in contributing, please feel free to open an issue or a pull request.