Skip to content

Latest commit

 

History

History
453 lines (337 loc) · 9.65 KB

stdlib.adoc

File metadata and controls

453 lines (337 loc) · 9.65 KB

The Parsley Standard Library

The standard library provides simple utilities and data structures for the expression sub-language, and primitive non-terminals for the grammar sub-language.

Note
The standard library is under rapid evolution.

Expression sub-language

Table 1. Built-in variant types
Type Constructors

option

option::None(), option::Some(.)

bool

bool::True(), bool::False()

endian

endian::Big(), endian::Little()

bit

bit::One(), bit::Zero()

The integer operators are suffixed by the type <int_type> of their operands and result type. <int_type> can be either i8, u8, i16, u16, i32, u32, i64, u64, u, or i, where u and i denote usize and isize respectively.

Table 2. Built-in operators
Symbol Signature Description

-_<int_type>

<int_type> -> <int_type>

Unary minus

!

bool → bool

Boolean negation

~

bitvector<'n> -> bitvector<'n>

Bitwise complement

~_<int_type>

<int_type> -> <int_type>

Unary integer bitwise complement

+_<int_type>, -_<int_type>, *_<int_type>, /_<int_type>

<int_type> -> <int_type> -> <int_type>

Binary integer arithmetic operators (add, subtract, multiply, divide)

&_<int_type>, |_<int_type>, ^_<int_type>

<int_type> -> <int_type> -> <int_type>

Binary integer bitwise operators (and, or, xor)

<<_<int_type>, >>_<int_type>, >>_a<int_type>

<int_type> -> u8 -> <int_type>

Binary integer bitwise shift operators (left shift, logical right shift, arithmetic right shift)

<_<int_type>, <=_<int_type>, >_<int_type>, >=_<int_type>

<int_type> -> <int_type> -> bool

Integer comparisons (less, less or equal, greater, greater or equal)

&&, ||

bool → bool → bool

Binary boolean operators (and, or)

|_b, &_b

bitvector<'n> → bitvector<'n> → bitvector<'n>

Binary bitwise operators (or, and)

=, !=

'a → 'a → bool

Operators to check equality (equal, not equal)

[]

list<'a> → usize → option<'a>

l[n] extracts the n th element of list l if present

The following modules provide useful operations for the built-in types.

There is a module for each integer type, named after the type with the initial letter capitalized, e.g. I8 and Usize for the i8 and Usize types respectively. result<t> denotes t when the conversion from source_int_type to <t> is possible without loss of precision (i.e. the numerical integer value remains the same after conversion), and option<t> otherwise. The wrapping conversions always convert the source value to the result type with wrapping semantics; here, the numerical integer result value may differ from the source value. A module does not contain a wrapping converter for a source type that always converts safely (e.g. I16.of_u8_wrapped does not exist, since it is always possible to safely convert a u8 into a i16 without change in numerical value.)

Table 3. <Int_type>
Value Signature Description

of_byte

byte → <int_type>

integer value of a byte

of_string

string -> option<int_type>

safe string to integer conversion

of_string_unsafe

string -> <int_type>

string to integer conversion; throws an exception if conversion fails

of_bytes

[byte] -> option<int_type>

safe byte vector to integer conversion

of_bytes_unsafe

[byte] → <int_type>

byte vector to integer conversion; throws an exception if conversion fails

of_<source_int_type>

<source_int_type> -> result<int_type>

safe integer type conversion

of_<source_int_type>_wrapped

<source_int_type> -> <int_type>

wrapping integer type conversion

Table 4. Byte
Value Signature Description

of_<int_type>

<int_type> -> option<byte>

safe integer to byte conversion

of_<int_type>_wrapped

<int_type> -> byte

integer to byte conversion with wrapping semantics

Table 5. Bits
Value Signature Description

to_<int_type>

bitvector<'n> → <int_type>

integer value of a bitvector (Bytes are in big-endian order; i.e. the last or rightmost eight bits in the bitvector form the least-significant byte.)

to_<int_type>_endian

endian → bitvector<'n> → <int_type>

integer value of a bitvector with bytes in specified endianness (This is equivalent to to_<int_type> when endian is specified as endian::Big(). When endian is endian::Little(), the first or leftmost eight bits form the least-significant byte.)

to_bool

bit → bool

boolean value of a bit

of_bool

bool → bit

bit value of a boolean

to_bit

bitvector<1> → bit

extract the bit from a 1-bitvector

of_bit

bit → bitvector<1>

put a bit into a 1-bitvector

ones

usize → bitvector<n>

ones(n) generates a bitvector<n> with all bits set

zeros

usize → bitvector<n>

zeros(n) generates a bitvector<n> with all bits clear

Table 6. List
Value Signature Description

head

['a] → 'a

head element of list; throws an exception if list is empty

tail

['a] → ['a]

tail of list; throws an exception if list is empty

length

['a] → usize

list length

concat

['a] → ['a] → ['a]

list concatenation

flatten

[[a]] → [a]

list flattening

rev

['a] → ['a]

list reversal

map

('a → 'b) → ['a] → ['b]

mapping over lists

map2

('a → 'b → 'c) → ['a] → ['b] → ['c]

mapping over two lists; throws an exception if list lengths are not equal

fold

('a → 't → 'a) → 'a → ['t] → 'a

folds an accumulator over a list

repl

a → usize → ['a]

creates a list containing the specified number of copies of the specified element

Table 7. Set
Value Signature Description

empty

set<'a>

an empty set

add

set<'a> → 'a → set<'a>

add an element to a set

mem

set<'a> → 'a → bool

check set membership

Table 8. Map
Value Signature Description

empty

map<'k, 'v>

empty map

add

map<'k, 'v> → 'k → 'v → map<'k, 'v>

add a key-value binding

mem

map<'k, 'v> → 'k → bool

check if a binding exists for a key

find

map<'k, 'v> → 'k → option<'v>

return the binding for a key if present

find_unsafe

map<'k, 'v> → 'k → option<'v>

return the binding for a key; throws an exception if no binding exists

Table 9. String
Value Signature Description

empty

string

empty string

to_bytes

string → [byte]

string to byte list conversion

of_bytes

[byte] → option<string>

safe byte vector to string conversion

of_bytes_unsafe

[byte] → string

byte vector to string conversion; throws an exception if conversion fails

of_literal

string → string

converts a string literal into a string

Note
Character encoding issues for string conversion will be addressed soon.
Table 10. View
Value Signature Description

get_current

unit → view

gets the current view (i.e. parsing buffer)

get_base

unit → view

gets the current view with the cursor set at the beginning of the buffer

get_cursor

view → usize

gets the cursor offset in the specified view (a cursor at the start position has a zero offset)

get_remaining

view → usize

gets the remaining bytes in the specified view (i.e. the number of bytes from the cursor to the end of the view)

get_current_cursor

unit → usize

get the cursor offset in the current view

get_current_remaining

unit → usize

gets the remaining bytes in the current view

restrict

view → usize → usize → view

restrict(v, n, len) returns a view of size len that starts n bytes from the cursor of v; throws an exception the specified range is out-of-bounds

restrict_from

view → usize → view

restrict_from(v, n) returns a view that begins n bytes from the cursor location of v and continues until the end of v; throws an exception if n is out-of-bounds

clone

view → view

returns a copy of the view

Grammar sub-language

The library provides primitive non-terminals, their inherited attributes if any, and the types of their contents. The byte-valued non-terminals with an S suffix return byte lists, and hence compose with regular expression combinators. The names of the various *Int* integer non-terminals indicate signedness (a 'U' prefix implies unsigned), and bit-width (a NN suffix indicates the bit-width).

Table 11. Built-in non-terminals
Non-terminal Type Description

Byte

byte

Matches a single byte

AsciiChar

byte

Matches a single ASCII character

HexChar

byte

Matches a single hexadecimal character

AlphaNum

byte

Matches a single alphanumeric character

Digit

byte

Matches a single decimal numeric character

AsciiCharS

[byte]

Matches a single ASCII character

HexCharS

[byte]

Matches a single hexadecimal character

AlphaNumS

[byte]

Matches a single alphanumeric character

DigitS

[byte]

Matches a single decimal numeric character

Int8 (endian: endian)

i8

Matches a single byte

UInt8 (endian: endian)

u8

Matches a single byte

Int16 (endian: endian)

i16

Matches two bytes

UInt16 (endian: endian)

u16

Matches two bytes

Int32 (endian: endian)

i32

Matches four bytes

UInt32 (endian: endian)

u32

Matches four bytes

Int64 (endian: endian)

i64

Matches eight bytes

UInt64 (endian: endian)

u64

Matches eight bytes

BitVector<n>

bitvector<n>

Matches n bits