In the previous lesson, we developed a Parsley specification for the User Datagram Protocol. In this lesson, we continue looking at the IP suite with Internet Protocol (version 4). We develop a Parsley specification based on IETF RFC 791.
First, we note that, as is the case with many protocols, we cannot parse IPv4 in isolation. IPv4 packets may contain a UDP packet, a TCP packet, other packets, or even fragments of packets. Moreover, correctly parsing IPv4 and UDP packets requires the parsers to exchange information, e.g., for checksum computation.
Note
|
For the time being, we do not ensure the correctness of checksums, but we will do so in a following lesson. |
Bridging this parser-parser barrier is done via the include
declaration.
The include
declaration is followed by a filename of a file within the same directory.
It enables one to include all declarations within the specified file.
include udp
Here, we include udp.ply
.
We now examine the structure of the IPv4 header.
We begin by declaring the ip_packet
type:
bitfield ipv4_flags = { bit0: 2, bit1: 1, bit2: 0 } type ip_packet = { version: u8, ihl: u8, type_of_service: u8, total_length: u16, identification: [byte], flags: ipv4_flags, fragment_offset: u16, time_to_live: u8, protocol: u8, header_checksum: u16, source_address: [byte], destination_address: [byte], options_and_padding: [byte] }
This type declaration closely resembles Figure 4 of the Internet Protocol specification.
The process of mapping the specification header structure to a Parsley type should be familiar.
After all, we performed a similar task in declaring the udp_packet
in the the first lesson.
That said, there are a few new things to notice:
-
First, we not only see unsigned 16-bit integers (
u16
) but also 8-bit unsigned integers (u8
). -
Second, we see the user-defined bitfield
ipv4_flags
. Bitfields can be thought of as special user-defined types for handling bits. -
Third, we see
ipv4_flags
appear as a field within the user-defined typeip_packet
. While working on the Parsley specification for UDP, we had only dealt with atomic types and lists of atomic types (byte lists).
As we have done before, we use the format block to specify how to parse an IPv4 packet.
format { IPPacket ipp {ip_packet} := (| version: u8 := 0u8, ihl: u8 := 0u8, fragment_offset: u16 := 0u16 |) version_bv = BitVector< 4 > ihl_bv = BitVector< 4 > { version := Bits.to_u8(version_bv); ihl := Bits.to_u8(ihl_bv) } [version = 4u8] [ihl >=_u8 5u8] // the header length should be at least 5 as per the spec type_of_service = UInt8 total_length = UInt16<endian=endian::Big()> identification = (Byte^2u) flags = $bitfield(ipv4_flags) [flags.bit0 = 0b0] // the specification specifies this should be 0 fragments_offset_bv = BitVector< 13 > { fragment_offset := Bits.to_u16(fragments_offset_bv) } time_to_live = UInt8 protocol = UInt8 header_checksum = UInt16<endian=endian::Big()> source_address = (Byte^4u) destination_address = (Byte^4u) options_and_padding_len = {;; 4u16 *_u16 U16.of_u8(ihl) -_u16 5u16 } options_and_padding = (Byte^Usize.of_u16(options_and_padding_len)) { ipp.version := version; ipp.ihl := ihl; ipp.type_of_service := type_of_service; ipp.total_length := total_length; ipp.identification := identification; ipp.flags := flags; ipp.fragment_offset := fragment_offset; ipp.time_to_live := time_to_live; ipp.protocol := protocol; ipp.header_checksum := header_checksum; ipp.source_address := source_address; ipp.destination_address := destination_address; ipp.options_and_padding := options_and_padding } // udp w/o fragmentation ([protocol = 17u8 && flags.bit1 = 0b1] UDPPacket) | // udp w/o fragmentation // placeholder for udp w/ fragmentation or other next level protocols, // e.g., protocol 6 is tcp [protocol != 17u8 || (protocol = 17u8 && flags.bit1 = 0b0)] }
There’s a lot in here! And while much of it should be clear, there is also much that is new, which requires explanation.
First, let’s look at the beginning of the format declaration:
format { IPPacket ipp {ip_packet} := (| version: u8 := 0u8, ihl: u8 := 0u8, fragment_offset: u16 := 0u16 |) ... }
The left-hand side of the rule declaration should be clear enough; we are creating a non-terminal named IPPacket
with short name ipp
that has the attribute ip_packet
.
But what is going on afterward?
Well, at the beginning of the right-hand side of a rule declaration, temporaries may be specified.
These temporaries are declared between (|
and |)
markers and remain in scope for the remainder of the rule.
In our example, we declare three temporaries that have the types u8
, u8
, and u16
respectively, all initialized to the 0 value for their corresponding integer types.
We later assign values to these temporaries via action blocks.
In fact, we do so very soon.
Let’s look at the next few lines.
format { IPPacket ipp {ip_packet} := (| version: u8 := 0u8, ihl: u8 := 0u8, fragment_offset: u16 := 0u16 |) version_bv = BitVector< 4 > ihl_bv = BitVector< 4 > { version := Bits.to_u8(version_bv); ihl := Bits.to_u8(ihl_bv) } [version = 4u8] [ihl >=_u8 5u8] // the header length should be at least 5 as per the spec ... }
The identifiers version_bv
are matched bit vectors of length 4.
We then apply the Bits.to_u8()
function from the standard library to extract u8
values from these bit vectors.
In fact, this form should look quite familiar from the previous lesson where we assigned values to synthesized attributes.
The only difference here is that we are assigning values to temporaries instead of assigning values to synthesized attributes.
Next, we have two constraints to ensure that we are dealing with version 4 of IP and that the IHL value is at least 5, in accordance with the IPv4 specification.
While much of the remainder of the file is straightforward, it is instructive to look at how bitfields are used and to examine the remaining action blocks and constraints.
One task involves extracting 3 flag bits, followed by a 13-bit fragments_offset length value from a 2-byte segment of the packet. This is achieved by the code below.
format { IPPacket ipp {ip_packet} := ... flags = $bitfield(ipv4_flags) [flags.bit0 = 0b0] // the specification specifies this should be 0 fragments_offset_bv = BitVector< 13 > { fragment_offset := Bits.to_u16(fragments_offset_bv) } ... }
The identifier flags
matches an instantiation of the user-defined bitfield ipv4_flags
, that is set by reading 3 bits via $bitfield(ipv4_flags)
.
Recall that ipv4_flags
was a bitfield containing 3 bits, which we cleverly named bit0
, bit1
, and bit2
.
In accordance with the protocol, we ensure that bit0
is set to zero in the following line.
We then match fragments_offset_bv
to a 13-bit bit vector.
Finally, in an action block, we use Bits.to_u16(fragments_offset_bv)
to obtain a u16 value corresponding to this bit vector, and we store the result in the `fragment_offset
temporary that we introduced earlier.
The next piece of code demonstrates the use of action blocks to compute and return values, storing them within an identifier.
format { IPPacket ipp {ip_packet} := ... options_and_padding_len = {;; 4u16 *_u16 U16.of_u8(ihl) -_u16 5u16 } options_and_padding = (Byte^Usize.of_u16(options_and_padding_len)) ... }
Here, we convert ihl to a u16
value and do some simple arithmetic to determine the length of the portion of the packet dealing with options and padding.
Any value following ;;
within an action block is returned.
Here, the computed length value returned and stored in the identifier options_and_padding_len
.
This identifier is then used to match the options and padding values in the line following the action block.
Note
|
Instead of using an action block to compute options_and_padding_len , we could have alternatively computed the expression corresponding to the options and paddling length directly within the options_and_padding assignment itself. We used the approach above for pedagogical purposes.
|
The tail of the format declaration is as follows:
format { IPPacket ipp {ip_packet} := ... // udp w/o fragmentation ([protocol = 17u8 && flags.bit1 = 0b1] UDPPacket) | // udp w/o fragmentation // placeholder for udp w/ fragmentation or other next level protocols, // e.g., protocol 6 is tcp [protocol != 17u8 || (protocol = 17u8 && flags.bit1 = 0b0)] }
Here, we split the parse into two parts: (i) a part we deal with that corresponds to IPv4 over UDP without fragmentation and (ii) a part we ignore, which corresponds to everything else.
This is achived by the ordered choice rule element combinator (|
) that matches the first rule element that is satisfied.
Let’s examine the two operands. The first operand is: ([protocol = 17u8 && flags.bit1 = 0b1] UDPPacket)
. This has two parts. First, we have a constraint that ensures that we are dealing with a UDP packet that does not use fragmentation by ensuring that the protocol matches that of UDP (protocol = 17u8
) and that the bit indicating that fragmentation should not take place is set (flags.bit1 = 0b1
); the and operator (&&
) ensures this constraint is satisfied only if both conditions are met. Second, we invoke the non-terminal UDPPacket
from udp.ply
, which we had included earlier; if this fails, the parse should be rejected. The surrounding parentheses ensure that this operand is treated as a unit within the context of the rule element combinator (|
).
The second operand to |
is the constraint [protocol != 17u8 || (protocol = 17u8 && flags.bit1 = 0b0)]
. This constraint utilizes the Boolean or operator (||
) to ensure that we are either not dealing with the UDP protocol or that we are dealing with the UDP protocol but we have to handle fragmentation. The first part of the constraint, protocol != 17u8
succeeds if the protocol does not equal 17u8
, which is the code for UDP. The second part of the constraint uses the Boolean and operator (&&
) to ensure that the protocol is UDP (protocol = 17u8
) and the corresponding bit indicating that fragmention should not occur is unset. Parentheses enforce the desired order of operations.
In a future lesson, we will demonstrate how these two parsers interact to check checksums.
This will also require passing along information to the UDPPacket
non-terminal via inherited attributes.
But let’s worry about that later.
We have the final file:
include udp bitfield ipv4_flags = { bit0: 2, bit1: 1, bit2: 0 } type ip_packet = { version: u8, ihl: u8, type_of_service: u8, total_length: u16, identification: [byte], flags: ipv4_flags, fragment_offset: u16, time_to_live: u8, protocol: u8, header_checksum: u16, source_address: [byte], destination_address: [byte], options_and_padding: [byte] } format { IPPacket ipp {ip_packet} := (| version: u8 := 0u8, ihl: u8 := 0u8, fragment_offset: u16 := 0u16 |) version_bv = BitVector< 4 > ihl_bv = BitVector< 4 > { version := Bits.to_u8(version_bv); ihl := Bits.to_u8(ihl_bv) } [version = 4u8] [ihl >=_u8 5u8] // the header length should be at least 5 as per the spec type_of_service = UInt8 total_length = UInt16<endian=endian::Big()> identification = (Byte^2u) flags = $bitfield(ipv4_flags) [flags.bit0 = 0b0] // the specification specifies this should be 0 fragments_offset_bv = BitVector< 13 > { fragment_offset := Bits.to_u16(fragments_offset_bv) } time_to_live = UInt8 protocol = UInt8 header_checksum = UInt16<endian=endian::Big()> source_address = (Byte^4u) destination_address = (Byte^4u) options_and_padding_len = {;; 4u16 *_u16 U16.of_u8(ihl) -_u16 5u16 } options_and_padding = (Byte^Usize.of_u16(options_and_padding_len)) { ipp.version := version; ipp.ihl := ihl; ipp.type_of_service := type_of_service; ipp.total_length := total_length; ipp.identification := identification; ipp.flags := flags; ipp.fragment_offset := fragment_offset; ipp.time_to_live := time_to_live; ipp.protocol := protocol; ipp.header_checksum := header_checksum; ipp.source_address := source_address; ipp.destination_address := destination_address; ipp.options_and_padding := options_and_padding } // udp w/o fragmentation ([protocol = 17u8 && flags.bit1 = 0b1] UDPPacket) | // udp w/o fragmentation // placeholder for udp w/ fragmentation or other next level protocols, // e.g., protocol 6 is tcp [protocol != 17u8 || (protocol = 17u8 && flags.bit1 = 0b0)]
Navigation: ↑ Tutorial Overview | ← Previous Lesson | → Next Lesson | 📄 Documentation