Lesson 2: The Internet Protocol (version 4)

In the previous lesson, we developed a Parsley specification for the User Datagram Protocol. In this lesson, we continue looking at the IP suite with Internet Protocol (version 4). We develop a Parsley specification based on IETF RFC 791.

The `include` Declaration

First, we note that, as is the case with many protocols, we cannot parse IPv4 in isolation. IPv4 packets may contain a UDP packet, a TCP packet, other packets, or even fragments of packets. Moreover, correctly parsing IPv4 and UDP packets requires the parsers to exchange information, e.g., for checksum computation.

Note	For the time being, we do not ensure the correctness of checksums, but we will do so in a following lesson.

Bridging this parser-parser barrier is done via the include declaration. The include declaration is followed by a filename of a file within the same directory. It enables one to include all declarations within the specified file.

include udp

Here, we include udp.ply.

The IPv4 Header

We now examine the structure of the IPv4 header. We begin by declaring the ip_packet type:

bitfield ipv4_flags = {
 bit0: 2,
 bit1: 1,
 bit2: 0
}

type ip_packet = {
  version: u8,
  ihl: u8,
  type_of_service: u8,
  total_length: u16,
  identification: [byte],
  flags: ipv4_flags,
  fragment_offset: u16,
  time_to_live: u8,
  protocol: u8,
  header_checksum: u16,
  source_address: [byte],
  destination_address: [byte],
  options_and_padding: [byte]
}

This type declaration closely resembles Figure 4 of the Internet Protocol specification. The process of mapping the specification header structure to a Parsley type should be familiar. After all, we performed a similar task in declaring the udp_packet in the the first lesson.

That said, there are a few new things to notice:

First, we not only see unsigned 16-bit integers (u16) but also 8-bit unsigned integers (u8).
Second, we see the user-defined bitfield ipv4_flags. Bitfields can be thought of as special user-defined types for handling bits.
Third, we see ipv4_flags appear as a field within the user-defined type ip_packet. While working on the Parsley specification for UDP, we had only dealt with atomic types and lists of atomic types (byte lists).

Writing the Format Block

As we have done before, we use the format block to specify how to parse an IPv4 packet.

format {
  IPPacket ipp {ip_packet} :=
    (| version: u8 := 0u8, ihl: u8 := 0u8, fragment_offset: u16 := 0u16 |)
    version_bv = BitVector< 4 >
    ihl_bv = BitVector< 4 >
    {
      version := Bits.to_u8(version_bv);
      ihl := Bits.to_u8(ihl_bv)
    }
    [version = 4u8]
    [ihl >=_u8 5u8]  // the header length should be at least 5 as per the spec
    type_of_service = UInt8
    total_length = UInt16<endian=endian::Big()>
    identification = (Byte^2u)
    flags = $bitfield(ipv4_flags)
    [flags.bit0 = 0b0] // the specification specifies this should be 0
    fragments_offset_bv = BitVector< 13 >
    {
      fragment_offset := Bits.to_u16(fragments_offset_bv)
    }
    time_to_live = UInt8
    protocol = UInt8
    header_checksum = UInt16<endian=endian::Big()>
    source_address = (Byte^4u)
    destination_address = (Byte^4u)
    options_and_padding_len = {;;
      4u16 *_u16 U16.of_u8(ihl) -_u16 5u16
    }
    options_and_padding = (Byte^Usize.of_u16(options_and_padding_len))
    {
      ipp.version := version;
      ipp.ihl := ihl;
      ipp.type_of_service := type_of_service;
      ipp.total_length := total_length;
      ipp.identification := identification;
      ipp.flags := flags;
      ipp.fragment_offset := fragment_offset;
      ipp.time_to_live := time_to_live;
      ipp.protocol := protocol;
      ipp.header_checksum := header_checksum;
      ipp.source_address := source_address;
      ipp.destination_address := destination_address;
      ipp.options_and_padding := options_and_padding
    }

    // udp w/o fragmentation
    ([protocol = 17u8 && flags.bit1 = 0b1] UDPPacket) | // udp w/o fragmentation

    // placeholder for udp w/ fragmentation or other next level protocols,
    // e.g., protocol 6 is tcp
    [protocol != 17u8 || (protocol = 17u8 && flags.bit1 = 0b0)]
}

There’s a lot in here! And while much of it should be clear, there is also much that is new, which requires explanation.

Temporaries

First, let’s look at the beginning of the format declaration:

format {
  IPPacket ipp {ip_packet} :=
    (| version: u8 := 0u8, ihl: u8 := 0u8, fragment_offset: u16 := 0u16 |)
    ...
}

The left-hand side of the rule declaration should be clear enough; we are creating a non-terminal named IPPacket with short name ipp that has the attribute ip_packet. But what is going on afterward?

Well, at the beginning of the right-hand side of a rule declaration, temporaries may be specified. These temporaries are declared between (| and |) markers and remain in scope for the remainder of the rule. In our example, we declare three temporaries that have the types u8, u8, and u16 respectively, all initialized to the 0 value for their corresponding integer types. We later assign values to these temporaries via action blocks. In fact, we do so very soon.

Bits and Action Blocks

Let’s look at the next few lines.

format {
  IPPacket ipp {ip_packet} :=
    (| version: u8 := 0u8, ihl: u8 := 0u8, fragment_offset: u16 := 0u16 |)
    version_bv = BitVector< 4 >
    ihl_bv = BitVector< 4 >
    {
      version := Bits.to_u8(version_bv);
      ihl := Bits.to_u8(ihl_bv)
    }
    [version = 4u8]
    [ihl >=_u8 5u8]  // the header length should be at least 5 as per the spec
    ...
}

The identifiers version_bv are matched bit vectors of length 4. We then apply the Bits.to_u8() function from the standard library to extract u8 values from these bit vectors. In fact, this form should look quite familiar from the previous lesson where we assigned values to synthesized attributes. The only difference here is that we are assigning values to temporaries instead of assigning values to synthesized attributes. Next, we have two constraints to ensure that we are dealing with version 4 of IP and that the IHL value is at least 5, in accordance with the IPv4 specification.

While much of the remainder of the file is straightforward, it is instructive to look at how bitfields are used and to examine the remaining action blocks and constraints.

One task involves extracting 3 flag bits, followed by a 13-bit fragments_offset length value from a 2-byte segment of the packet. This is achieved by the code below.

format {
  IPPacket ipp {ip_packet} :=
    ...
    flags = $bitfield(ipv4_flags)
    [flags.bit0 = 0b0] // the specification specifies this should be 0
    fragments_offset_bv = BitVector< 13 >
    {
      fragment_offset := Bits.to_u16(fragments_offset_bv)
    }
    ...
}

The identifier flags matches an instantiation of the user-defined bitfield ipv4_flags, that is set by reading 3 bits via $bitfield(ipv4_flags). Recall that ipv4_flags was a bitfield containing 3 bits, which we cleverly named bit0, bit1, and bit2. In accordance with the protocol, we ensure that bit0 is set to zero in the following line. We then match fragments_offset_bv to a 13-bit bit vector. Finally, in an action block, we use Bits.to_u16(fragments_offset_bv) to obtain a u16 value corresponding to this bit vector, and we store the result in the `fragment_offset temporary that we introduced earlier.

The next piece of code demonstrates the use of action blocks to compute and return values, storing them within an identifier.

format {
  IPPacket ipp {ip_packet} :=
    ...
    options_and_padding_len = {;;
      4u16 *_u16 U16.of_u8(ihl) -_u16 5u16
    }
    options_and_padding = (Byte^Usize.of_u16(options_and_padding_len))
    ...
}

Here, we convert ihl to a u16 value and do some simple arithmetic to determine the length of the portion of the packet dealing with options and padding. Any value following ;; within an action block is returned. Here, the computed length value returned and stored in the identifier options_and_padding_len. This identifier is then used to match the options and padding values in the line following the action block.

Note	Instead of using an action block to compute `options_and_padding_len`, we could have alternatively computed the expression corresponding to the options and paddling length directly within the `options_and_padding` assignment itself. We used the approach above for pedagogical purposes.

The Ordered Choice Rule Element Combinator, Boolean Operators, and the `include` Declaration at Play

The tail of the format declaration is as follows:

format {
  IPPacket ipp {ip_packet} :=
    ...
    // udp w/o fragmentation
    ([protocol = 17u8 && flags.bit1 = 0b1] UDPPacket) | // udp w/o fragmentation

    // placeholder for udp w/ fragmentation or other next level protocols,
    // e.g., protocol 6 is tcp
    [protocol != 17u8 || (protocol = 17u8 && flags.bit1 = 0b0)]
}

Here, we split the parse into two parts: (i) a part we deal with that corresponds to IPv4 over UDP without fragmentation and (ii) a part we ignore, which corresponds to everything else. This is achived by the ordered choice rule element combinator (|) that matches the first rule element that is satisfied.

Let’s examine the two operands. The first operand is: ([protocol = 17u8 && flags.bit1 = 0b1] UDPPacket). This has two parts. First, we have a constraint that ensures that we are dealing with a UDP packet that does not use fragmentation by ensuring that the protocol matches that of UDP (protocol = 17u8) and that the bit indicating that fragmentation should not take place is set (flags.bit1 = 0b1); the and operator (&&) ensures this constraint is satisfied only if both conditions are met. Second, we invoke the non-terminal UDPPacket from udp.ply, which we had included earlier; if this fails, the parse should be rejected. The surrounding parentheses ensure that this operand is treated as a unit within the context of the rule element combinator (|).

The second operand to | is the constraint [protocol != 17u8 || (protocol = 17u8 && flags.bit1 = 0b0)]. This constraint utilizes the Boolean or operator (||) to ensure that we are either not dealing with the UDP protocol or that we are dealing with the UDP protocol but we have to handle fragmentation. The first part of the constraint, protocol != 17u8 succeeds if the protocol does not equal 17u8, which is the code for UDP. The second part of the constraint uses the Boolean and operator (&&) to ensure that the protocol is UDP (protocol = 17u8) and the corresponding bit indicating that fragmention should not occur is unset. Parentheses enforce the desired order of operations.

In a future lesson, we will demonstrate how these two parsers interact to check checksums. This will also require passing along information to the UDPPacket non-terminal via inherited attributes. But let’s worry about that later.

Our Final File

We have the final file:

include udp

bitfield ipv4_flags = {
 bit0: 2,
 bit1: 1,
 bit2: 0
}

type ip_packet = {
  version: u8,
  ihl: u8,
  type_of_service: u8,
  total_length: u16,
  identification: [byte],
  flags: ipv4_flags,
  fragment_offset: u16,
  time_to_live: u8,
  protocol: u8,
  header_checksum: u16,
  source_address: [byte],
  destination_address: [byte],
  options_and_padding: [byte]
}

format {
  IPPacket ipp {ip_packet} :=
    (| version: u8 := 0u8, ihl: u8 := 0u8, fragment_offset: u16 := 0u16 |)
    version_bv = BitVector< 4 >
    ihl_bv = BitVector< 4 >
    {
      version := Bits.to_u8(version_bv);
      ihl := Bits.to_u8(ihl_bv)
    }
    [version = 4u8]
    [ihl >=_u8 5u8]  // the header length should be at least 5 as per the spec
    type_of_service = UInt8
    total_length = UInt16<endian=endian::Big()>
    identification = (Byte^2u)
    flags = $bitfield(ipv4_flags)
    [flags.bit0 = 0b0] // the specification specifies this should be 0
    fragments_offset_bv = BitVector< 13 >
    {
      fragment_offset := Bits.to_u16(fragments_offset_bv)
    }
    time_to_live = UInt8
    protocol = UInt8
    header_checksum = UInt16<endian=endian::Big()>
    source_address = (Byte^4u)
    destination_address = (Byte^4u)
    options_and_padding_len = {;;
      4u16 *_u16 U16.of_u8(ihl) -_u16 5u16
    }
    options_and_padding = (Byte^Usize.of_u16(options_and_padding_len))
    {
      ipp.version := version;
      ipp.ihl := ihl;
      ipp.type_of_service := type_of_service;
      ipp.total_length := total_length;
      ipp.identification := identification;
      ipp.flags := flags;
      ipp.fragment_offset := fragment_offset;
      ipp.time_to_live := time_to_live;
      ipp.protocol := protocol;
      ipp.header_checksum := header_checksum;
      ipp.source_address := source_address;
      ipp.destination_address := destination_address;
      ipp.options_and_padding := options_and_padding
    }

    // udp w/o fragmentation
    ([protocol = 17u8 && flags.bit1 = 0b1] UDPPacket) | // udp w/o fragmentation

    // placeholder for udp w/ fragmentation or other next level protocols,
    // e.g., protocol 6 is tcp
    [protocol != 17u8 || (protocol = 17u8 && flags.bit1 = 0b0)]

Navigation: ↑ Tutorial Overview | ← Previous Lesson | → Next Lesson | 📄 Documentation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ipv4.adoc

ipv4.adoc

Lesson 2: The Internet Protocol (version 4)

The `include` Declaration

The IPv4 Header

Writing the Format Block

Temporaries

Bits and Action Blocks

The Ordered Choice Rule Element Combinator, Boolean Operators, and the `include` Declaration at Play

Our Final File

Files

ipv4.adoc

Latest commit

History

ipv4.adoc

File metadata and controls

Lesson 2: The Internet Protocol (version 4)

The include Declaration

The IPv4 Header

Writing the Format Block

Temporaries

Bits and Action Blocks

The Ordered Choice Rule Element Combinator, Boolean Operators, and the include Declaration at Play

Our Final File

The `include` Declaration

The Ordered Choice Rule Element Combinator, Boolean Operators, and the `include` Declaration at Play