Skip to content

Latest commit

 

History

History
399 lines (319 loc) · 11.7 KB

CBOR_DNS_STREAM.md

File metadata and controls

399 lines (319 loc) · 11.7 KB

CBOR DNS Stream Format version 1 (CDSv1)

This is an experimental format for representing DNS information in CBOR with the goals to:

  • Be able to stream the information
  • Support incomplete, broken and/or invalid DNS
  • Have close to no data quality and signature degradation
  • Support additional non-DNS meta data (such as ICMP/TCP attributes)

Overview

In CBOR you are expected to have one root element, most likely an array or map. This format does not have a root element, instead you are expected to read one CBOR array element at a time as a stream of CBOR elements with the first array element being the stream initiator object.

[stream_init]
[message]
...
[message]

Here are some number on the compression rate compared to PCAP:

Uncompressed PCAP CDS Factor
client 458373 133640 0,2915
zonalizer 51769844 9450475 0,1825
large ditl 1003931674 298167709 0,2970
small ditl 1651252 603314 0,3653
Gzipped PCAP CDS Factor F/Uncompressed
client 108136 45944 0,4248 0,1002
zonalizer 12468329 2485620 0,1993 0,0480
large ditl 327227203 117569598 0,3592 0,1171
small ditl 539323 253402 0,4698 0,1534
Xzipped PCAP CDS Factor F/Uncompressed
client 76248 36308 0,4761 0,0792
zonalizer 7894356 1695920 0,2148 0,0327
large ditl 267031412 86747604 0,3248 0,0864
small ditl 442260 206596 0,4671 0,1251
  • client is a couple of hours of DNS from my workstation
  • zonalizer is half a day from Zonalizer which continuously tests gTLDs
  • large ditl, small ditl are capture from DITL

Types

  • int: A CBOR integer (major type 0x00)
  • uint: A CBOR integer (value >= 0, major type 0x00)
  • nint: A CBOR negative integer (value < 0, major type 0x00), this type has special meaning see Negative Integers
  • simple: A CBOR simple value (major type 0xe0)
  • bytes: A CBOR byte string (major type 0x40)
  • string: A CBOR UTF-8 string (major type 0x60)
  • any: Any CBOR value
  • bool: A CBOR boolean
  • rindex: A CBOR negative integer that is a reverse index, see Deduplication

Special Keywords

  • union: Can be used to merge the given array or map into the current object
  • optional: The attribute or object reference is optional

Negative Integers

CBOR encodes negative numbers in a special way and this format uses that for none negative number to tell them apart.

Because of that, all negative numbers needs special decoding:

value = -value - 1

Objects

The object code below uses:

  • [ and ] to indicate the start and end of an array
  • type name per object attribute
  • name per object reference
  • ... to indicate a list of previous definition
  • (, | and ) to indicate list of various types that the attribute can be

stream_init

The initial object in the stream.

[
    string version,
    union stream_option option,
    ...
]
  • version: The version of the format
  • option: A list of stream option objects

stream_option

A stream option that can specify critical information about the stream and how it should be decoded, see Stream Options for more information.

[
    uint option_type,
    optional any option_value
]
  • option_type: The type of option represented as a number
  • option_value: The option value

message

A message object that describes various DNS packets or other information.

[
    optional bool is_complete,
    union timestamp timestamp,
    simple message_bits,
    union ip_header ip_header,
    union ( icmp_message | udp_message | tcp_message | dns_message ) content
]
  • is_complete: Will exist and be false if the message is not complete and following attributes may not exists
  • timestamp: A timestamp object
  • message_bits: Bitmap indicating message content
    • Bit 0: 0=Not DNS 1=DNS
    • Bit 1: if DNS: 0=UDP 1=TCP else: 0=ICMP/ICMPv6 1=TCP
    • Bit 2: Fragmented (0=no 1=yes)
    • Bit 3: Malformed (0=no 1=yes)
  • ip_header: An IP header object
  • content: The message content, may be an ICMP, UDP, TCP or DNS message object

timestamp

The timestamp object of a message.

[
    ( uint seconds | nint diff_from_last ),
    optional uint useconds
    optional uint nseconds
]
  • seconds: The seconds of a UNIX timestamp
  • diff_from_last: The differentially from last timestamp.seconds
  • useconds: The microseconds of a UNIX timestamp or if diff_from_last is used it will be the differentially from last timestamp.useconds
  • nseconds: The nanoseconds of a UNIX timestamp or if diff_from_last is used it will be the differentially from last timestamp.nseconds

ip_header

The IP header of a message.

[
    ( uint | nint ) ip_bits,
    optional bytes src_addr,
    optional bytes dest_addr,
    optional ( uint | nint ) src_dest_port
]
  • ip_bits: Bitmap indicating IP header content, if the type is nint it also indicates that it is a reverse from last, see Deduplication for more information
    • Bit 0: address family (0=AF_INET, 1=AF_INET6)
    • Bit 1: src_addr present
    • Bit 2: dest_addr present
    • Bit 3: port present
  • src_addr: The source address with length specifying address family, 4 bytes is IPv4 and 16 is IPv6
  • dest_addr: The destination address with length specifying address family, 4 bytes is IPv4 and 16 is IPv6
  • src_dest_port: A combined source and destination port, see Source And Destination Port

Source And Destination Port

The source and destination port are combined into one value. If both source and destination exists then the value is larger then 65535, the destination will be the high 16 bits and source the low otherwise it will only be the source. If the value is negative then only the destination exists.

if value > 0xffff then
    src_port = value & 0xffff
    dest_port = value >> 16
else if value < 0 then
    dest_port = -value - 1
else
    src_port = value

icmp_message

if ip_header.ip_bits.1=0 && ip_header.ip_bits.2=0

[
    uint type,
    uint code
]
  • type: TODO
  • code: TODO

udp_message

if ip_header.ip_bits.1=1 && ip_header.ip_bits.2=0

TODO

tcp_message

if ip_header.ip_bits.2=1

[
    uint seq_nr,
    uint ack_nr,
    uint tcp_bits,
    uint window
]
  • seq_nr: TODO
  • ack_nr: TODO
  • tcp_bits: TODO
    • 0: URG
    • 1: ACK
    • 2: PSH
    • 3: RST
    • 4: SYN
    • 5: FIN
  • window: TODO

dns_message

A DNS packet.

[
    optional bool is_complete,
    uint id,
    uint raw_dns_header,        # TODO
    optional nint count_bits,
    optional uint qdcount,
    optional uint ancount,
    optional uint nscount,
    optional uint arcount,
    optional simple rr_bits,
    optional [
        dns_question question,
        ...
    ],
    optional [
        resource_record answer,
        ...
    ],
    optional [
        resource_record authority,
        ...
    ],
    optional [
        resource_record additional,
        ...
    ],
    optional bytes malformed
]
  • is_complete: Will exist and be false if the message is not complete and following attributes may not exists
  • id: DNS identifier
  • raw_dns_header: TODO
  • count_bits: Bitmap indicating which counts are present, see Negative Integers and Deduplication
    • Bit 0: qdcount present
    • Bit 1: ancount present
    • Bit 2: nscount present
    • Bit 3: arcount present
  • qdcount: Number of question records if different from the number of entries in question
  • ancount: Number of answer resource records if different from the number of entries in answer
  • nscount: Number of authority resource records if different from the number of entries in authority
  • arcount: Number of additional resource records if different from the number of entries in additional
  • question: The question records
  • answer: The answer resource records
  • authority: The authority resource records
  • additional: The additional resource records
  • malformed: Holds the bytes of the message that was not parsed

question

A DNS question record.

[
    optional bool is_complete,
    ( bytes | compressed_name | rindex ) qname,
    optional uint qtype,
    optional nint qclass
]
  • is_complete: Will exist and be false if the message is not complete and following attributes may not exists
  • qname: The QNAME as byte string, a name compression object or a reverse index, see Deduplication
  • qtype: The QTYPE, see Deduplication
  • qclass: The QCLASS, see Negative Integers and Deduplication

compressed_name

An compressed name which has references to other labels within the same message.

[
    ( bytes label | uint label_index | nint offset | simple extension_bits ),
    ...
]
  • label: A byte string with a label part
  • label_index: An index to the N byte string label in the message
  • offset: The offset specified in the DNS message which could not be translated into a label index
  • extension_bits: The extension bits if not 0b00 or 0b11 # TODO: add the extension bits

resource_record

A DNS resource record.

[
    optional bool is_complete,
    ( bytes | compressed_name | rindex ) name,
    optional simple rr_bits,
    optional uint type,
    optional uint class,
    optional uint ttl,
    optional uint rdlength,
    ( bytes | mixed_rdata ) rdata
]
  • is_complete: Will exist and be false if the message is not complete and following attributes may not exists
  • name:
  • rr_bits: Bitmap indicating what is present, see Deduplication
    • Bit 0: type
    • Bit 1: class
    • Bit 2: ttl
    • Bit 3: rdlength # TODO: reverse index for TTL?
  • type: The resource record type
  • class: The resource record class
  • ttl: The resource record ttl
  • rdlength: The resource record rdata length
  • rdata: The resource record data

mixed_rdata

An array mixed with resource data and compressed names.

[
    ( bytes | compressed_name ) rdata_part,
    ...
]
  • rdata_part: The parts of the resource records data

Stream Options

Each option is specified here as OptionName(OptionNumber) and optional OptionValue type.

  • RLABELS(0) uint: Indicates how many labels should be stored in the reverse label index before discarding them
  • RLABEL_MIN_SIZE(1) uint: The minimum size a label must be to be put in the reverse label index
  • RDATA_RINDEX_SIZE(2) uint: Indicates how many rdata should be stored in the reverse rdata index before discarding them
  • RDATA_RINDEX_MIN_SIZE(3) uint: The minimum size a rdata must be to be put in the reverse rdata index
  • USE_RDATA_INDEX(4): If present then the stream uses rdata indexing
  • RDATA_INDEX_MIN_SIZE(5) uint: The minimum size a rdata must be to be put in the rdata index

Deduplication

Deduplication is done in a few different ways, data may be left out to indicate that it is the same as the previous value, an index may be used to indicate that it is the same as the N previous value and a reverse index may be used to indicate that it is the N previous value looking backwards across the stream.

In other words, using the index deduplication you will need to build a table of the values you come across during the decoding of the stream, this table can grow very large.

As an smaller alternative a reverse index can indicate often used data from the N previous value looking back over the stream. This type of index also reorder itself to try and put the most used data always in the index.

TODO: details of each attribute and it's deduplication