Skip to content

Commit

Permalink
proposal: secondary indexes
Browse files Browse the repository at this point in the history
  • Loading branch information
james-rms committed Jul 3, 2023
1 parent dc0384d commit 6d905c7
Show file tree
Hide file tree
Showing 2 changed files with 111 additions and 0 deletions.
97 changes: 97 additions & 0 deletions website/docs/spec/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,9 +60,11 @@ The following records are allowed to appear in the data section:
- [Schema](#schema-op0x03)
- [Channel](#channel-op0x04)
- [Message](#message-op0x05)
- [Secondary Index Key](#secondary-index-key-op0x10)
- [Attachment](#attachment-op0x09)
- [Chunk](#chunk-op0x06)
- [Message Index](#message-index-op0x07)
- [Secondary Message Index](#secondary-message-index-op0x11)
- [Metadata](#metadata-op0x0C)
- [Data End](#data-end-op0x0F)

Expand All @@ -82,7 +84,9 @@ The following records are allowed to appear in the summary section:

- [Schema](#schema-op0x03)
- [Channel](#channel-op0x04)
- [Secondary Index Key](#secondary-index-key-op0x10)
- [Chunk Index](#chunk-index-op0x08)
- [Secondary Chunk Index](#secondary-chunk-index-op0x12)
- [Attachment Index](#attachment-index-op0x0A)
- [Metadata Index](#metadata-index-op0x0D)
- [Statistics](#statistics-op0x0B)
Expand Down Expand Up @@ -179,6 +183,30 @@ The message encoding and schema must match that of the Channel record correspond
| 8 | publish_time | Timestamp | Time at which the message was published. If not available, must be set to the log time. |
| N | data | Bytes | Message data, to be decoded according to the schema of the channel. |

### Secondary Index Key (op=0x10)

A Secondary Index Key record defines a secondary timestamp index that will be used in this file.
Secondary Indexes can be used to quickly look up messages by timestamps other than `log_time`.
The `name` field identifies the timestamp key that messages will be indexed by. The [registry](./registry.md#secondary-index-keys) lists well-known secondary index key names.

A Secondary Index Key record must appear before any [Secondary Message Index](#secondary-message-index-op0x11) records
in the data section with this `secondary_index_id`.

Secondary Index Key records in the Data section must also appear in the Summary section, before
any [Secondary Chunk Index](#secondary-chunk-index-op0x12) records with this `secondary_index_id`.

| Bytes | Name | Type | Description |
| ----- | ------------------ | ------ | ----------------------------------------------------------------- |
| 2 | secondary_index_id | uint16 | A unique identifier for this secondary index within the file. |
| 4 + N | name | string | A name that describes the key, eg. `publish_time`, `header.stamp` |

> Why do Secondary Index Key records appear in the Data section?
> When reading using an index, the Secondary Index Key would be read out of the Summary section
> before reading into the Data section. This means that the Secondary Index Key in the Data section
> is not normally used. However, if a MCAP is truncated and the summary section is lost, having the
> Secondary Index Key appear before any Secondary Message Index records allows the MCAP to be fully
> recovered.
### Chunk (op=0x06)

A Chunk contains a batch of Schema, Channel, and Message records. The batch of records contained in a chunk may be compressed or uncompressed.
Expand Down Expand Up @@ -207,6 +235,17 @@ A sequence of Message Index records occurs immediately after each chunk. Exactly

Messages outside of chunks cannot be indexed.

### Secondary Message Index (op=0x11)

A Secondary Message Index record allows readers to locate individual message records within a chunk using a
key defined in a [Secondary Index Key record](#secondary-index-key-op0x10).

| Bytes | Name | Type | Description |
| ----- | ------------------ | --------------------------------- | -------------------------------------------------------------------------------------------------------------- |
| 2 | channel_id | uint16 | Channel ID. |
| 2 | secondary_index_id | uint16 | Secondary Index ID. |
| 4 + N | records | `Array<Tuple<Timestamp, uint64>>` | Array of timestamp and offset for each record. Offset is relative to the start of the uncompressed chunk data. |

### Chunk Index (op=0x08)

A Chunk Index record contains the location of a Chunk record and its associated Message Index records.
Expand All @@ -229,6 +268,18 @@ A Schema and Channel record MUST exist in the summary section for all channels r

> Why? The typical use case for file readers using an index is fast random access to a specific message timestamp. Channel is a prerequisite for decoding Message record data. Without an easy-to-access copy of the Channel records, readers would need to search for Channel records from the start of the file, degrading random access read performance.
### Secondary Chunk Index (op=0x12)

A secondary Chunk Index record contains additional secondary index information on top of the corresponding Chunk Index record.

| Bytes | Name | Type | Description |
| ----- | --------------------- | --------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 2 | secondary_index_id | uint16 | Secondary Index ID. |
| 8 | chunk_start_offset | uint64 | Offset to the chunk record from the start of the file. |
| 8 | earliest_key | Timestamp | Earliest key in the chunk. Zero if the chunk contains no messages with this key. |
| 8 | latest_key | Timestamp | Latest key in the chunk. Zero if the chunk contains no messages with this key. |
| 4 + N | message_index_offsets | `Map<uint16, uint64>` | Mapping from channel ID to the offset of the secondary message index record with this `secondary_index_id` for that channel after the chunk, from the start of the file. An empty map indicates no message indexing is available. |

### Attachment (op=0x09)

Attachment records contain auxiliary artifacts such as text, core dumps, calibration data, or other arbitrary data.
Expand Down Expand Up @@ -522,6 +573,52 @@ A writer may choose to put messages in Chunks to compress record data. This MCAP
[Footer]
```

### Multiple Messages with a Secondary Index

```
[Header]
[Secondary Index Key 1]
[Chunk A]
[Schema A]
[Channel 1 (A)]
[Channel 2 (B)]
[Message on 1]
[Message on 1]
[Message on 2]
[Message Index 1]
[Message Index 2]
[Secondary Message Index 1 (Channel 1)]
[Secondary Message Index 1 (Channel 2)]
[Attachment 1]
[Chunk B]
[Schema B]
[Channel 3 (B)]
[Message on 3]
[Message on 1]
[Message Index 3]
[Message Index 1]
[Secondary Message Index 1 (Channel 3)]
[Secondary Message Index 1 (Channel 1)]
[Data End]
[Schema A]
[Schema B]
[Channel 1]
[Channel 2]
[Channel 3]
[Secondary Index Key 1]
[Chunk Index A]
[Chunk Index B]
[Secondary Chunk Index 1 (Chunk A)]
[Secondary Chunk Index 1 (Chunk B)]
[Attachment Index 1]
[Statistics]
[Summary Offset 0x01]
[Summary Offset 0x05]
[Summary Offset 0x07]
[Summary Offset 0x08]
[Footer]
```

## Further Reading

- [Feature explanations][feature_explanations]: includes usage details that may be useful to implementers of readers or writers.
14 changes: 14 additions & 0 deletions website/docs/spec/registry.md
Original file line number Diff line number Diff line change
Expand Up @@ -152,3 +152,17 @@ The `ros2` profile describes how to create MCAP files for [ROS 2](https://docs.r
#### Schema

- `encoding`: MUST be either `ros2msg` or `ros2idl`

## Secondary index keys

The Secondary Index Key `name` field may contain the following options:

### `header.stamp`

Indexes the `stamp` value of the `std_msgs/msg/Header`-valued `header` field of the deserialized message data.

- `profile`: must be `ros1` or `ros2`

### `publish_time`

Indexes the `publish_time` value of Message records.

0 comments on commit 6d905c7

Please sign in to comment.