ADR 003: Retrieving Application messages

Changelog

2021-04-25: initial draft

Context

This ADR builds on top of ADR 002 and will use the implemented APIs described there. The reader should familiarize themselves at least with the high-level concepts the as well as in the specs.

The academic paper describes the motivation and context for this API. The main motivation can be quoted from section 3.3 of that paper:

(Property1) Application message retrieval partitioning. Client nodes must be able to download all of the messages relevant to the applications they use [...], without needing to downloading any messages for other applications.

(Property2) Application message retrieval completeness. When client nodes download messages relevant to the applications they use [...], they must be able to verify that the messages they received are the complete set of messages relevant to their applications, for specific blocks, and that there are no omitted messages.

The main data structure that enables above properties is called a Namespaced Merkle Tree (NMT), an ordered binary Merkle tree where:

each node in the tree includes the range of namespaces of the messages in all descendants of each node
leaves in the tree are ordered by the namespace identifiers of the leaf messages

A more formal description can be found the specification. An implementation can be found in this repository.

This ADR basically describes version of the GetWithProof of the NMT that leverages the fact that IPFS uses content addressing and that we have implemented an IPLD plugin for an NMT.

Note: The APIs defined here will be particularly relevant for Optimistic Rollup (full) nodes that want to download their Rollup's data (see celestiaorg/optimint#48). Another potential use-case of this API could be for so-called light validator nodes that want to download and replay the state-relevant portion of the block data, i.e. transactions with reserved namespace IDs.

Alternative Approaches

The approach described below will rely on IPFS' block exchange protocol (bitswap) and DHT; IPFS's implementation will be used as a black box to find peers that can serve the requested data. This will likely be much slower than it potentially could be and for a first implementation we intentionally do not incorporate the optimizations that we could.

We briefly mention potential optimizations for the future here:

Use of graphsync instead of bitswap and use of IPLD selectors
expose an API to be able to download application specific data by namespace (including proofs) with the minimal number of round-trips (e.g. finding nodes that expose an RPC endpoint like GetWithProof)

Decision

Most discussions on this particular API happened either on calls or on other non-documented way. We only describe the decision in this section.

We decide to implement the simplest approach first. We first describe the protocol informally here and explain why this fulfils (Property1) and (Property2) in the Context section above.

In the case that leaves with the requested namespace exist, this basically boils down to the following: traverse the tree starting from the root until finding first leaf (start) with the namespace in question, then directly request and download all leaves coming after the start until the namespace changes to a greater than the requested one again. In the case that no leaves with the requested namespace exist in the tree, we traverse the tree to find the leaf in the position in the tree where the namespace would have been and download the neighbouring leaves.

This is pretty much what the ProveNamespace method does but using IPFS we can simply locate and then request the leaves, and the corresponding inner proof nodes will automatically be downloaded on the way, too.

Detailed Design

We define one function that returns all shares of a block belonging to a requested namespace and block (via the block's data availability header). See ComputeShares for reference how encode the block data into namespace shares.

// RetrieveShares returns all raw data (raw shares) of the passed-in
// namespace ID nID and included in the block with the DataAvailabilityHeader dah.
func RetrieveShares(
    ctx context.Context,
    nID namespace.ID,
    dah *types.DataAvailabilityHeader,
    api coreiface.CoreAPI,
) ([][]byte, error) {
    // 1. Find the row root(s) that contains the namespace ID nID
    // 2. Traverse the corresponding tree(s) according to the
    //    above informally described algorithm and get the corresponding
    //    leaves (if any)
    // 3. Return all (raw) shares corresponding to the nID
}

Additionally, we define two functions that use the first one above to:

return all the parsed (non-padding) data with reserved namespace IDs: transactions, intermediate state roots, evidence.
return all application specific blobs (shares) belonging to one namespace ID parsed as a slice of Messages (specification and code).

The latter two methods might require moving or exporting a few currently unexported functions that (currently) live in share_merging.go and could be implemented in a separate pull request.

// RetrieveStateRelevantMessages returns all state-relevant transactions
// (transactions, intermediate state roots, and evidence) included in a block
// with the DataAvailabilityHeader dah.
func RetrieveStateRelevantMessages(
    ctx context.Context,
    nID namespace.ID,
    dah *types.DataAvailabilityHeader,
    api coreiface.CoreAPI,
) (Txs, IntermediateStateRoots, EvidenceData, error) {
    // like RetrieveShares but for all reserved namespaces
    // additionally the shares are parsed (merged) into the
    // corresponding types in the return arguments
}

// RetrieveMessages returns all Messages of the passed-in
// namespace ID and included in the block with the DataAvailabilityHeader dah.
func RetrieveMessages(
    ctx context.Context,
    nID namespace.ID,
    dah *types.DataAvailabilityHeader,
    api coreiface.CoreAPI,
) (Messages, error) {
    // like RetrieveShares but this additionally parsed the shares
    // into the Messages type
}

Status

Proposed

Consequences

This API will most likely be used by Rollups too. We should document it properly and move it together with relevant parts from ADR 002 into a separate go-package.

Positive

easy to implement with the existing code (see ADR 002)
resilient data retrieval via a p2p network
dependence on a mature and well-tested code-base with a large and welcoming community

Negative

with IPFS, we inherit the fact that potentially a lot of round-trips are done until the data is fully downloaded; in other words: this could end up way slower than potentially possible
anyone interacting with that API needs to run an IPFS node

Neutral

optimizations can happen incrementally once we have an initial working version

References

We've linked to all references throughout the ADR.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adr-003-application-data-retrieval.md

adr-003-application-data-retrieval.md

ADR 003: Retrieving Application messages

Changelog

Context

Alternative Approaches

Decision

Detailed Design

Status

Consequences

Positive

Negative

Neutral

References

Files

adr-003-application-data-retrieval.md

Latest commit

History

adr-003-application-data-retrieval.md

File metadata and controls

ADR 003: Retrieving Application messages

Changelog

Context

Alternative Approaches

Decision

Detailed Design

Status

Consequences

Positive

Negative

Neutral

References