perf: canonical encoding via type parameters
Signed-off-by: Liam Gray <[email protected]>
hoxxep committed Nov 28, 2024
1 parent bef3820 commit 2a3025a
Showing 6 changed files with 212 additions and 138 deletions.
1 change: 0 additions & 1 deletion ciborium/Cargo.toml
@@ -34,7 +34,6 @@ hex = "0.4"
[features]
default = ["std"]
std = ["ciborium-io/std", "serde/std"]
canonical = ["std"]

[package.metadata.docs.rs]
all-features = true
34 changes: 23 additions & 11 deletions ciborium/README.md
@@ -11,20 +11,13 @@ Ciborium contains CBOR serialization and deserialization implementations for ser

## Quick Start

You're probably looking for [`from_reader()`](crate::de::from_reader)
and [`into_writer()`](crate::ser::into_writer), which are
the main functions. Note that byte slices are also readers and writers and can be
passed to these functions just as streams can.
You're probably looking for [`from_reader()`](crate::de::from_reader),
[`to_vec()`](crate::ser::to_vec), and [`into_writer()`](crate::ser::into_writer),
which are the main functions. Note that byte slices are also readers and writers
and can be passed to these functions just as streams can.

For dynamic CBOR value creation/inspection, see [`Value`](crate::value::Value).

## Features
- `std`: enabled by default.
- `canonical`: allows serializing with a `CanonicalizationScheme` for deterministic
outputs. Incurs a small performance penalty (~20% slower) when serializing
without a canonicalization scheme, and a large penalty (~100% slower) when
serializing with a canonicalization scheme.

## Design Decisions

### Always Serialize Numeric Values to the Smallest Size
@@ -96,4 +89,23 @@ be avoided because it can be fragile as it exposes invariants of your Rust
code to remote actors. We might consider adding this in the future. If you
are interested in this, please contact us.

### Canonical Encodings

The ciborium crate has support for various canonical encodings during
serialization.

- [`NoCanonicalization`](crate::canonical::NoCanonicalization): the default,
numbers are still encoded in their smallest form, but map keys are not
sorted for maximum serialization speed.
- [`Rfc7049`](crate::canonical::Rfc7049): the canonicalization scheme from
RFC 7049 that sorts map keys in a length-first order. Eg.
`["a", "b", "aa"]`.
- [`Rfc8949`](crate::canonical::Rfc8949): the canonicalization scheme from
RFC 8949 that sorts map keys in a bytewise lexicographic order. Eg.
`["a", "aa", "b"]`.

To use canonicalization, you must enable the `std` feature. See the examples
in [`to_vec_canonical`](crate::ser::to_vec_canonical) and
[`into_writer_canonical`](crate::ser::into_writer_canonical) for more.
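
The two map key orderings listed above can be sketched with plain byte-slice comparisons. This is a minimal illustration that uses only the standard library and is not ciborium's implementation; `length_first`, `bytewise`, and `sort_keys` are hypothetical names chosen for this example.

```rust
use std::cmp::Ordering;

/// Length-first order (RFC 7049): shorter byte sequences sort first,
/// and sequences of equal length are compared byte by byte.
fn length_first(a: &[u8], b: &[u8]) -> Ordering {
    a.len().cmp(&b.len()).then_with(|| a.cmp(b))
}

/// Bytewise lexicographic order (RFC 8949): compare byte by byte;
/// a strict prefix sorts before the longer sequence.
fn bytewise(a: &[u8], b: &[u8]) -> Ordering {
    a.cmp(b)
}

/// Hypothetical helper: returns a sorted copy of `keys` under `cmp`.
fn sort_keys(keys: &[&[u8]], cmp: fn(&[u8], &[u8]) -> Ordering) -> Vec<Vec<u8>> {
    let mut sorted: Vec<Vec<u8>> = keys.iter().map(|k| k.to_vec()).collect();
    sorted.sort_by(|a, b| cmp(a, b));
    sorted
}
```

On the raw byte strings `a`, `b`, and `aa`, `length_first` yields `a, b, aa` and `bytewise` yields `a, aa, b`. Both RFCs define the comparison over each key's full CBOR encoding, header bytes included, so a serializer would pass the encoded keys to these functions rather than the source strings.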

License: Apache-2.0
73 changes: 73 additions & 0 deletions ciborium/src/canonical.rs
@@ -0,0 +1,73 @@
//! Canonicalization support for CBOR serialization.
//!
//! Supports various canonicalization schemes for deterministic CBOR serialization. The default is
//! [NoCanonicalization] for the fastest serialization. Canonical serialization is around 2x slower.

/// Which canonicalization scheme to use for CBOR serialization.
///
/// Can only be initialized with the `std` feature enabled.
#[doc(hidden)]
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub enum CanonicalizationScheme {
    /// Sort map keys in output according to [RFC 7049]'s deterministic encoding spec.
    ///
    /// Also aligns with [RFC 8949 4.2.3]'s backwards compatibility sort order.
    ///
    /// Uses length-first map key ordering. Eg. `["a", "b", "aa"]`.
    #[cfg(feature = "std")]
    Rfc7049,

    /// Sort map keys in output according to [RFC 8949]'s deterministic encoding spec.
    ///
    /// Uses bytewise lexicographic map key ordering. Eg. `["a", "aa", "b"]`.
    #[cfg(feature = "std")]
    Rfc8949,
}

/// Don't sort map key output.
pub struct NoCanonicalization;

/// Sort map keys in output according to [RFC 7049]'s deterministic encoding spec.
///
/// Also aligns with [RFC 8949 4.2.3]'s backwards compatibility sort order.
///
/// Uses length-first map key ordering. Eg. `["a", "b", "aa"]`.
#[cfg(feature = "std")]
pub struct Rfc7049;

/// Sort map keys in output according to [RFC 8949]'s deterministic encoding spec.
///
/// Uses bytewise lexicographic map key ordering. Eg. `["a", "aa", "b"]`.
#[cfg(feature = "std")]
pub struct Rfc8949;

/// Trait for canonicalization schemes.
///
/// See implementors:
/// - [NoCanonicalization] for no canonicalization (fastest).
/// - [Rfc7049] for length-first map key sorting.
/// - [Rfc8949] for bytewise lexicographic map key sorting.
pub trait Canonicalization {
    /// True if keys should be cached and sorted.
    const IS_CANONICAL: bool;

    /// Determines which sorting implementation to use.
    const SCHEME: Option<CanonicalizationScheme>;
}

impl Canonicalization for NoCanonicalization {
    const IS_CANONICAL: bool = false;
    const SCHEME: Option<CanonicalizationScheme> = None;
}

#[cfg(feature = "std")]
impl Canonicalization for Rfc7049 {
    const IS_CANONICAL: bool = true;
    const SCHEME: Option<CanonicalizationScheme> = Some(CanonicalizationScheme::Rfc7049);
}

#[cfg(feature = "std")]
impl Canonicalization for Rfc8949 {
    const IS_CANONICAL: bool = true;
    const SCHEME: Option<CanonicalizationScheme> = Some(CanonicalizationScheme::Rfc8949);
}
}
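
The trait above moves the choice of scheme into a type parameter. The following self-contained sketch shows that pattern in isolation. It is not the serializer code from this commit: `Scheme`, `order_keys`, and the local copies of the trait and marker types are stand-ins defined only for this example. Because `C::SCHEME` is an associated constant, the `match` is resolved separately for each instantiation, and the optimizer can usually discard the unused arms.

```rust
/// Stand-in for the scheme enum in the diff (hypothetical, for illustration).
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
enum Scheme {
    LengthFirst,
    Bytewise,
}

/// Local copy of the trait shape: the scheme is carried by the type.
trait Canonicalization {
    const IS_CANONICAL: bool;
    const SCHEME: Option<Scheme>;
}

struct NoCanonicalization;
struct Rfc7049;
struct Rfc8949;

impl Canonicalization for NoCanonicalization {
    const IS_CANONICAL: bool = false;
    const SCHEME: Option<Scheme> = None;
}

impl Canonicalization for Rfc7049 {
    const IS_CANONICAL: bool = true;
    const SCHEME: Option<Scheme> = Some(Scheme::LengthFirst);
}

impl Canonicalization for Rfc8949 {
    const IS_CANONICAL: bool = true;
    const SCHEME: Option<Scheme> = Some(Scheme::Bytewise);
}

/// Hypothetical helper: orders already-encoded map keys according to `C`.
fn order_keys<C: Canonicalization>(mut keys: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    // Fast path: without canonicalization the insertion order is kept.
    if !C::IS_CANONICAL {
        return keys;
    }
    match C::SCHEME {
        None => {}
        // Shorter encodings first, ties broken byte by byte.
        Some(Scheme::LengthFirst) => {
            keys.sort_by(|a, b| a.len().cmp(&b.len()).then_with(|| a.cmp(b)))
        }
        // Plain lexicographic comparison of the encoded bytes.
        Some(Scheme::Bytewise) => keys.sort(),
    }
    keys
}
```

A caller picks the scheme with a turbofish, for example `order_keys::<Rfc8949>(keys)`, so no scheme value is passed or checked at run time.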
41 changes: 25 additions & 16 deletions ciborium/src/lib.rs
@@ -6,20 +6,13 @@
//!
//! # Quick Start
//!
//! You're probably looking for [`from_reader()`](crate::de::from_reader)
//! and [`into_writer()`](crate::ser::into_writer), which are
//! the main functions. Note that byte slices are also readers and writers and can be
//! passed to these functions just as streams can.
//! You're probably looking for [`from_reader()`](crate::de::from_reader),
//! [`to_vec()`](crate::ser::to_vec), and [`into_writer()`](crate::ser::into_writer),
//! which are the main functions. Note that byte slices are also readers and writers
//! and can be passed to these functions just as streams can.
//!
//! For dynamic CBOR value creation/inspection, see [`Value`](crate::value::Value).
//!
//! # Features
//! - `std`: enabled by default.
//! - `canonical`: allows serializing with a `CanonicalizationScheme` for deterministic
//! outputs. Incurs a small performance penalty (~20% slower) when serializing
//! without a canonicalization scheme, and a large penalty (~100% slower) when
//! serializing with a canonicalization scheme.
//!
//! # Design Decisions
//!
//! ## Always Serialize Numeric Values to the Smallest Size
@@ -90,6 +83,25 @@
//! be avoided because it can be fragile as it exposes invariants of your Rust
//! code to remote actors. We might consider adding this in the future. If you
//! are interested in this, please contact us.
//!
//! ## Canonical Encodings
//!
//! The ciborium crate has support for various canonical encodings during
//! serialization.
//!
//! - [`NoCanonicalization`](crate::canonical::NoCanonicalization): the default,
//! numbers are still encoded in their smallest form, but map keys are not
//! sorted for maximum serialization speed.
//! - [`Rfc7049`](crate::canonical::Rfc7049): the canonicalization scheme from
//! RFC 7049 that sorts map keys in a length-first order. Eg.
//! `["a", "b", "aa"]`.
//! - [`Rfc8949`](crate::canonical::Rfc8949): the canonicalization scheme from
//! RFC 8949 that sorts map keys in a bytewise lexicographic order. Eg.
//! `["a", "aa", "b"]`.
//!
//! To use canonicalization, you must enable the `std` feature. See the examples
//! in [`to_vec_canonical`](crate::ser::to_vec_canonical) and
//! [`into_writer_canonical`](crate::ser::into_writer_canonical) for more.
#![cfg_attr(not(feature = "std"), no_std)]
#![deny(missing_docs)]
@@ -99,6 +111,7 @@

extern crate alloc;

pub mod canonical;
pub mod de;
pub mod ser;
pub mod tag;
@@ -113,11 +126,7 @@ pub use crate::ser::{into_writer, Serializer};

#[doc(inline)]
#[cfg(feature = "std")]
pub use crate::ser::to_vec;

#[doc(inline)]
#[cfg(feature = "canonical")]
pub use crate::ser::{into_writer_canonical, to_vec_canonical};
pub use crate::ser::{into_writer_canonical, to_vec, to_vec_canonical};

#[cfg(feature = "std")]
#[doc(inline)]