Add order option to encoders #627

Merged: jcrist merged 1 commit into main from sort-keys on Jan 16, 2024

jcrist
Owner

@jcrist jcrist commented Jan 7, 2024

Add order kwarg to encoders

This adds an order kwarg to all encoders for configuring how unordered
collections/objects are encoded. Options are:

  • None: the default. All objects are encoded in the most efficient
    manner corresponding to their in-memory representation.
  • 'deterministic': Unordered collections (sets, dicts) are sorted
    before encoding. This ensures a consistent output between runs, which
    may be useful when comparing/hashing the encoded binary
    representation.
  • 'sorted': same as 'deterministic', but all object-like objects
    will have their fields encoded in alphabetical order by name. This is
    more expensive than 'deterministic', but may be useful for making
    the output more human readable.

The 'deterministic' output has been heavily optimized - given the work
required to accomplish this feature, I wouldn't expect we can speed up
this operation much more. The 'sorted' option has not been fully
optimized (the assumption being a human-readable output is rarely perf
sensitive). If needed, there are some rather simple optimizations we can
add here to speed this up further.

In general, msgspec.json.encode(obj, order="deterministic") should be
as fast or faster than orjson.dumps(obj, option=orjson.OPT_SORT_KEYS).
For common small object sizes we average a ~20% speedup over orjson
for key sorting.

In [1]: import msgspec, orjson, random

In [2]: enc = msgspec.json.Encoder(order="deterministic")

In [3]: keys = [f'field_{i}' for i in range(6)]

In [4]: random.shuffle(keys)

In [5]: msg = dict(zip(keys, range(len(keys))))

In [6]: %timeit enc.encode(msg)
305 ns ± 2.99 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [7]: %timeit orjson.dumps(msg, option=orjson.OPT_SORT_KEYS)
377 ns ± 2.04 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Fixes #609.

@jcrist jcrist changed the title [WIP] Add sort_keys option to encoders Add sort_keys option to encoders Jan 15, 2024
@jcrist jcrist changed the title Add sort_keys option to encoders Add order option to encoders Jan 15, 2024
This adds an `order` kwarg to all encoders for configuring how unordered
collections/objects are encoded. Options are:

- `None`: the default. All objects are encoded in the most efficient
  manner corresponding to their in-memory representation.
- `'deterministic'`: Unordered collections (sets, dicts) are sorted
  before encoding. This ensures a consistent output between runs, which
  may be useful when comparing/hashing the encoded binary
  representation.
- `'sorted'`: same as `'deterministic'`, but *all* object-like objects
  will have their fields encoded in alphabetical order by name. This is
  more expensive than `'deterministic'`, but may be useful for making
  the output more human readable.

The `'deterministic'` output has been heavily optimized - given the work
required to accomplish this feature, I wouldn't expect we can speed up
this operation much more. The `'sorted'` option has not been fully
optimized (the assumption being a human-readable output is rarely perf
sensitive). If needed, there are some rather simple optimizations we can
add here to speed this up further.

In general, `msgspec.json.encode(obj, order="deterministic")` should be
as fast or faster than `orjson.dumps(obj, option=orjson.OPT_SORT_KEYS)`.
For common small object sizes we average a ~25% speedup over `orjson`
for key sorting.
@jcrist jcrist merged commit 38c3330 into main Jan 16, 2024
8 checks passed
@jcrist jcrist deleted the sort-keys branch January 16, 2024 00:30
Development

Successfully merging this pull request may close these issues.

encode sort_keys argument