Support encoding more dataclass-like things #501

Merged
jcrist merged 1 commit into main from refactor-dataclass-encode on Aug 2, 2023

Conversation

jcrist
Owner

@jcrist jcrist commented Jul 31, 2023

Previously we supported encoding dataclasses (detected as any object with a __dataclass_fields__ attribute), provided those objects were implemented similarly enough to the standard library's dataclasses. The intent was to support stdlib dataclasses, and if alternative implementations (e.g. pydantic.dataclasses) happened to work then all the better.

However, because of how we detected whether an object was a dataclass, there was no way to override our builtin support when an alternative implementation (in this case edgedb.Object) didn't work.

To fix this, we now make fewer assumptions about how the backing dataclass object is implemented.
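To make the detection problem concrete, here is a minimal stdlib-only sketch. `LooksLikeDataclass` and `detected_as_dataclass` are hypothetical names invented for illustration (they are not msgspec or edgedb code); the class stands in for any third-party object that exposes `__dataclass_fields__` without storing its values the way stdlib dataclasses do.

```python
import dataclasses


@dataclasses.dataclass
class Point:
    x: int
    y: int


class LooksLikeDataclass:
    """Hypothetical third-party object: it carries the marker attribute,
    but has no instance __dict__ holding the field values."""

    __slots__ = ()
    __dataclass_fields__ = {"x": None}


def detected_as_dataclass(obj) -> bool:
    # The attribute check described above, looked up on the object's type.
    return hasattr(type(obj), "__dataclass_fields__")


point_detected = detected_as_dataclass(Point(1, 2))
lookalike_detected = detected_as_dataclass(LooksLikeDataclass())
# The stdlib helper relies on the same marker, so it agrees.
stdlib_agrees = dataclasses.is_dataclass(LooksLikeDataclass())
```

Both objects pass the check, so an encoder that then assumes a stdlib-style layout would mishandle the second one, and because the check matches first there is no opportunity to route it elsewhere.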

Pros:

  • We can now natively encode objects implemented using dataclasses, pydantic.dataclasses, and edgedb.Object.
  • We now only encode fields as declared on the dataclass object. Previously we encoded any attribute lacking a leading underscore, which was efficient and worked well in practice (it's also what orjson does). However, that approach could lead to weird behavior if some fields intentionally start with an _ (like _id in mongodb) or if the object makes use of functools.cached_property.
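The difference between the two field-selection rules can be sketched with the standard library alone. The helpers `fields_by_underscore_rule` and `fields_as_declared` are hypothetical illustrations of the behaviors described above, not the actual encoder code.

```python
import dataclasses
import functools


@dataclasses.dataclass
class Document:
    _id: str  # intentionally starts with an underscore, as in mongodb
    title: str

    @functools.cached_property
    def slug(self) -> str:
        return self.title.lower().replace(" ", "-")


def fields_by_underscore_rule(obj) -> dict:
    # Old rule as described: every instance attribute without a leading "_".
    return {k: v for k, v in vars(obj).items() if not k.startswith("_")}


def fields_as_declared(obj) -> dict:
    # New rule as described: only the fields declared on the dataclass.
    return {f.name: getattr(obj, f.name) for f in dataclasses.fields(obj)}


doc = Document(_id="abc123", title="Hello World")
doc.slug  # accessing the cached_property stores its value in doc.__dict__

old_style = fields_by_underscore_rule(doc)
new_style = fields_as_declared(doc)
```

Under the old rule `_id` is dropped and the cached `slug` leaks into the output once it has been accessed; under the new rule exactly the declared fields (`_id` and `title`) are selected.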

Cons:

  • This flexibility and correctness come at a performance cost. The common case (dataclass uses __dict__, doesn't override __getattribute__) still takes the fast path, but encoding is now ~20% slower than before. Previously we encoded __dict__ based dataclasses 20% faster than orjson; now we're faster for small classes and slower for classes with more fields (on my machine 12 fields is the crossover point). For __slots__ based classes we're still around 2x faster than orjson.
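A rough Python sketch of the fast-path condition named above. The real encoder is not written in Python, and the exact checks it performs are an assumption here; `can_use_fast_path` and `collect_fields` are hypothetical helpers that only mirror the two stated conditions (instance has a `__dict__`, type does not override `__getattribute__`).

```python
import dataclasses


@dataclasses.dataclass
class DictBacked:
    a: int
    b: int


@dataclasses.dataclass
class Slotted:
    __slots__ = ("a",)
    a: int


@dataclasses.dataclass
class CustomLookup:
    a: int

    def __getattribute__(self, name):
        return object.__getattribute__(self, name)


def can_use_fast_path(obj) -> bool:
    # Assumed conditions: values live in __dict__ and attribute lookup
    # is the default object.__getattribute__.
    default_lookup = type(obj).__getattribute__ is object.__getattribute__
    return default_lookup and hasattr(obj, "__dict__")


def collect_fields(obj) -> dict:
    names = [f.name for f in dataclasses.fields(obj)]
    if can_use_fast_path(obj):
        # Fast path: read declared fields straight out of the instance dict.
        d = obj.__dict__
        return {n: d[n] for n in names if n in d}
    # Slow path: go through normal attribute access for each field.
    return {n: getattr(obj, n) for n in names}
```

Either path yields the same declared fields; the per-field lookup on declared names is the kind of extra work that plausibly accounts for the slowdown relative to iterating `__dict__` directly.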

Fixes #495.

TODO:

  • msgspec.json.encode
  • msgspec.msgpack.encode
  • msgspec.to_builtins
  • tests
  • update docs

Previously we supported encoding dataclasses (detected as any object
with a `__dataclass_fields__` attribute), provided those objects were
implemented similarly enough to the standard library's dataclasses.
The intent was to support stdlib dataclasses, and if
alternative implementations (e.g. `pydantic.dataclasses`) happened to
work then all the better.

However, due to how we were detecting if an object was a dataclass,
there was no way to override our builtin support if an alternative
implementation (in this case `edgedb.Object`) didn't work.

To fix this, we now make fewer assumptions about how the backing
dataclass object is implemented.

Pros:

- We can now natively encode objects implemented using `dataclasses`,
  `pydantic.dataclasses`, and `edgedb.Object`.
- We now only encode fields as declared on the dataclass object.
  Previously we encoded any attribute lacking a leading underscore,
  which was efficient and worked well in practice (it's also what
  `orjson` does). However, this can lead to weird behavior if some
  fields intentionally start with an `_` (like `_id` in mongodb) or if
  the object makes use of `functools.cached_property`.

Cons:

- This flexibility and correctness come at a performance cost. The
  common case (dataclass uses `__dict__`, doesn't override
  `__getattribute__`) still takes the fast path, but encoding is now
  ~20% slower than before. Previously we encoded `__dict__` based
  dataclasses 20% faster than `orjson`; now we're faster for small
  classes (<= 8 items, on my machine) and slower for classes with more
  fields. For `__slots__` based classes we're still around 2x faster
  than `orjson`.
@jcrist jcrist changed the title from "WIP: Support encoding more dataclass-like things" to "Support encoding more dataclass-like things" Aug 2, 2023
@jcrist jcrist merged commit 5e1d16f into main Aug 2, 2023
7 checks passed
@jcrist jcrist deleted the refactor-dataclass-encode branch August 2, 2023 03:29
Development

Successfully merging this pull request may close these issues.

Support for "non traditional" dataclasses
1 participant