WIP: feat: introduce normalizer #423

Merged
merged 23 commits into from
Dec 26, 2023

@romm (Member) commented Aug 29, 2023

This introduces a new feature: a normalizer, which does the opposite of the mapper. It recursively transforms a structure of objects and other values into a nested array of scalar values, which can then easily be encoded to JSON or another basic data format.

For further reference, please read discussion #420.


The main idea behind this normalizer differs from what can be found in other normalization/serialization libraries: this library, Valinor, will not leak any customization attribute/interface/trait into the objects being serialized. This is one of the most important goals, as it lets developers remain the owners of all business logic rules.

Instead of attributes/interfaces/traits, the NormalizerBuilder is customized with “handlers”: callables that can be chained to customize the normalization result for a given object. A handler must declare at least one parameter: the type of the object it handles. This type can be the native type object, in which case the handler is applied to every object. A second parameter can be declared: a callable that lets the handler call the next handler in the queue, get its result and apply custom modifications to it.
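To make the chaining idea concrete, here is a minimal standalone sketch of how such a handler queue could behave. It does not use Valinor and is not its implementation: handlerAccepts(), normalizeObject() and the Money class are hypothetical names, and the ordering (first handler in the list is the outermost one) is an assumption of this sketch.

```php
<?php

/**
 * Hypothetical helper (not Valinor's code): does the first parameter of the
 * handler accept this object?
 */
function handlerAccepts(callable $handler, object $object): bool
{
    $parameters = (new ReflectionFunction(Closure::fromCallable($handler)))->getParameters();
    $type = isset($parameters[0]) ? $parameters[0]->getType() : null;

    if (! $type instanceof ReflectionNamedType) {
        return false; // untyped, union and intersection types are ignored in this sketch
    }

    return $type->getName() === 'object' || is_a($object, $type->getName());
}

/**
 * Hypothetical helper: runs the matching handlers as a chain. Assumption of
 * this sketch: the first handler in the list is the outermost one.
 *
 * @param list<callable> $handlers
 */
function normalizeObject(object $object, array $handlers): mixed
{
    // End of the chain: default representation (public properties only, for brevity).
    $next = fn (): mixed => get_object_vars($object);

    foreach (array_reverse($handlers) as $handler) {
        if (handlerAccepts($handler, $object)) {
            // Arrow functions capture by value, so each link keeps its own $next.
            $next = fn (): mixed => $handler($object, $next);
        }
    }

    return $next();
}

final class Money
{
    public function __construct(
        public readonly int $amount,
        public readonly string $currency,
    ) {}
}

$handlers = [
    // Applies to every object and decorates the result of the next handler.
    fn (object $object, callable $next): array => ['wrapped' => true] + $next(),
    // Skipped for Money: the declared type does not match.
    fn (DateTimeInterface $date): string => $date->format('Y'),
];

$result = normalizeObject(new Money(100, 'EUR'), $handlers);
```

Here only the first handler runs for Money; it calls $next() to get the default array and adds a key to it.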

To make this more understandable, below are adaptations of some customization features that can be found in other libraries (often driven by attributes):

Note

By default, an object is normalized by reading all of its properties, regardless of visibility (private/protected/public).
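As an illustration of what "regardless of visibility" means in plain PHP, the sketch below reads non-public properties by running get_object_vars() inside the object's class scope. readAllProperties() and Account are hypothetical names for this example; this is one possible technique, not necessarily the one the normalizer uses, and it does not see private properties declared on parent classes.

```php
<?php

final class Account
{
    public function __construct(
        public readonly string $owner,
        protected readonly string $iban,
        private readonly int $balance,
    ) {}
}

/**
 * Hypothetical helper: returns the properties visible from inside the
 * object's own class, i.e. public, protected and private ones.
 *
 * @return array<string, mixed>
 */
function readAllProperties(object $object): array
{
    $reader = function (): array {
        return get_object_vars($this);
    };

    // Binding the closure to the object and its class gives it the same
    // visibility as a method of that class.
    return Closure::bind($reader, $object, $object::class)();
}

$account = new Account('Jane', 'FR76', 42);

$all = readAllProperties($account);      // owner, iban and balance
$publicOnly = get_object_vars($account); // owner only, when called from outside the class
```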

For better understanding, take a look at the major integration test.

Basic object custom normalization

Sometimes the normalized representation of an object requires small adjustments; the object below needs to append/prepend underscores to its keys for some business reason. The original proposal was: “The object just needs to implement the native __serialize() method and add the needed logic in it.” That behavior changed, see #423 (comment).

A global handler is registered to detect if an object defines a normalize() method, in which case the result will be used by the normalizer.

final class SomeObject
{
    public function __construct(
        public readonly string $foo,
        public readonly string $bar,
        public readonly string $baz,
    ) {}

    public function normalize(): array
    {
        return [
            'foo_' => $this->foo,
            '_bar' => $this->bar,
            '_baz_' => $this->baz,
        ];
    }
}

$result = (new \CuyZ\Valinor\NormalizerBuilder())
    ->addHandler(function (object $object, callable $next) {
        return method_exists($object, 'normalize')
            ? $object->normalize()
            : $next();
    })
    ->normalizer()
    ->normalize(new SomeObject('foo', 'bar', 'baz'));

$result === ['foo_' => 'foo', '_bar' => 'bar', '_baz_' => 'baz']; // ✅

Transforming all keys to snake_case

This is a classic one: JSON APIs often require keys in snake_case, whereas properties of PHP classes are usually written in camelCase. In the example below, the transformation is done globally and recursively on every object during normalization. This can of course be adapted to one's needs.

namespace My\App;

final class SomeObject
{
    public function __construct(
        public readonly string $someFirstProperty,
        public readonly string $someSecondProperty,
        public readonly string $someThirdProperty,
    ) {}
}

final class CamelKeyToSnakeKeyHandler
{
    public function __invoke(object $object, callable $next): mixed
    {
        $result = $next();

        if (! is_array($result)) {
            return $result;
        }

        $snake_cased = [];

        foreach ($result as $key => $value) {
            // Probably not the best option here, but we don't mind for the example
            $newKey = strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $key));

            $snake_cased[$newKey] = $value;
        }

        return $snake_cased;
    }
}

$result = (new \CuyZ\Valinor\NormalizerBuilder())
    ->addHandler(new \My\App\CamelKeyToSnakeKeyHandler())
    ->normalizer()
    ->normalize(new SomeObject('foo', 'bar', 'baz'));

$result === ['some_first_property' => 'foo', 'some_second_property' => 'bar', 'some_third_property' => 'baz']; // ✅

Ignoring properties during normalization

To ignore properties, a normalize() method like the one in the first example could easily be used. An alternative is to provide an interface IgnoresValuesOnNormalization that any object can implement. A handler then detects this interface during normalization and unsets the values that should be ignored.

namespace My\App;

interface IgnoresValuesOnNormalization
{
    /**
     * @return non-empty-list<string>
     */
    public function ignoredKeys(): array;
}

final class IgnoredValuesHandler
{
    public function __invoke(\My\App\IgnoresValuesOnNormalization $object, callable $next): mixed
    {
        $result = $next();

        foreach ($object->ignoredKeys() as $key) {
            unset($result[$key]);
        }

        return $result;
    }
}

final class SomeObject implements \My\App\IgnoresValuesOnNormalization
{
    public function __construct(
        public readonly string $foo,
        public readonly string $bar,
        public readonly string $baz,
    ) {}

    public function ignoredKeys(): array
    {
        return ['bar']; // The `bar` property will be ignored on normalization
    }
}

$result = (new \CuyZ\Valinor\NormalizerBuilder())
    ->addHandler(new \My\App\IgnoredValuesHandler())
    ->normalizer()
    ->normalize(new SomeObject('foo', 'bar', 'baz'));

$result === ['foo' => 'foo', 'baz' => 'baz']; // ✅

Adding prefix to key

Same kind of example as above: objects implement an interface to specify a prefix that should be added to all keys during normalization. If each key needs a different prefix/suffix, a normalize() method like the one in the first example would probably be a better choice.

namespace My\App;

interface AddsPrefixToKeyOnNormalization
{
    public function prefix(): string;
}

final class SomeObject implements \My\App\AddsPrefixToKeyOnNormalization
{
    public function __construct(
        public readonly string $foo,
        public readonly string $bar,
        public readonly string $baz,
    ) {}

    public function prefix(): string
    {
        return 'prefix_';
    }
}

final class PrefixedValuesHandler
{
    public function __invoke(\My\App\AddsPrefixToKeyOnNormalization $object, callable $next): array
    {
        $prefix = $object->prefix();

        $prefixed = [];

        foreach ($next() as $key => $value) {
            $prefixed[$prefix . $key] = $value;
        }

        return $prefixed;
    }
}

$result = (new \CuyZ\Valinor\NormalizerBuilder())
    ->addHandler(new \My\App\PrefixedValuesHandler())
    ->normalizer()
    ->normalize(new SomeObject('foo', 'bar', 'baz'));

$result === ['prefix_foo' => 'foo', 'prefix_bar' => 'bar', 'prefix_baz' => 'baz']; // ✅

API versioning

This is a more complex example, but it shows how the normalizer can be used to handle API versioning. The idea here is that an API can evolve over time, and sometimes the normalized representation of an object in the API response can change.

namespace My\App;

interface HasVersionedNormalization
{
    public function normalizeWithVersion(string $version): mixed;
}

final class SomeObject implements \My\App\HasVersionedNormalization
{
    public function __construct(
        public readonly string $foo,
        public readonly string $bar,
        public readonly string $baz,
    ) {}

    public function normalizeWithVersion(string $version): mixed
    {
        return match (true) {
            version_compare($version, '1.0.0', '<') => [
                'old_key' => $this->foo,
                'merge_of_two_keys' => $this->bar . ':' . $this->baz,
            ],
            version_compare($version, '2.0.0', '<') => [
                'new_key' => $this->foo,
                'merge_of_two_keys' => $this->bar . ':' . $this->baz,
            ],
            default => get_object_vars($this)
        };
    }
}

function normalizeWithVersion(string $version): mixed {
    return (new \CuyZ\Valinor\NormalizerBuilder())
        ->addHandler(fn (\My\App\HasVersionedNormalization $object) => $object->normalizeWithVersion($version))
        ->normalizer()
        ->normalize(new SomeObject('foo', 'bar', 'baz'));
}

// Version can come from request or somewhere else
$result_v0_4_7 = normalizeWithVersion('0.4.7');
$result_v1_8_2 = normalizeWithVersion('1.8.2');
$result_v2_5_3 = normalizeWithVersion('2.5.3');

$result_v0_4_7 === ['old_key' => 'foo', 'merge_of_two_keys' => 'bar:baz']; // ✅
$result_v1_8_2 === ['new_key' => 'foo', 'merge_of_two_keys' => 'bar:baz']; // ✅
$result_v2_5_3 === ['foo' => 'foo', 'bar' => 'bar', 'baz' => 'baz']; // ✅

Dates format

This is a very common use case: dates are often represented as strings in data formats like JSON. By default, the normalizer formats any date using RFC 3339. This can be customized by adding a handler that is applied to every DateTimeInterface object.

$result = (new \CuyZ\Valinor\NormalizerBuilder())
    ->addHandler(fn (DateTimeInterface $date) => $date->format('Y/m/d'))
    ->normalizer()
    ->normalize(new DateTimeImmutable('2023-08-29'));

$result === '2023/08/29'; // ✅

Note that other implementations could be imagined to offer a more precise control over the format.

I really want to emphasize that, in the examples above, the library does not leak into the objects: the normalization logic is not coupled to the normalizer and stays (or at least should stay) in the domain layer of the application.

In some cases, it can seem cumbersome to implement the logic, but I strongly believe that in most cases this is trivial work that can be done quickly. More importantly, it can be adapted/improved directly by the developers of the application, following the constantly-changing needs of the business rules, without being tied up to a third-party library release-cycle.


I'd be glad to have your feedback on this feature, and I'm open to any suggestion.

Please don't hesitate to check out the branch and try it out.

@romm romm marked this pull request as draft August 29, 2023 20:38
@TimWolla
Contributor

The object just needs to implement the native __serialize() method and add the needed logic in it.

As someone who worked quite a bit with PHP's native serialize() function, this is likely not a good idea, especially if __unserialize() is not also implemented. Those objects will fail to round-trip through the native serialize(). jsonSerialize() would be more reasonable.

}

if ($object instanceof DateTimeInterface) {
    return fn () => $object->format('Y-m-d\\TH:i:sP'); // RFC 3339

Contributor

Should microseconds be preserved here?

Member Author

Let's go for it, if that's a problem for some people, they will tell us and we will manage then.
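For reference, a small sketch of what preserving microseconds could look like in plain PHP. The format string with `.u` is an assumption for illustration, not necessarily what the PR ended up using; note that the built-in DATE_RFC3339_EXTENDED constant keeps milliseconds only.

```php
<?php

$date = new DateTimeImmutable('2023-08-29T12:34:56.123456+02:00');

// Format quoted in the diff above: seconds precision only.
$seconds = $date->format('Y-m-d\TH:i:sP');

// Assumed variant for this sketch: `u` appends the six-digit microseconds.
$microseconds = $date->format('Y-m-d\TH:i:s.uP');

// Built-in constant: `v` keeps milliseconds only.
$milliseconds = $date->format(DATE_RFC3339_EXTENDED);
```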

    return $value;
}

if (is_iterable($value)) {

@TimWolla (Contributor) commented Aug 30, 2023

Prioritizing this over is_object() makes it impossible to add custom handling for classes implementing IteratorAggregate. I'd also argue that the default behavior is a little unexpected in general, because only the children will be returned by default and the properties of the IteratorAggregate will be ignored.

Example:

<?php

require('vendor/autoload.php');

final class SomeObject implements IteratorAggregate
{
    public function __construct(
        public readonly string $key,
        public readonly string $value,
    ) {
    }

    public function getIterator(): Traversable
    {
        yield from [];
    }
}


var_dump(
    (new \CuyZ\Valinor\NormalizerBuilder())
        ->addHandler(fn (SomeObject $object) => [$object->key, $object->value])
        ->normalizer()
        ->normalize(new SomeObject('key', 'value'))
);
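To illustrate the ordering issue outside of Valinor, here is a standalone sketch in which handlers for objects are consulted before the iterable check. sketchNormalize() and Pair are hypothetical names, the handler map keyed by class name is a simplification, and this is only one possible ordering, not necessarily the behaviour the PR adopted.

```php
<?php

final class Pair implements IteratorAggregate
{
    public function __construct(
        public readonly string $key,
        public readonly string $value,
    ) {}

    public function getIterator(): Traversable
    {
        yield from [];
    }
}

/**
 * Hypothetical helper, not part of Valinor.
 *
 * @param array<class-string, callable(object): mixed> $handlers
 */
function sketchNormalize(mixed $value, array $handlers): mixed
{
    if (is_object($value)) {
        // Handlers come first, so an IteratorAggregate can still be customized.
        foreach ($handlers as $class => $handler) {
            if ($value instanceof $class) {
                return sketchNormalize($handler($value), $handlers);
            }
        }
    }

    if (is_iterable($value)) {
        $result = [];

        foreach ($value as $key => $item) {
            $result[$key] = sketchNormalize($item, $handlers);
        }

        return $result;
    }

    if (is_object($value)) {
        // Public properties only, to keep the sketch short.
        return sketchNormalize(get_object_vars($value), $handlers);
    }

    return $value;
}

$withHandler = sketchNormalize(new Pair('key', 'value'), [
    Pair::class => fn (Pair $pair) => [$pair->key, $pair->value],
]);

$withoutHandler = sketchNormalize(new Pair('key', 'value'), []);
```

With the handler registered, the object is normalized from its properties; without it, the sketch falls back to iterating the (empty) iterator, which is the surprising default described above.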

Member Author

Good catch! What do you think about the new behaviour?

Contributor

I don't really have a strong preference, as I'm not the target audience of this feature. (Who would have expected after my Mastodon comments and comments in the GH discussion 😁). The new behavior appears to be reasonable, though.

Member Author

Haha yeah, thank you for your participation though!

@romm
Member Author

romm commented Aug 30, 2023

The object just needs to implement the native __serialize() method and add the needed logic in it.

As someone who worked quite a bit with PHP's native serialize() function, this is likely not a good idea, especially if __unserialize() is not also implemented. Those objects will fail to round-trip through the native serialize(). jsonSerialize() would be more reasonable.

Right. And after all, it is very easy to simulate the same behaviour with something like:

(new \CuyZ\Valinor\NormalizerBuilder())
    ->addHandler(function (object $object, callable $next) {
        return method_exists($object, 'normalize')
            ? $object->normalize()
            : $next();
    }); // …

Thanks for the suggestion.

@romm romm force-pushed the feat/normalizer branch from 822bc72 to 710af87 Compare August 30, 2023 17:46
@TimWolla
Contributor

TimWolla commented Aug 31, 2023

Another gotcha that is likely not fixable, but should perhaps be stated as a limitation: Circular data structures cannot be serialized.

<?php

require('vendor/autoload.php');

final class Node
{
    public function __construct(
        /** @var list<Node> */
        public array $neighbors = []
    ) {
    }
}

$n = new Node();
$n2 = new Node();
$n->neighbors[] = $n2;
$n2->neighbors[] = $n;

var_dump(
    (new \CuyZ\Valinor\NormalizerBuilder())
        ->normalizer()
        ->normalize($n)
);

Note that PHP's internal serialize() can handle this situation, that's why I've specifically mentioned that in the GH discussion as being capable of handling every object structure (except for stuff that is impossible to serialize, e.g. open file handles).

@romm
Member Author

romm commented Sep 1, 2023

Another gotcha that is likely not fixable, but should perhaps be stated as a limitation: Circular data structures cannot be serialized.

Indeed, currently it will go into an infinite loop. There are two solutions I can think of right now:

  1. Provide a maximum depth limit; however, this can produce data that cannot be mapped back.
  2. Detect circular references and throw an exception when one is found.

I'm not sure I could find a way to provide a solution that would work in every case. WDYT?

@TimWolla
Contributor

TimWolla commented Sep 1, 2023

I'm not sure I could find a way to provide a solution that would work in every case. WDYT?

I would consider telling folks “don't do that” to be reasonable. Detecting cycles would likely require additional memory usage for something that shouldn't happen. Limiting the maximum depth would probably be fine if an error is emitted that the maximum depth is exceeded instead of returning partly serialized data.
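A rough sketch of the depth-limit variant that raises an error instead of returning partial data. Everything here is hypothetical (normalizeWithDepthLimit(), MaximumDepthExceeded, and the default limit of 32 is an arbitrary example value); it only shows the idea on plain arrays and public properties.

```php
<?php

final class MaximumDepthExceeded extends RuntimeException {}

/**
 * Hypothetical helper, not Valinor's implementation.
 */
function normalizeWithDepthLimit(mixed $value, int $maxDepth = 32, int $depth = 0): mixed
{
    if ($depth > $maxDepth) {
        // Fail loudly rather than return partly normalized data.
        throw new MaximumDepthExceeded("Maximum depth of $maxDepth exceeded.");
    }

    if (is_object($value)) {
        // Public properties only, to keep the sketch short.
        $value = get_object_vars($value);
    }

    if (is_array($value)) {
        $result = [];

        foreach ($value as $key => $item) {
            $result[$key] = normalizeWithDepthLimit($item, $maxDepth, $depth + 1);
        }

        return $result;
    }

    return $value;
}

// A circular structure now ends with an exception instead of an infinite loop.
$a = new stdClass();
$b = new stdClass();
$a->other = $b;
$b->other = $a;

try {
    normalizeWithDepthLimit($a);
    $failed = false;
} catch (MaximumDepthExceeded) {
    $failed = true;
}
```

No extra memory is needed to track visited objects, at the cost of also rejecting legitimate structures that are deeper than the limit.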

@oprypkhantc

Overall, this looks good.

Although this is a bit out of scope, you've mentioned this in your post so I'll reply here:

In some cases, it can seem cumbersome to implement the logic, but I strongly believe that in most cases this is trivial work that can be done quickly. More importantly, it can be adapted/improved directly by the developers of the application, following the constantly-changing needs of the business rules, without being tied up to a third-party library release-cycle.

Is there a way I, as an end-user of Valinor, can access reflection/attributes information in addHandler() callbacks? I understand the reasoning of why you don't want attributes to be part of the first-party API, but I don't agree that this work is trivial. That's fairly standard logic and I'd strongly prefer to use as little code as possible with as few random strings in the code as possible, which I believe the attributes are perfect for.

Again, to be clear, I'm not proposing/pushing attributes into Valinor itself; I'll be more than happy to simply implement them on my side. However, I don't see a mechanism that would allow me to do so; especially allow me to do so efficiently, i.e. without calling reflection / instantiating attributes on every serialization/deserialization.

Would this also be something that Valinor can introduce or has the decision been made?

private function doNormalize(mixed $value, array $references): mixed
{
    if (is_object($value)) {
        $id = spl_object_id($value);

Contributor

You might consider using a WeakMap<object, true> instead. I feel this would be a little more explicit, especially since the ID returned by spl_object_id() might be reused if the original object is destroyed (for whatever reason) during normalization.

Member Author

Good point, thanks!
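For illustration, a standalone sketch of cycle detection with a WeakMap. It is not the code of this PR: normalizeWithoutCycles(), CircularReferenceFound and GraphNode are hypothetical names, and the sketch only tracks the objects on the current path so that an object shared by two siblings is still allowed.

```php
<?php

final class GraphNode
{
    /** @param list<GraphNode> $neighbors */
    public function __construct(public array $neighbors = []) {}
}

final class CircularReferenceFound extends RuntimeException {}

/**
 * Hypothetical helper, not Valinor's implementation.
 *
 * @param WeakMap<object, true> $ancestors objects currently being normalized
 */
function normalizeWithoutCycles(mixed $value, WeakMap $ancestors): mixed
{
    if (is_object($value)) {
        if (isset($ancestors[$value])) {
            throw new CircularReferenceFound('Circular reference on ' . $value::class);
        }

        // Keyed by the object itself, so there is no ID that could be reused.
        $ancestors[$value] = true;

        try {
            // Public properties only, to keep the sketch short.
            return normalizeWithoutCycles(get_object_vars($value), $ancestors);
        } finally {
            // Leaving this branch: the object may legitimately appear elsewhere.
            unset($ancestors[$value]);
        }
    }

    if (is_array($value)) {
        return array_map(
            fn (mixed $item): mixed => normalizeWithoutCycles($item, $ancestors),
            $value,
        );
    }

    return $value;
}

// The same object referenced twice without a cycle is fine.
$shared = new GraphNode();
$tree = normalizeWithoutCycles(new GraphNode([$shared, $shared]), new WeakMap());

// A real cycle is reported.
$n = new GraphNode();
$n2 = new GraphNode([$n]);
$n->neighbors[] = $n2;

try {
    normalizeWithoutCycles($n, new WeakMap());
    $cycleDetected = false;
} catch (CircularReferenceFound) {
    $cycleDetected = true;
}
```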

@romm
Member Author

romm commented Nov 26, 2023

@oprypkhantc I've just pushed a new commit that makes it possible to use attributes. Documentation on it coming soon!

romm added 2 commits December 23, 2023 00:07
Better architecture for upcoming JSON normalizer
@romm romm marked this pull request as ready for review December 25, 2023 19:17
@romm romm merged commit 1c9368d into CuyZ:master Dec 26, 2023
14 checks passed
@romm romm deleted the feat/normalizer branch December 26, 2023 13:39
@romm
Member Author

romm commented Dec 26, 2023

Hi @TimWolla, @oprypkhantc, thank you for your participation on this work! I'm going to prepare a release for it. 🙂
