
Investigate weird serialization logic #277

Open: wants to merge 5 commits into develop

tillkruss (Member) commented Sep 11, 2020

My best guess is that this is never called. Thoughts @naxvog?

In 3.0 we should re-think the serialization.

@tillkruss tillkruss added the bug label Sep 11, 2020
@tillkruss tillkruss self-assigned this Sep 11, 2020
@tillkruss tillkruss added enhancement and removed bug labels Sep 11, 2020
naxvog (Collaborator) commented Sep 11, 2020

Just to summarize our slack conversation:

  • The original change was made years ago
  • The last condition of maybe_serialize serializes an already serialized string (nonsense)
  • All non strings should be serialized

We agreed to take no further action in 2.x for now, in order to maintain compatibility, and to refactor the method in a future 3.0 release.
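The "nonsense" case from the second bullet can be reproduced in plain PHP. This is only an illustration (not the plugin's maybe_serialize()): once an already serialized string is serialized again, a single unserialize() call returns the inner string instead of the original value.

```php
<?php
// Illustration in plain PHP (no WordPress needed): serializing a value that
// is already a serialized string adds a second layer.
$data = array( 'a' => 1 );

$once  = serialize( $data ); // 'a:1:{s:1:"a";i:1;}'
$twice = serialize( $once ); // serialized form of that *string*

// One unserialize() pass on the double-serialized value does not restore
// the array; it returns the once-serialized string.
$after_one_pass = unserialize( $twice );
var_dump( is_string( $after_one_pass ) ); // bool(true)

// Only a second pass gets the original array back.
var_dump( unserialize( $after_one_pass ) === $data ); // bool(true)
```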

return $data;

// phpcs:ignore WordPress.PHP.DiscouragedPHPFunctions.serialize_serialize
return serialize( $data );
Member Author

Always serialize, except if it already was serialized? @naxvog thoughts?

Member Author

Does Redis currently contain double-serialized data? If so, what were the use cases?

Member Author

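<?php
/**
 * Debug helper: checks what ends up in the object cache for an object that
 * customizes its own serialization, and for an already serialized string.
 * __serialize()/__unserialize() require PHP 7.4+.
 */
class Naxvog_Test {
        private $creation_time;
        private $serialization_time;
        public $some_prop;
        private $some_secret_prop;

        public function __construct() {
                $this->creation_time    = time();
                $this->some_secret_prop = 'baz';
                $this->some_prop        = 'bar';
        }

        // Only these two keys are serialized; $some_prop and
        // $some_secret_prop are intentionally left out.
        public function __serialize(): array {
                return [
                        'serialization_time' => time(),
                        'creation_time'      => $this->creation_time,
                ];
        }

        public function __unserialize( array $data ): void {
                $this->serialization_time = $data['serialization_time'];
                $this->creation_time      = $data['creation_time'];
        }
}

add_action( 'init', function() {
        $a = new Naxvog_Test();
        $s = serialize( $a ); // the same object, already serialized
        #var_dump( $a, $s );
        #var_dump( unserialize( $s ) );

        // ?debug-set: store the object and the serialized string.
        if ( isset( $_GET['debug-set'] ) ) {
                wp_cache_set( 'naxvog_test_obj1', $a );
                wp_cache_set( 'naxvog_test_obj2', $s );
        }

        // ?debug-get: read both back. obj2 was stored as a string, so a
        // correct cache should return that same string.
        if ( isset( $_GET['debug-get'] ) ) {
                var_dump( wp_cache_get( 'naxvog_test_obj1' ) );
                var_dump( wp_cache_get( 'naxvog_test_obj2' ) );
        }
} );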

Member Author

Should we check is_serialized() twice in maybe_unserialize()?

naxvog (Collaborator), Sep 28, 2020

I would prefer a conversion script that runs on plugin update and resolves such issues, but this might be non-trivial: it would most likely require a Lua script and take some time to run (best suited for Action Scheduler).

Collaborator

I can't think of any valid reason to serialize twice. Sure, there might be serialized data within a serialized object, but that is expected behaviour.

I have to look again, but I'm fairly sure I have not found any double-serialized keys in our Docker dev environment. I will check my production Redis instance; they should be fairly easy to find.

naxvog (Collaborator), Sep 28, 2020

OK, found some instances (by running a vim search for [[:cntrl:]]s:[[:digit:]]\+:"O on a copy of the RDB database):
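The vim search looks for a serialized string whose content starts with an object. The sketch below applies a slightly stricter variant of that pattern to a single cached value (so the leading [[:cntrl:]] from the raw RDB dump is replaced by a start-of-string anchor) and shows how one layer could be stripped. Both function names are hypothetical and not part of the plugin; a real conversion would still need to iterate over the Redis keys, for example in the scheduled job suggested earlier.

```php
<?php
/**
 * Hypothetical helper: true if $raw is a serialized *string* whose contents
 * are themselves a serialized object, e.g. 's:40:"O:8:"stdClass":...";'.
 */
function looks_double_serialized( string $raw ): bool {
    return 1 === preg_match( '/^s:\d+:"O:/', $raw );
}

/**
 * Hypothetical repair step: remove one layer of serialization so the value
 * is stored serialized exactly once. The outer layer is a plain string, so
 * no object is created here; allowed_classes => false is an extra safeguard.
 */
function unwrap_one_layer( string $raw ): string {
    $inner = unserialize( $raw, array( 'allowed_classes' => false ) );

    // Leave anything that is not a wrapped string untouched.
    return is_string( $inner ) ? $inner : $raw;
}

$obj      = new stdClass();
$obj->foo = 'bar';
$double   = serialize( serialize( $obj ) );

var_dump( looks_double_serialized( $double ) );            // bool(true)
var_dump( looks_double_serialized( serialize( $obj ) ) );  // bool(false)
var_dump( unwrap_one_layer( $double ) === serialize( $obj ) ); // bool(true)
```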

Member Author

Can we simply solve this by running this twice?

// Don't attempt to unserialize data that wasn't serialized going in.
if ( $this->is_serialized( $original ) ) {
// phpcs:ignore WordPress.PHP.NoSilencedErrors.Discouraged, WordPress.PHP.DiscouragedPHPFunctions.serialize_unserialize
$value = @unserialize( $original );
return is_object( $value ) ? clone $value : $value;
}

Collaborator

I have to look at the code of those plugins to confirm that this is not intended. If it isn't, we should find the cause of the double serialization instead.

Member Author

Running unserialize() twice seems risky, in case plugins do their own serialization.
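A minimal illustration of that risk, assuming a plugin that does its own serialization and caches the resulting *string*: the cache layer serializes it exactly once, which is correct, but a "just in case" second unserialize() on read hands the plugin an array instead of the string it stored.

```php
<?php
// Assumed scenario: a plugin serializes its own data and caches the string.
$plugin_payload = serialize( array( 'id' => 42 ) ); // plugin's own format
$stored         = serialize( $plugin_payload );     // cache layer, one pass

// Correct read path: unserialize exactly once, the plugin gets its string.
$read_once = unserialize( $stored );

// Extra pass: the plugin now receives an array, and its own unserialize()
// call on that value would no longer work.
$read_twice = unserialize( $read_once );

var_dump( $read_once === $plugin_payload ); // bool(true)
var_dump( is_array( $read_twice ) );        // bool(true)
```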

tillkruss (Member Author)

@naxvog Thoughts on the latest push?

if ( is_string( $value ) && $this->is_serialized( $value ) ) {
// Unserialize the inner value; unserializing $original again would be a no-op.
$value = @unserialize( $value );
}

return is_object( $value ) ? clone $value : $value;
Collaborator

Why are we cloning the unserialized object right away?

if ( $this->is_serialized( $original ) ) {
// phpcs:ignore WordPress.PHP.NoSilencedErrors.Discouraged, WordPress.PHP.DiscouragedPHPFunctions.serialize_unserialize
$value = @unserialize( $original );

// Just in case the data was serialized twice
if ( is_string( $value ) && $this->is_serialized( $value ) ) {
Collaborator

The is_string() check is the first test in the is_serialized() method, so we can drop the first condition.

Collaborator

For reference, the WordPress maybe_serialize() function is as follows:

function maybe_serialize( $data ) {
    if ( is_array( $data ) || is_object( $data ) ) {
        return serialize( $data );
    }

    /*
     * Double serialization is required for backward compatibility.
     * See https://core.trac.wordpress.org/ticket/12930
     * Also the world will end. See WP 3.6.1.
     */
    if ( is_serialized( $data, false ) ) {
        return serialize( $data );
    }

    return $data;
}

I don't think we should follow the backward-compatibility approach discussed in https://core.trac.wordpress.org/ticket/12930
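The extra serialize() in core makes sense when paired with maybe_unserialize(), which unserializes any string that passes is_serialized(): without the second layer, a string that merely looks serialized would come back as an array. The sketch below mirrors that pair with a hypothetical, much cruder looks_serialized() stand-in so it runs without WordPress; it is not core's implementation.

```php
<?php
// Hypothetical stand-in for core's is_serialized(); far less thorough.
function looks_serialized( $data ): bool {
    return is_string( $data )
        && ( 'N;' === $data || 1 === preg_match( '/^[aObsid]:.*[;}]$/s', $data ) );
}

function sketch_maybe_serialize( $data ) {
    if ( is_array( $data ) || is_object( $data ) ) {
        return serialize( $data );
    }
    // Mirrors core: a string that already looks serialized is wrapped again.
    if ( looks_serialized( $data ) ) {
        return serialize( $data );
    }
    return $data;
}

function sketch_maybe_unserialize( $data ) {
    // Anything that looks serialized is unserialized exactly once.
    return looks_serialized( $data ) ? unserialize( $data ) : $data;
}

$s = 'a:1:{i:0;i:1;}'; // a plain string that happens to look serialized
var_dump( sketch_maybe_unserialize( sketch_maybe_serialize( $s ) ) === $s ); // bool(true)
var_dump( sketch_maybe_unserialize( $s ) ); // without the extra layer: array(1) { [0]=> int(1) }
```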

tillkruss (Member Author) commented Sep 29, 2020

Alright:

  • We now flush the cache right after upgrading to 2.0.16, to get rid of the weird double-serialized data
  • All data going into the object cache is serialized (strings, booleans, etc.)
  • All data going out of the cache is unserialized once

@naxvog: I haven't tested any of this, but I'm happy with the overall approach.
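The last two bullets amount to a symmetric encode/decode pair. This is a minimal sketch with hypothetical helper names (not the plugin's API): because every value is serialized exactly once on write, one unserialize() on read is always correct, and no is_serialized() guessing is needed.

```php
<?php
// Hypothetical helpers illustrating the agreed rule.
function cache_encode( $value ): string {
    // Every value (strings, booleans, ints, arrays, objects) is serialized once.
    return serialize( $value );
}

function cache_decode( string $raw ) {
    // Everything in the cache went through cache_encode(), so exactly one
    // unserialize() pass restores it.
    return unserialize( $raw );
}

// A string that looks serialized comes back as the same string, not an array.
var_dump( cache_decode( cache_encode( 'a:1:{i:0;i:1;}' ) ) ); // string(14) "a:1:{i:0;i:1;}"
var_dump( cache_decode( cache_encode( false ) ) );            // bool(false)
```

One caveat of this design: unserialize() also returns false for malformed input, so a stored false and a corrupt entry look alike unless the raw value is compared against 'b:0;'. Entries written under the old scheme would also not decode correctly, which is why flushing on upgrade matters here.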

*
* @return void
*/
public function run_migrations() {
Collaborator

To simplify things we could just flush the cache every time the plugin is updated.

Alternatively, we should avoid polluting the options table with version-specific entries; a general roc_version option storing the last known version would be better. Migrations would then only run if the current plugin version is newer.
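A sketch of that suggestion, under stated assumptions: a plain array stands in for the options table (in a plugin this would be get_option()/update_option()), maybe_run_migrations() is a hypothetical name, '0.0.0' is an assumed default for "never migrated", and the only migration step is the cache flush from the first suggestion. PHP's version_compare() decides whether the installed code is newer than the stored roc_version.

```php
<?php
/**
 * Hypothetical: run migrations only if $current_version is newer than the
 * last known version stored under the single 'roc_version' key.
 */
function maybe_run_migrations( array &$options, string $current_version, callable $flush ): bool {
    $last_known = $options['roc_version'] ?? '0.0.0'; // assumed default

    if ( version_compare( $current_version, $last_known, '<=' ) ) {
        return false; // already migrated to this (or a newer) version
    }

    // Simplest migration from the thread: flush the cache on every update.
    $flush();

    $options['roc_version'] = $current_version;
    return true;
}

$options = array();
$flushes = 0;
$flush   = function () use ( &$flushes ) { $flushes++; };

maybe_run_migrations( $options, '2.0.16', $flush ); // runs, flushes once
maybe_run_migrations( $options, '2.0.16', $flush ); // same version: skipped
```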

Member Author

Agreed 👌🏻

@tillkruss tillkruss removed their assignment Apr 21, 2023