Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better communicate failure to search-replace JSON-serialized URLs #45

Open
danielbachhuber opened this issue Nov 17, 2017 · 12 comments
Open

Comments

@danielbachhuber
Copy link
Member

From Post Status Slack:

javorszky [5 hours ago] Gravity forms confirmation: goes to https://livesite.com/confirmation-page
I'm on dev. Dev is https://site.test
`wp search-replace https://livesites.com https://site.text`
It does not turn gravity form's form confirmation setting
javorszky [5 hours ago] because the setting is stored in the `rg_form_meta` table in the `confirmations` column as `json`, so the url will look like
javorszky [5 hours ago] `https:\/\/livesite.com\/confirmation-page`

We should better communicate this scenario in our documentation. In fact, it may be time for a dedicated document on search-replacing URLs in the database. Or even, create a dedicated wp search-replace url command, because replacing URLs is such a common use case.

cc @javorszky

@javorszky
Copy link

Hm, would it be something that could be actually replaced by the command? I know it already handles serialised data, so (famous last words incoming), how hard would it be to replace the json one?

The good thing is that it doesn't depend on string length, so theoretically both the from and to can be escaped and replaced again. Though that would mean twice as long pass along the entire dbase.

@danielbachhuber
Copy link
Member Author

I know it already handles serialised data, so (famous last words incoming), how hard would it be to replace the json one?

With the current command implementation, you'd need to use regex to also search-replace slashed URLs.

With this being said, if you're only replacing the domain (which it looks like you are in your example), you could drop the https:// from both strings and solely search-replace the domain.

@javorszky
Copy link

This just occurred to me:

you could drop the https:// from both strings and solely search-replace the domain.

It will break accounts if I'm signing in via email, because without the protocol, I'm also replacing livesite.com to staging.com in here: [email protected] -> [email protected].

@javorszky
Copy link

however during the search-replace loop we could check whether the data is json by checking whether it successfully decoded the data and then looping over and setting a json flag, so decode -> replace -> recode -> store

@Pushplaybang
Copy link

@javorszky - the user table is a good edge case, and one I think most developers forget about, it'd be great to specifically warn when affecting user data, though that might be out of scope for this issue.

@devkinetic
Copy link

devkinetic commented Nov 15, 2019

UPDATED 12/5/19

From my testing, to properly handle changing a url, you need more information than just the TLD. To your point @danielbachhuber, with the right regex that matches every way a url can be (within reason) it can (mostly) be handled all at once.

DOMAIN=site.com
NEW_DOMAIN=newsite.com
PREFIX=http

wp search-replace --all-tables --regex "https?(:?(%3A%2F%2F)?(\\\/|\/)*)(www\.)?$DOMAIN" "$PREFIX\$1$NEW_DOMAIN"
  • Match against http and https
  • Match every way :// can be stored in the database. This is based on the sites I've personally come across, and could be expanded. Whatever this ends up being is stored in $1 for use later
    • url encoded
    • escaped slashes
    • unescaped slashes
  • Match with or without www
  • Match the searched TLD

In the replace portion I reconstruct the url using:

  • Normalized prefix
  • The matched encoding of ://
  • The new domain

@javorszky I then run a follow up command to replace any instance of the domain which isn't an email address!

wp search-replace --all-tables --regex "(?<!@|%40)$DOMAIN" "$NEW_DOMAIN"

These two commands need to be run for each url on the website, so that means if multisite is enabled, it must be run with those urls as well:

DOMAIN=multisite.newsite.com
# first command
# second command

Of course now after reading this thread I now also have to think about user data as well! (thanks @Pushplaybang) I really welcome any feedback and hopefully this can become an easier process through some logic additions to the search-replace command.

@flaviovs
Copy link

We should be mindful when talking about URL vs general text search&replace. I guess the former can be done with proper escaping (although I suspect it is fragile and may break with some character combinations), while the latter may not be always possible. For example, replacing with strings containing quotes or newlines will render JSON strings invalid.

Since the purpose of the command is to do general string S&R, I fail to see how it can be done other than identifying a JSON string, decoding it, doing S&R in the array, and encoding it back.

@MehbubRashid
Copy link

facing same issue. following

@planetahuevo
Copy link

planetahuevo commented Oct 6, 2022

@devkinetic
Thank you for the commands.
I was running those, but on multiple search and replace commands.
Will the regex option be faster than executing the commands separately?

Edit:
I think this could be the answer:
[--regex]
Runs the search using a regular expression (without delimiters). Warning: search-replace will take about 15-20x longer when using –regex.
from https://developer.wordpress.org/cli/commands/search-replace/
Thanks

@mrsdizzie
Copy link
Member

Stumbled across this and wanted to say the regex above can be very slow for large datasets (as it warns).

Over hundreds of sites I've never had a problem always using the following 4 commands to change a URL everywhere:

wp search-replace '://www.oldsite.com' '://www.newsite.com'
wp search-replace '://oldsite.com' '://newsite.com'

wp search-replace ':\/\/oldsite.com' ':\/\/newsite.com'
wp search-replace ':\/\/www.oldsite.com' ':\/\/www.newsite.com'

If using PHP, you could:

$replacements = array(
  "://oldsite.com"  => "://newsite.com",
  "://www.oldsite.com" => "://www.newsite.com",
);

foreach ( $replacements as $search => $replace ) {
  WP_CLI::runcommand( "search-replace '$search' '$replace'");
  WP_CLI::runcommand( addcslashes( "search-replace '$search' '$replace'", '/' ) );
}

I've found there is often a mix of www and non-www versions so I check for both and just automate the commands above. Can adjust slightly if using a subdomain

@devkinetic
Copy link

Definitely comes down to use case. In my dealings I had to account for mixed content issues, it wasn't just moving a project from one url to another, it was normalizing as well as updating, so something comprehensive. I think that at times the detection of serialized data was also fragile in and of itself which may be separate issue entirely.

At the time, I had many WordPress projects I was inheriting in various states very quickly and didn't desire to debug every edge case that cropped up. It may be that you only need to use the regex version once and then use faster methods since you trust the database and plugins in use.

I am glad my commands helped, it shows that this is a consistent issue, and hopefully I saved some of your time. It would be so much better if WordPress just stopped using absolute links but it's just how things are I guess.

@danielbachhuber
Copy link
Member Author

danielbachhuber commented Jul 14, 2023

Or even, create a dedicated wp search-replace url command, because replacing URLs is such a common use case.

Created #186 for discussing a dedicated command

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants