Skip to content
This repository has been archived by the owner on Jul 25, 2024. It is now read-only.

PURGE requests should be non-blocking & Issues with redirection/302/WP HTTP API #56

Open
jerclarke opened this issue Jul 11, 2018 · 7 comments

Comments

@jerclarke
Copy link
Contributor

This was hell to debug. I discovered it because on my local system, all the PURGE calls were going through as actual GET pageloads, and my macbook was getting overwhelmed loading dozens of pages all at once each time I saved a published post (took over a minute).

TL;DR: If the current web server redirects PURGE requests as 302 then they become GET and load the actual URLs on the site, which makes saving a post extremely slow.

TL;DR2: The HTTP API allows requests to be non-blocking and thus not hold up the pageload, why not use 'blocking' => 0 this for this plugin?

It was slow because all the PURGEs were being done as GET

This part is mostly for information purposes, in case someone else stumbles onto a similar problem, but it also explains my situation and why I think the 'blocking' => 0 solution described below is important.

  • We had the plugin active on our local sites and want to keep it that way so that the live and dev environments are as similar as possible.
  • We don't have Varnish locally for predictable reasons, so obviously the PURGE requests don't "work" like they should, but we want them to fail silently.
  • The plugin sends the PURGE requests for the dozen relevant URLs each time.
  • WP ends up loading them as GET, which loads the actual URLs and overwhelms my laptop as a little mini-DOS, making post saving take 1min+

It has taken me literally years to figure out that this is what's happening, which is embarassing but related to how deep the problem lies. I had to dig into the logs and add logging deep inside the HTTP API methods to figure it out. The user experience was just that saving was excruciatingly slow on my local site.

After investigation the following became clear:

  • It's because each PURGE request was being redirected from http->https, and the redirected URL was being fetched as GET instead of PURGE
  • The reason for this was that the redirect returned by Apache was a 302, and WP has an explicit (underdocumented) internal filter on the HTTP API that converts ANY 302 redirect to GET regardless of it's starting method. You can read about how this happens on a StackExchange post I created about what's going on: Why does WP HTTP API switch the method (POST/PURGE) to GET when redirecting (302)?

Important notes:

  • This was on my MAMP setup, where Apache was running things and which seems to be related to the 302 coming up.
  • Our server, running Nginx, didn't have a similar behavior because it returned 301 errors in this same situation. So even PURGE requests could redirect successfully.

SO: In terms of this HTTP API confusion with redirection anyone with similar problems has a couple of options:

  • Rework your system so the redirects don't happen (in our case, filter the URLs to be HTTPS from the start)
  • Rework your system to make sure such redirects are 301, thus not triggering the WP filter in WP_Http->browser_redirect_compatibility()

Bad solution: In purge_url, pass 'redirection' = 0 to wp_remote_request()

i.e.
	$response = wp_remote_request( $purgeme, array(
			'method'  => 'PURGE',
			'headers' => $headers,
			'redirection' => 0,
		) );

This solved my problem with local dev because it means that when my Apache in MAMP gets the purge request and redirects it to HTTPS, it just silently stops.

The problem with this is that if the redirection is a 301, it DOES work! We discovered that our actual live server was relying on this for proper functioning. To me, this implies that this plugin shouldn't disable redirection, because there may be many users relying on the effectiveness of redirected PURGE calls, even if in the grander scheme they aren't reliable (and really, everyone should work to make sure their purges are happening without redirection).

That said, I wanted to point this out because for some cases it could be a good workaround, and it seemed like a good option to me at first.

Better solution? Use 'blocking' => 0 instead

By default all WP HTTP API requests are "blocking" and the whole system waits for the resulting $response object to be generated and returned. Obviously sometimes this is vital, but in the case of these PURGE requests I think the 'blocking' => 0 mode would be a really good fit:

  • In a live environment, we would pretty much never want the page saving to hang and wait for slow PURGE requests to process. It's bad UX for it to be slow, could cause race conditions and if the PURGE requests don't go well there still won't be any information about it for the user.
  • The plugin doesn't seem to do any material actions on the $response object at the end of purge_url(), it runs the after_purge_url filter but that filter isn't called inside the plugin itself.
  • For situations like mine where Varnish wasn't present, it avoids any messiness with the request causing chaos with the performance.

Have you considered 'blocking' => 0 before? Maybe there's a reason not to use it that I haven't thought of?

The main reason I can see is to preserve the current functioning of the filter:

	do_action( 'after_purge_url', $url, $purgeme, $response, $headers );

If the non-blocking mode was used, there wouldn't be a $response object to include in that filter (or, if left as-is, it would always be a blank default response object).

That said, it strikes me that the $response object, though useful here, is something we can probably live without as long as the documentation makes it clear that the $response is always empty because we (logically) use the non-blocking mode for the request.

We can still get the response with the http_api_debug action in WP_Http->request()

do_action( 'http_api_debug', $response, 'response', 'Requests', $r, $url );

So IDK if this is something you'd want to build into the plugin, but IMHO if making it easy to debug the purging is a priority, and otherwise 'blocking' => 0 makes sense to you, then you can easily document how to debug these purges and/or add it as a plugin feature using the action above:

  • Add a code example in the help docs for debuging purges that outputs an error_log() during http_api_debug any time the method in $r (request arguments) is PURGE
  • Add that code yourself to this plugin, then use a constant (`VHP_DEBUG_PURGES) to enable the logging.

This would work just as well as the existing hook, and now that I think of it, you could keep after_purge_url intact if you wanted, by creating your own hook on http_api_debug and setting up the variables so that after_purge_url is backwards-compatible.

So after all that, the question is: Why not use 'blocking' => 0 as the default mode for the plugin?

On my end, despite all this digging, it seems it's still pretty slow on local, even if I set it to 'blocking' => 0, because MAMP still loads every page and it takes 18 seconds reliably before I get back to the refreshed post editor. If I want to keep this plugin active on my dev environment I think i'll need to set up something that quickly and silently returns 200 for PURGE requests to mimic Varnish.

@Ipstenu
Copy link
Owner

Ipstenu commented Jul 12, 2018

Why not use 'blocking' => 0 as the default mode for the plugin?

Quick answer... Believe it or not, it's never come up. I test on a local box without Varnish to make sure the non purge things work (like the scanner and the debugger and admin stuff). Then I push it to a dev box with Varnish to test Varnish.

But this is a legit thing and it's fascinating to me that it hasn't come up before, so I'm going to spend some serious time reading this a couple more times and thinking about what to do with it.

My gut likes the idea of a VHP_DEBUG_PURGES define, but I'm not sure which is more useful to the majority since ... well. Most people just install and walk away. They don't really do the loopy things like trigger page A's flush when page B updates (like .. I do).

This needs serious thinking. I like the problem and the possible solutions you've found, Jer!

@danielbachhuber
Copy link
Collaborator

So after all that, the question is: Why not use 'blocking' => 0 as the default mode for the plugin?

One thing worth noting: 'blocking' => 0 only works on a limited set of systems. I only recall this anecdotally and would need to do more research to identify the specific contexts.

@jerclarke
Copy link
Contributor Author

Awesome thanks @Ipstenu ! So glad it wasn't just a confusing mess 😅

I think the logging stuff would be a big help to a lot of people, and if nothing else would be good to have for yourself it sounds like. Overall I think that yeah, a lot of people aren't managing their flushes carefully, but really most people should be 🤷🏻‍♀️

@danielbachhuber Yes, it seems on my local MAMP setup, blocking = > 0 has no meaningful effect on the timing, and seems to still make the post save wait.

I'm still in the process of testing it on our live server ([Ubuntu]Nginx->Varnish->Nginx)

@gerdneuman
Copy link

@danielbachhuber I guess even if blocking => false does not work on all systems (I cannot find a reference for it, too, so are you sure?) it would probably not hurt to adding it after I read https://codex.wordpress.org/HTTP_API#Other_Arguments and https://developer.wordpress.org/reference/classes/WP_Http/request/ (search for blocking in it). As @jerclarke said: The response is not used anyway.

@jerclarke any results from testing on the linux live server? Have you ever measure doing a time curl -X PURGE <url> to see how long the request need in the first place? I guess php would be similar and on localhost it is probably far less then if network latency is involved.

@danielbachhuber
Copy link
Collaborator

I guess even if blocking => false does not work on all systems (I cannot find a reference for it, too, so are you sure?)

This was something Barry told me when I attempted to implement usage analytics for WP-CLI wp-cli/wp-cli#3886

I don't know the underlying implementation details off the top of my head. If I were to look into it, I'd start with WordPress/Requests#53 and understanding how cURL and fsockopen implement blocking => false (whether they actually spawn separate processes, or it remains blocking simply because PHP is single-threaded).

it would probably not hurt to adding it after I read

It would be nice to have documented, I agree. We'd need to make sure the documentation is actually correct though.

@gerdneuman
Copy link

gerdneuman commented Aug 16, 2018

@danielbachhuber Thanks, I see.

Via a Stackoverflow answer, I've came across Guzzle which is supposed to support async http requests aka Futures / Promises according to its FAQ. So I guess this is something that might be worth looking into instead of the wp_remote_request.

EDIT:

Well, seems like it's not that simple :-/ after reading:
guzzle/guzzle#1127
guzzle/guzzle#1425
guzzle/guzzle#1429 (comment)

@gerdneuman
Copy link

FWIW, I added blocking => false now on our server (which is Gandi.net's Simple Hosting platform). They use latest Debian GNU/Linux with PHP 7. And the request is still… blocking :-(

So despite some discussions in #5 (comment) and #66 it looks like it's definitely not easy to solve. Also even if it was I'd like to mention (as nobody has yet) that many hosters have restrictive settings for the number of parallel php processes aka max_children php setting. For instance, for Gandi.net size M it is 4, for L it is 8. So if you'd PURGE'ed 8 URLs on parallel then you'd DOSed your server.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants