This repository has been archived by the owner on Jul 25, 2024. It is now read-only.

Rest API archives / partial regex purge #96

Open
lukasbesch opened this issue Mar 25, 2022 · 2 comments

Comments

@lukasbesch

We are looking into Varnish plus this plugin to optimize our caching strategy, especially the regex-based invalidation.
In addition to the website itself, there is a mobile app that relies heavily on the WordPress REST API.
Users can filter, sort, or search using the /wp-json/wp/v2/posts endpoint.
So when a post is created or updated, we need to purge not only the permalink and the ID-specific endpoint, but also the archive endpoint with every possible combination of query parameters:

URLs to purge
  1. https://site.com/the-post-name
  2. https://site.com/wp-json/wp/v2/posts/:id
    including:
    a. https://site.com/wp-json/wp/v2/posts/:id?
    b. https://site.com/wp-json/wp/v2/posts/:id/
    c. https://site.com/wp-json/wp/v2/posts/:id/?
    d. https://site.com/wp-json/wp/v2/posts/:id?_embed and all other query parameters
    e. https://site.com/wp-json/wp/v2/posts/:id/?_embed and all other query parameters (fields etc)
    but not:
    f. https://site.com/wp-json/wp/v2/posts/:anotherId (so other posts)
  3. https://site.com/wp-json/wp/v2/posts
    including:
    a. https://site.com/wp-json/wp/v2/posts?
    b. https://site.com/wp-json/wp/v2/posts/
    c. https://site.com/wp-json/wp/v2/posts/?
    d. https://site.com/wp-json/wp/v2/posts?_embed&orderby=date and all other query parameters
    e. https://site.com/wp-json/wp/v2/posts/?_embed&orderby=date and all other query parameters

(I think some of these URLs are redundant, because trailing slashes and empty query strings are removed anyway.)
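
A minimal sketch of how the list above could collapse into two patterns per post type. It is written in Python only to make the matching rules easy to check; the helper names rest_purge_patterns and should_purge and the post ID 42 are made up for illustration and are not part of the plugin, and Varnish would evaluate such patterns with its own PCRE engine rather than Python's re module.

```python
import re

def rest_purge_patterns(rest_base, post_id):
    """Build two ban regexes: the single-post endpoint and the archive."""
    base = "^/wp-json/wp/v2/" + re.escape(rest_base)
    # Optional trailing slash, then an optional query string, then end of URL.
    tail = r"/?(\?.*)?$"
    return [base + "/" + str(post_id) + tail, base + tail]

def should_purge(path, patterns):
    """True if the path (including its query string) matches any pattern."""
    return any(re.search(pattern, path) for pattern in patterns)

patterns = rest_purge_patterns("posts", 42)  # 42 is a made-up post ID

for path in [
    "/wp-json/wp/v2/posts/42",
    "/wp-json/wp/v2/posts/42/?_embed",
    "/wp-json/wp/v2/posts/421",  # another post, must stay cached
    "/wp-json/wp/v2/posts",
    "/wp-json/wp/v2/posts/?_embed&orderby=date",
]:
    print("purge" if should_purge(path, patterns) else "keep ", path)
```

Anchoring both patterns with ^ and $ is what keeps /posts/421 out of a purge for post 42, while the optional query-string group covers every parameter combination without enumerating them.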


Of course, this should happen for every public post type that has a rest_base defined.
I tested a regex to experiment with this.
The same applies to taxonomy archives (when a term is created or updated).

As far as I understand, this plugin purges only the single post, not the post archive in the REST API.
Currently, regex purging is only used for a full purge.
But it would be a good fit here, because search terms, for example, are unpredictable.

One solution is to hook into vhp_purge_urls and add the required URLs there.
But we would still need some regex to match everything.
Maybe a PURGE request could carry a vhp-regex query parameter whose value is the regex (this would possibly require breaking changes to the plugin and a customized VCL file).
If a PURGE request contains a vhp-regex query parameter (or, even better, a header?), that regex would be used instead of the request's URL.
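
To make the query-parameter versus header question concrete, here is a small Python sketch of both ways to carry the regex. The pattern and the header name X-Ban-Regex are illustrative assumptions, not something the plugin defines here; the point is only that a query parameter needs percent-encoding while a header can carry the regex unchanged.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

regex = r"^/wp-json/wp/v2/posts/?(\?.*)?$"  # illustrative pattern
purge_url = "https://site.com/wp-json/wp/v2/posts/"

# Option A: regex as a vhp-regex query parameter.
# It must be percent-encoded, and the receiving side must decode it again.
url_with_param = purge_url + "?" + urlencode({"vhp-regex": regex})

# Option B: regex in a request header (hypothetical header name).
# The request URL stays exactly what it was before.
headers = {"X-Ban-Regex": regex}

print(url_with_param)
decoded = parse_qs(urlsplit(url_with_param).query)["vhp-regex"][0]
print(decoded == regex)
```

With the header variant, a cache that does not know about the header still sees the original URL and can treat the request as an ordinary PURGE.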

Is this something more people are interested in, or has someone already tackled this?

I see some comments that somebody thought about it before :)

@Ipstenu
Owner

Ipstenu commented Mar 25, 2022

Breaking changes that require VCL changes are really something to be avoided. There's no way to communicate them to the right people, since the plugin users aren't always the Varnish admins :( That's why it's a comment, and something I mess with but haven't yet fully stepped into.

It would need to be a pure WP solution to identify what should be flushed.

@lukasbesch
Author

lukasbesch commented Mar 30, 2022

@Ipstenu I understand that not everybody is able to change their VCL.
My approach in #97 is to use the same URL as before, but with two additional headers: X-Purge-Method: ban-regex and X-Ban-Regex: the-regex. If the VCL supports these headers, the cache is purged using the given regex; otherwise the request URL is used as before. Does that work for you?

Example call for a REST-API index:

curl \
  -X PURGE \
  -H "X-Purge-Method: ban-regex" \
  -H "X-Ban-Regex: ^/wp-json/wp/v2/posts($|/$|\?.*|/\?.*)" \
  -D - \
  "https://www.site.com/wp-json/wp/v2/posts/"
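
The fallback behaviour can be modelled in a few lines of Python. This is only a toy simulation of a cache keyed by URL path, not VCL and not the code from #97; handle_purge, the supports_ban_regex flag, and the sample cache contents are all made up for illustration.

```python
import re

def handle_purge(path, headers, cache, supports_ban_regex):
    """Return the cache keys that remain after one PURGE request."""
    regex = headers.get("X-Ban-Regex")
    if supports_ban_regex and headers.get("X-Purge-Method") == "ban-regex" and regex:
        # Regex-aware setup: drop every cached path that matches the regex.
        return {key for key in cache if not re.search(regex, key)}
    # Unchanged setup: the extra headers are ignored, only the URL is purged.
    return cache - {path}

cache = {
    "/the-post-name",
    "/wp-json/wp/v2/posts/",
    "/wp-json/wp/v2/posts?search=foo",
    "/wp-json/wp/v2/posts/7",
}
headers = {
    "X-Purge-Method": "ban-regex",
    "X-Ban-Regex": r"^/wp-json/wp/v2/posts($|/$|\?.*|/\?.*)",
}

updated = handle_purge("/wp-json/wp/v2/posts/", headers, cache, True)
legacy = handle_purge("/wp-json/wp/v2/posts/", headers, cache, False)
print(sorted(updated))
print(sorted(legacy))
```

In this model the same request is safe to send to both kinds of setup: the regex-aware one clears the whole archive, and the unchanged one still purges the requested URL as it did before.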

This method could possibly be used for other endpoints too (/wp-json/wp/v2/search, taxonomies, and maybe also non-API URLs).

> It would need to be a pure WP solution to identify what should be flushed.

This can be really unpredictable in terms of query arguments. We use custom taxonomies that users can filter by, so the number of query arguments and their combinations is huge, or effectively infinite for the search parameter. But of course we do not want to purge the entire cache every time.
