
Prevent large traffic spikes hitting biocache origin #910

Open
joe-lipson opened this issue Aug 8, 2024 · 1 comment

joe-lipson commented Aug 8, 2024

Species pages on australian.museum include an embedded interactive species distribution map that comes from biocache. On occasion we've seen these pages generate over 100,000 requests to biocache in under 30 minutes, sometimes from only a small number of hosts. We seem to handle it pretty well, but it is a lot of requests to service and significantly higher than our baseline.

Example museum page: https://australian.museum/learn/animals/mammals/bare-nosed-wombat/

It's easy to generate large numbers of requests by zooming and panning the map. They are all of the form:
https://biocache.ala.org.au/ws/ogc/wms/reflect?BBOX=17112110.396258976,-3355891.289832378,17121894.33587948,-3346107.3502118755&q=lsid%3Aurn%3Alsid%3Abiodiversity.org.au%3Aafd.taxon%3A66d42847-c556-4fa3-902c-a91d9f517286&SERVICE=WMS&REQUEST=GetMap&VERSION=1.1.1&SRS=EPSG%3A3857&ATTRIBUTION=Atlas+of+Living+Australia&FORMAT=image%2Fpng&BGCOLOR=0x000000&TRANSPARENT=true&ENV=color%3Ae6704c%3Bname%3Acircle%3Bsize%3A4%3Bopacity%3A0.8&OUTLINE=false&WIDTH=256&HEIGHT=256

Each request returns a PNG map tile.
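To make the request shape concrete, here is a minimal Python sketch that splits the example URL above into its parts. The function name `describe_tile_request` and the returned field names are made up for illustration and are not part of biocache. It shows that panning and zooming change only `BBOX`, while the query (`q`), point style (`ENV`) and tile size stay fixed for a given map.

```python
from urllib.parse import urlsplit, parse_qs

# The example tile request quoted above, split across lines for readability.
EXAMPLE_URL = (
    "https://biocache.ala.org.au/ws/ogc/wms/reflect"
    "?BBOX=17112110.396258976,-3355891.289832378,17121894.33587948,-3346107.3502118755"
    "&q=lsid%3Aurn%3Alsid%3Abiodiversity.org.au%3Aafd.taxon%3A66d42847-c556-4fa3-902c-a91d9f517286"
    "&SERVICE=WMS&REQUEST=GetMap&VERSION=1.1.1&SRS=EPSG%3A3857"
    "&ATTRIBUTION=Atlas+of+Living+Australia&FORMAT=image%2Fpng&BGCOLOR=0x000000"
    "&TRANSPARENT=true&ENV=color%3Ae6704c%3Bname%3Acircle%3Bsize%3A4%3Bopacity%3A0.8"
    "&OUTLINE=false&WIDTH=256&HEIGHT=256"
)


def describe_tile_request(url: str) -> dict:
    """Hypothetical helper: pull out the parts of a tile request that matter for caching."""
    # parse_qs percent-decodes values; each parameter appears once, so take the first value.
    params = {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}
    min_x, min_y, max_x, max_y = (float(part) for part in params["BBOX"].split(","))
    return {
        "bbox": (min_x, min_y, max_x, max_y),   # changes on every pan and zoom
        "tile_span_m": max_x - min_x,           # EPSG:3857 coordinates are in metres
        "query": params["q"],                   # fixed per species page
        "style": params["ENV"],                 # fixed per map configuration
        "size_px": (int(params["WIDTH"]), int(params["HEIGHT"])),
    }


info = describe_tile_request(EXAMPLE_URL)
print(info["query"])
print(info["size_px"])
```

Because tiles sit on a fixed grid at each zoom level, different users viewing the same area of the same species map should produce identical URLs, which is what makes these responses candidates for caching.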

The response headers aggressively prevent the response from ever being cached:

cache-control: no-cache, no-store, max-age=0, must-revalidate
pragma: no-cache

I'm not sure if there's a reason behind these settings or if it's a default. If we allowed caching of the tiles, we could reduce the load on our infrastructure and get better performance for users. We'd also save on traffic and serving costs. It seems reasonable to cache for at least a day, so that browsers and any other intermediate caches can keep a copy. I'll put in another ticket to get CloudFront in front of Biocache, which will help with this and other cacheable queries.

https://github.com/AtlasOfLivingAustralia/ala-infrastructure/issues/1197
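As a rough illustration of the one-day caching suggested above, the following Python sketch builds a replacement for the current `no-cache, no-store` header. The function name `tile_cache_headers` is hypothetical, and the choice of `public` (so that shared caches such as a CDN may store the tile) is an assumption rather than an agreed setting.

```python
from datetime import timedelta


def tile_cache_headers(max_age: timedelta = timedelta(days=1)) -> dict[str, str]:
    """Hypothetical helper: response headers that let browsers and proxies keep a tile."""
    seconds = int(max_age.total_seconds())
    # "public" permits shared caches to store the response (assumed to be
    # acceptable for map tiles); max-age is the freshness lifetime in seconds.
    return {"Cache-Control": f"public, max-age={seconds}"}


print(tile_cache_headers())
```

The current `pragma: no-cache` header would also need to be dropped, since it exists to tell older caches not to reuse the response.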

joe-lipson changed the title from "Add cache headers for biocache.ala.org.au/ws/ogc/wms/reflect queries" to "Prevent large traffic spikes hitting biocache origin" on Aug 8, 2024
@adam-collins
Contributor

Some info:

  • Caching is active at SOLR, so the underlying repeated queries should be quick.
  • Client caching would be low impact.
  • Proxy caching (e.g. the previously used nginx cache) would be higher impact.
  • This is most beneficial when users are requesting the same information, e.g. looking at the same queries with the same map pans and zooms and point style.
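The last point can be sketched in Python: a cache only helps when two requests map to the same key. The function `tile_cache_key` below is a hypothetical illustration, not how nginx or CloudFront derive their keys, and the URLs in the usage lines use shortened placeholder values. It treats requests as identical when they carry the same parameters and values, regardless of parameter order.

```python
import hashlib
from urllib.parse import urlsplit, parse_qsl, urlencode


def tile_cache_key(url: str) -> str:
    """Hypothetical cache key: path plus query parameters in sorted order."""
    parts = urlsplit(url)
    # Sorting makes the key independent of the order the client sent parameters in.
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    canonical = parts.path + "?" + urlencode(pairs)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder requests: same tile with reordered parameters, then a different tile.
base = "https://biocache.ala.org.au/ws/ogc/wms/reflect"
same_a = base + "?BBOX=0,0,1,1&q=lsid%3Aabc&WIDTH=256"
same_b = base + "?WIDTH=256&q=lsid%3Aabc&BBOX=0,0,1,1"
other = base + "?BBOX=1,0,2,1&q=lsid%3Aabc&WIDTH=256"

print(tile_cache_key(same_a) == tile_cache_key(same_b))  # same query, pan, zoom and style
print(tile_cache_key(same_a) == tile_cache_key(other))   # different BBOX, so a different tile
```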

There are known issues with caching that would need resolution, listed below. As a current example, the hubs data quality caching is occasionally reported because it produces inconsistent responses. These issues are more of a problem for long-term (1hr) client caching than for proxy caching, which can be cleared on a trigger.

  • Index swap
  • Annotations
  • Live index updates

Targeted proxy cache is probably the most appropriate. I expect the triggered cache clearing will be the most complex component.
