Species pages on australian.museum include an embedded interactive species distribution map that comes from biocache. We've occasionally seen these pages generate over 100,000 requests to biocache in under 30 minutes, sometimes from only a small number of hosts. We actually seem to handle it pretty well, but it is a lot of requests to service and significantly higher than our baseline.

Example museum page: https://australian.museum/learn/animals/mammals/bare-nosed-wombat/

It's easy to generate large numbers of requests by zooming and panning the map. They are all of the form:

https://biocache.ala.org.au/ws/ogc/wms/reflect?BBOX=17112110.396258976,-3355891.289832378,17121894.33587948,-3346107.3502118755&q=lsid%3Aurn%3Alsid%3Abiodiversity.org.au%3Aafd.taxon%3A66d42847-c556-4fa3-902c-a91d9f517286&SERVICE=WMS&REQUEST=GetMap&VERSION=1.1.1&SRS=EPSG%3A3857&ATTRIBUTION=Atlas+of+Living+Australia&FORMAT=image%2Fpng&BGCOLOR=0x000000&TRANSPARENT=true&ENV=color%3Ae6704c%3Bname%3Acircle%3Bsize%3A4%3Bopacity%3A0.8&OUTLINE=false&WIDTH=256&HEIGHT=256

and each returns a PNG map tile.

The response headers are aggressive about never caching a response.

I'm not sure if there's a reason behind these settings or if they are defaults. Allowing the tiles to be cached would reduce the load on our infrastructure, give users better performance, and save on traffic and serving costs. Caching for at least a day seems reasonable; browsers and any other intermediate caches can then keep a copy. I'll put in another ticket to get CloudFront in front of biocache, which will help with this and other cacheable queries: https://github.com/AtlasOfLivingAustralia/ala-infrastructure/issues/1197
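For illustration, a minimal sketch of what a cache-friendly tile response could look like, assuming the one-day lifetime suggested above (86400 seconds); the header set biocache actually sends today is not reproduced here:

```http
HTTP/1.1 200 OK
Content-Type: image/png
Cache-Control: public, max-age=86400
```

`public` lets intermediate caches (e.g. CloudFront) keep a copy alongside the browser, rather than restricting caching to the end user.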
joe-lipson changed the title from "Add cache headers for biocache.ala.org.au/ws/ogc/wms/reflect queries" to "Prevent large traffic spikes hitting biocache origin" on Aug 8, 2024.
Caching is active at SOLR, so the underlying repeated queries should be quick.
Client caching would be low impact.
Proxy caching (e.g. the previously used nginx cache) would be higher impact.
This is most beneficial when users are requesting the same information, e.g. looking at the same queries with the same map pans, zooms, and point style.
Known issues with caching would need resolution. As a current example, the hubs data quality caching is occasionally reported because it produces inconsistent responses. Issues like this are more of a problem for long-term (1 hr) client caching than for proxy caching that can be cleared on a trigger. Other cases that would need handling:

- Index swap
- Annotations
- Live index updates
A targeted proxy cache is probably the most appropriate option. I expect the triggered cache clearing will be the most complex component.
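As a starting point, here is a minimal sketch of what a targeted nginx proxy cache for the tile endpoint could look like, assuming biocache sits behind nginx again. The zone name, paths, sizes, cache lifetime, and upstream address are all illustrative, not current configuration:

```nginx
# Illustrative values throughout; adjust sizes and lifetimes to fit.
proxy_cache_path /var/cache/nginx/biocache-tiles
                 levels=1:2 keys_zone=biocache_tiles:50m
                 max_size=5g inactive=24h;

upstream biocache_backend {
    server 127.0.0.1:8080;   # assumed application port
}

server {
    listen 80;
    server_name biocache.ala.org.au;

    # Cache only the WMS tile endpoint, keyed on the full query string
    # so identical pans/zooms/styles hit the same cache entry.
    location /ws/ogc/wms/reflect {
        proxy_pass http://biocache_backend;
        proxy_cache biocache_tiles;
        proxy_cache_key "$scheme$request_method$host$uri$is_args$args";
        proxy_cache_valid 200 1h;
        # The origin currently forbids caching, so its headers are ignored here.
        proxy_ignore_headers Cache-Control Expires;
        add_header X-Cache-Status $upstream_cache_status;
    }

    # Everything else passes through uncached.
    location / {
        proxy_pass http://biocache_backend;
    }
}
```

Triggered clearing on index swap, annotations, or live index updates could then be as blunt as emptying /var/cache/nginx/biocache-tiles, or as fine-grained as a purge endpoint (e.g. via the third-party ngx_cache_purge module, named here as one option rather than something already deployed).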