RFC: Cache support in Hydrogen #98
49 replies
-
I already really dislike …
-
This sounds solid to me! For the things you wanted people to chime in on:
💯 aligned with using seconds. Speaking of milliseconds vs. seconds: are millisecond timings common? (My hunch says no; why would you want to cache something for a few milliseconds? But I'm no caching expert 🤷♀️)
😬 Would we be able to abstract away the song-and-dance needed to support POST requests? 🤔 What does the entrypoint look like now? (It is the …)
-
Oxygen's Cache API supports POST requests! You can pass a POST `Request` and match it with `ignoreMethod: true`:

```js
// From https://github.com/Shopify/oxygen-sws/blob/main/runtime/js/cache_test.js
let req = new Request("https://example.com/__api", {
  method: "POST",
  headers: {
    "Accept": "application/json",
    "Content-Type": "application/graphql",
    "X-BodyHash": "f2ca1bb6c7e907d06dafe4687e579fce76b37e4e93b7605022da52e6ccc26fd2",
    "X-OtherHash": "1234555557e907d06dafe4687e579fce76b37e4e93b7605022da52e6ccc26fd2",
  },
});

// ...snip

// Confirm that match returns undefined when the kv has not been cached.
let got = await cache.match(req, {ignoreMethod: true});
assertEqual(undefined, got);
```
-
Would it make sense to add support for If-None-Match, If-Modified-Since & friends? Similar to Rails: https://guides.rubyonrails.org/caching_with_rails.html#conditional-get-support. It could be an improvement over TTL if, for example, you know a page is costly to render and depends on a single product. You either use a computed hash (an ETag, checked against If-None-Match) or the last-modified time of the product (checked against If-Modified-Since) to only render the page if the product changed.
-
It's a little unclear to me how a developer can define the "cache key" for the request in the full-page caching proposal. I am guessing that the "implicit" key is just the request URL; keying on any other headers would make the cache basically equivalent to us hosting the browser cache for every storefront visitor.

However, if we do this, I don't think the example of "don't cache this page if it has a cart cookie" really behaves the way a developer would expect. It looks like it's saying "if this request has a cart cookie, never serve a cached result; otherwise, cache it for 10 seconds". But if I understand the underlying cache implementation correctly, it actually means that the first request without a cart cookie will be cached for 10 seconds, and all subsequent requests to the same URL, even if they have a cart cookie, will be served from the cache. Please correct me if I am misunderstanding how the HTTP event handler actually works!

Setting this issue aside, there are actually a lot of reasons why you may want to augment the cache key of the request. Maybe this wouldn't be common for cookies, but it is pretty common for other headers.

The annoying part of these cache-key solutions is that they are generally static. Hydrogen can do much better, since it can hook into each individual request before attempting to resolve it from the cache, and it can normalize headers that would otherwise have too many possible values to be useful as cache keys (e.g. normalizing the Accept-Language header, picking only one or two cookies, etc.). This is a big part of the value proposition for things like Cloudflare Workers and CloudFront Functions. However, in the context of Hydrogen, I think that logic all needs to live outside of React-land, since you only enter React-land for a request that is not served from the cache.
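The per-request normalization idea could look something like this (the language list and helper names are assumptions for illustration):

```js
// Sketch of normalizing a high-cardinality header before the cache lookup.
const SUPPORTED_LANGUAGES = ["en", "fr", "de"];

// Collapse Accept-Language into one of a few useful cache-key values.
function normalizeLanguage(acceptLanguage) {
  const primary = (acceptLanguage || "")
    .split(",")[0]
    .trim()
    .slice(0, 2)
    .toLowerCase();
  return SUPPORTED_LANGUAGES.includes(primary) ? primary : "en";
}

// Fold the normalized value into the URL so it becomes part of the cache key.
function cacheKeyFor(url, headers) {
  const key = new URL(url);
  key.searchParams.set("lang", normalizeLanguage(headers["accept-language"]));
  return key.toString();
}
```

Because the normalized value has only a handful of possible outputs, the cache stays useful even though the raw header has effectively unbounded cardinality.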
-
Full-page cache keys

There are cases where we would not want to store a full-page cache for a request, like when the customer has an active cart ID stored in a cookie. To navigate this requirement, Hydrogen will supply a cache key with smart defaults. Out of the box, the Hydrogen full-page cache will follow these rules:

Most developers will not need to customize the caching logic above. Remember: caching is disabled unless the developer explicitly opts in.

However, if an ambitious developer wants to customize the request cache key, they can export a `requestCacheKey` function:

```ts
export function requestCacheKey({request}: {request: ServerComponentRequest}): Request {
  // Example of constructing a unique URL to use for full-page caching.
  // Developer has access to headers, cookies, etc.
  const url = `${request.url}?accept-language=${request.headers.get(
    'accept-language',
  )}`;
  return new Request(url, request);
}
```

Again, this is only for advanced developers. This is a sharp knife, and we won't expose this as part of our starter template.
-
Finally got a chance to read through the thread, doc and catch up on the review. 😅 Some meta takeaways and questions...

The nomenclature and capabilities strike me as confusing and inconsistent.

FWIW, the confusion here (for me, at least) stems from the fact that revalidate and freshness timestamps are different things:

Could we converge on shared language and syntax across both page and subrequest caching? We should be explicit about the difference between freshness vs. revalidation.

Can we revisit "don't cache in CF by default"?

FWIW, this feels backwards. If the framework emits `Cache-Control` and other appropriate cache headers, any upstream cache (Varnish, Squid, ATS, ..., and any CDN provider worth its salt) will be able to leverage it, with better results than caching at origin. That aside, purely from a performance perspective, we do want to push towards being as close to the user as possible.

The challenge with the above is setting correct headers. `Cache-Control` is great, but it's a blunt tool, and `Vary` can be a footgun for both CDNs and browsers (e.g. browser caches don't store multiple variants, only the latest one, but that's ~mostly OK for what we're discussing here). In particular, `Vary` signals that the response should vary on the value of a particular header but doesn't say which parts of that value are salient: `Vary: User-Agent` is basically treated as "don't cache". For CF, Cache-Tags solves this, but it's (afaik?) a proprietary thing.

So, my question is: could we revisit the design here and explore how we can leverage standard `Cache-Control`, `Vary`, and `stale-while-revalidate` to get our responses closer to the user?

What's the invalidation story?

It's not covered in the RFC. If I've set page-level and subrequest-level caching policies...

My intuition is that we can probably punt on (1) to start, but we should be (2) wiping caches clean whenever a new build is pushed. This means some automation on our side to wipe our own and upstream caches, e.g. sending purge requests to CF.
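For concreteness, a standards-based header set along these lines might be (directive values are illustrative only, not a recommendation made in this thread):

```js
// One plausible header combination for the standards-based approach.
function edgeCacheHeaders() {
  return {
    // Browsers keep it for 1s; the CDN keeps it fresh for 60s (s-maxage) and
    // may serve it stale for 10 minutes while revalidating in the background.
    "Cache-Control":
      "public, max-age=1, s-maxage=60, stale-while-revalidate=600",
  };
}
```

`s-maxage` lets shared caches hold the page longer than browsers, and `stale-while-revalidate` gives the async-regeneration behavior without any origin-side machinery.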
-
Remember that you can basically treat the Cache API as a simple key-value store. You can encode your key in the URL and store the value in the response body. It doesn't have to map to an actual request and response, but the API is designed to make that easier. If there is an associated request, you can always add to the cache key by adding URL query arguments to account for the lack of the request headers being used as part of the cache key. If a lot of data is relevant for the cache key, you can always store a hash of it in the cache key to avoid key length limits.
-
Reading this for the first time. A few thoughts:
-
Update: 2021-10-26

- Use `stale-while-revalidate` for async regeneration.

Update: 2021-09-09

- Removed `CacheStore` from the RFC. I think we're better off accepting a simple `Cache` instance that adheres to the Cache API and paying the abstraction cost in Hydrogen library code instead. See this comment.

TL;DR

- Full-page caching is driven by `cache-control` headers set by the developer through a helpful API.
- Subrequest caching uses a `Cache` instance provided by the hosting layer, tucked behind a simple abstraction.

Here's what full-page caching looks like in page server components:
Pretty simple! Behind the scenes, this sets `Cache-Control` headers on the response object.

Here's what subrequest caching looks like in server components:
- For `useShopQuery`, we accept a `cache` property which instructs Hydrogen how long to cache the results. We calculate the cache key automatically based on the input.
- For `useQuery`, we accept a `cache` object with cache-control values to instruct Hydrogen how long to cache the results.

Cache primitives provided by hosting platforms
As mentioned above, Hydrogen expects a `Cache` and a `CacheStore` instance to be provided in the entrypoint script of Hydrogen.

Cache

`Cache` is an instance of the Cache API which accepts a `Request` as a key and a `Response` as the payload. It only supports GET requests and is only used for full-page caching.

This exists today in both Oxygen and Cloudflare Workers.
CacheStore

Update: This is no longer used.

`CacheStore` is a bespoke KV store inspired by react-query. This cache store is very similar to Cloudflare's KV API, with a few important (!) differences:

- `CacheStore` doesn't have to be globally replicated; it's OK for it to be limited to each datacenter/colocation.
- `CacheStore` should be able to accept many reads and many writes (Cloudflare's KV is very limited on writes: ~1 per second).

`CacheStore` does not exist today in Oxygen OR Cloudflare. We have an opportunity to build this in Oxygen, and we could build a wrapper around Cloudflare's existing `Cache` API.

Developers hosting Hydrogen elsewhere would need to implement their own versions of these primitives, backed by solutions offered by the hosting provider or by their own infrastructure, e.g. a Redis instance.
Background
First: A note on caching.
To quote Tobi:
> Ideally, Hydrogen should provide the simplest approach possible to caching. This means having smart, opinionated defaults and putting sharp knives in drawers until they're needed.
Most developers shouldn't need to think about customizing their caching concerns short of the simple APIs we provide them. If we find that developers are needing to customize lots of things about the way caching works in Hydrogen, it means we've done something wrong — either at the framework level or the platform/API level.
This RFC reflects a simple API that covers most use cases for developers.
Caching is a powerful mechanism for modern applications. From improving response times to scaling during high traffic events, cache is used almost everywhere.
For commerce applications like Hydrogen, where products may go out of stock or flash sales bring spikes in traffic, having a configurable cache is incredibly important.
This RFC describes a proposal for the cache API as it relates to Hydrogen specifically, including which inputs we expect from a hosting runtime like Oxygen, what controls we expose to the developer, and more.
For the purposes of this example, let's focus on two distinct types of caching:
Full page cache
Full page cache is very effective in returning quick responses to mostly static (read: not dynamic) content.
There are typically three places in the request flow where full-page caching can happen:

- Browser
- CDN (edge)
- Origin
The first and second layers of cache, Browser and CDN, are typically controlled using the `Cache-Control` HTTP response header.

The third layer of the cache, Origin, is controlled using whatever bespoke mechanism the origin server decides to use. This could be a generic TTL against a generic key, etc.
This RFC proposes that we leverage the first and second layer (Browser and Edge) to perform full-page caching.
This doesn't mean that we need to leave the first layer (Browser) out in the cold! We can still leverage `Cache-Control` headers to keep a version of the full-page response in the user's browser cache, for example.

Examples
This is an example of a marketing page where the developer is absolutely confident they won't need to serve any dynamic content:
However, when there's a product on the page, a developer might want to cache the page for less time to ensure the page is less stale and displays more up-to-date product information:
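Illustrative `Cache-Control` values for scenarios like these (the numbers and page names are assumptions, not the RFC's defaults):

```js
// Illustrative per-scenario Cache-Control values.
function cacheControlFor(page) {
  switch (page) {
    case "marketing": // near-static content: cache aggressively, edge included
      return "public, max-age=3600, s-maxage=86400";
    case "product": // keep product details fresher
      return "public, max-age=10, s-maxage=60";
    default: // personalized content: browser cache only, never the CDN
      return "private, max-age=10";
  }
}
```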
Additionally, if dynamic or customized data is present, the developer needs to mark the cache response as `private` to ensure caching is only applied to the Browser layer, not to the CDN layer:

Subrequest caching
While full-page cache is great and can lead to sub-100ms responses with zero API queries, commerce is a dynamic beast with lots of moving parts!
As soon as a customer adds an item to their cart, for instance, we cannot serve them a cached page because their data requirements are unique.
Short of including `cartId` etc. in the full-page cache keys, this means we should expect a lot of full-page cache misses.

To mitigate this, we leverage subrequest caching to ensure API requests made within the context of a given page request return quickly and in a scalable manner.
Backed by react-query, Hydrogen provides abstractions on the `useQuery` hook for both 1P Shopify queries and 3P fetch queries.

A couple notes here:
- `useShopQuery` is a wrapper around Hydrogen's version of `useQuery`, which is yet another wrapper around react-query's version of `useQuery`.
- We calculate cache keys for `useShopQuery` requests based on the request body, headers, etc.
- Developers supply their own `key` and `cache` properties for `useQuery` custom 3P fetch calls. This is a sharp knife which allows the most flexibility for the developer, regardless of whether they're fetching a REST API or a GraphQL API, making multiple chained requests, or performing async operations on the results, etc.

Important Notes
Why not...?
...just rely on network edge (Cloudflare) for full-page caching?
Update 10/26: We are now 😄
Rather than invent a new syntax or a new API surface for Hydrogen developers, why not lean into existing Web standards and use `Cache-Control` as the API for managing full-page cache in Hydrogen?

In this scenario, Hydrogen merely sets `Cache-Control` headers and doesn't fuss with reading from or writing to a `Cache` API.

Unfortunately, Cloudflare does not cache HTML responses by default. Sure, we could flip a switch on Oxygen hosting to enable this, but this leaves developers self-hosting Hydrogen completely on the hook for implementing their own full-page cache and fronting their site with a CDN.
Shipping Hydrogen without a mechanism for basic full-page caching seems like a bad developer experience.
...use a single `Cache` instance for both full-page caching and subrequest caching?

Update: I think we should use the same `Cache` instance, and I've updated the RFC accordingly. See Shopify/hydrogen#446 (comment)

Update 2: We're only using this for sub-requests now.

This would make the entrypoint much simpler and would provide out-of-the-box support for both Oxygen and Cloudflare Workers.

However, `Cache` does not support POST requests. Guess what all Shopify GraphQL requests are? POST requests. The workaround requires a fancy song and dance. Since we're starting fresh, we'd like to avoid this in Hydrogen + Oxygen.

...use `ttl` instead of `revalidateSeconds` for subrequest options?

Update: We're using neither. Instead, we're going for a more verbose and powerful (but potentially more confusing) API based on the cache-control header.
I prefer `revalidate` (like Next.js uses) because it indicates that the cached data will be revalidated after a number of seconds rather than purged. This allows us to fetch new data behind the scenes while serving stale data.

This is more of a semantic thing, so it's a weakly-held opinion.
...use `urql` or another existing GraphQL client that has smart caching capabilities built in?

While it's true that Hydrogen developers will be performing GraphQL requests against the Shopify Storefront API, they won't just be making GraphQL requests.

We need to provide a way for REST APIs to be queried. We also need to allow developers to perform async operations on the results of fetch calls, like making follow-up requests or interacting with a 3rd-party JS library. These all need to happen in a Suspense-capable callback function which can be rendered in React server components. This is why we're using `react-query`: it's a one-size-fits-all approach.

Comparison: Next.js
Next.js provides a really nice cache syntax to manage incremental static regeneration (ISR):
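The ISR knob referenced here is the `revalidate` field returned from `getStaticProps`; a minimal sketch (with a stand-in `fetchProducts` stub; in a real app `getStaticProps` is exported from a page module):

```js
// Next.js-style ISR sketch. `fetchProducts` is a stub for the page's data.
async function fetchProducts() {
  return [{handle: "snowboard"}];
}

async function getStaticProps() {
  return {
    props: {products: await fetchProducts()},
    // Next.js regenerates the cached page in the background at most once
    // every 60 seconds after a request comes in.
    revalidate: 60,
  };
}
```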
By allowing a developer to define a number of seconds until the data is revalidated, Next.js effectively supports full-page caching. All server-side data queries happen at once during `getStaticProps` and are cached together.

This is the level of control for both full-page cache and subrequest cache that we're hoping to achieve with Hydrogen.