RFC: Cache support in Hydrogen #98
49 replies
-
I already really dislike …
-
This sounds solid to me! For the things you wanted people to chime in on:
💯 aligned with using seconds. Speaking of milliseconds vs. seconds: are millisecond timings common? (My hunch says no; why would you want to cache something for a few milliseconds? But I'm no caching expert 🤷♀️)
😬 Would we be able to abstract away the song-and-dance needed to support POST requests? 🤔 What does the entrypoint look like now? (It is the …)
-
Oxygen's Cache API supports POST requests! You can pass a POST `Request` and match it with `ignoreMethod: true`:

```js
// From https://github.com/Shopify/oxygen-sws/blob/main/runtime/js/cache_test.js
let req = new Request("https://example.com/__api", {
  method: "POST",
  headers: {
    "Accept": "application/json",
    "Content-Type": "application/graphql",
    "X-BodyHash": "f2ca1bb6c7e907d06dafe4687e579fce76b37e4e93b7605022da52e6ccc26fd2",
    "X-OtherHash": "1234555557e907d06dafe4687e579fce76b37e4e93b7605022da52e6ccc26fd2",
  },
});

// ...snip

// Confirm that match returns undefined when the kv has not been cached.
let got = await cache.match(req, {ignoreMethod: true});
assertEqual(undefined, got);
```
-
Would it make sense to add support for If-None-Match, If-Modified-Since & friends? Similar to Rails: https://guides.rubyonrails.org/caching_with_rails.html#conditional-get-support. It could be an improvement over TTL if, for example, you know a page is costly to render and depends on a single product. You either use a computed hash (an ETag, checked against If-None-Match) or the last-modified time of the product (checked against If-Modified-Since) to only render the page if the product changed.
-
It's a little unclear to me how a developer can define the "cache key" for the request in the full-page caching proposal. I am guessing that the "implicit" key is just the request URL; keying on any other headers would make the cache basically equivalent to us hosting the browser cache for every storefront visitor.

However, if we do this, I don't think the example of "don't cache this page if it has a cart cookie" really behaves the way a developer would expect. It looks like it's saying "if this request has a cart cookie, never serve a cached result; otherwise, cache it for 10 seconds". But if I understand the underlying cache implementation correctly, it actually means that the first request without a cart cookie will be cached for 10 seconds, and all subsequent requests to the same URL, even if they have a cart cookie, will be served from the cache. Please correct me if I am misunderstanding how the HTTP event handler actually works!

Setting this issue aside, there are actually a lot of reasons why you may want to augment the cache key of the request. Maybe this wouldn't be common for cookies, but it is pretty common for other headers.

The annoying part of these cache-key solutions is that they are generally static. Hydrogen can do much better, since it can hook into each individual request before attempting to resolve it from the cache, and it can normalize headers that would otherwise have too many possible values to be useful as cache keys (e.g. normalizing the Accept-Language header, picking only one or two cookies, etc.). This is a big part of the value proposition for things like Cloudflare Workers and CloudFront Functions. However, in the context of Hydrogen, I think that logic all needs to live outside of React-land, since you only enter React-land for a request that is not served from the cache.
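The per-request normalization idea could look something like this (the language list and helper names are assumptions for illustration):

```js
// Sketch of normalizing a high-cardinality header before the cache lookup.
const SUPPORTED_LANGUAGES = ["en", "fr", "de"];

// Collapse Accept-Language into one of a few useful cache-key values.
function normalizeLanguage(acceptLanguage) {
  const primary = (acceptLanguage || "")
    .split(",")[0]
    .trim()
    .slice(0, 2)
    .toLowerCase();
  return SUPPORTED_LANGUAGES.includes(primary) ? primary : "en";
}

// Fold the normalized value into the URL so it becomes part of the cache key.
function cacheKeyFor(url, headers) {
  const key = new URL(url);
  key.searchParams.set("lang", normalizeLanguage(headers["accept-language"]));
  return key.toString();
}
```

Because the normalized value has only a handful of possible outputs, the cache stays useful even though the raw header has effectively unbounded cardinality.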
-
Full-page cache keys

There are cases where we would not want to store a full-page cache for a request, like when the customer has an active cart ID stored in a cookie. To navigate this requirement, Hydrogen will supply a cache key with smart defaults. Out of the box, the Hydrogen full-page cache will follow these rules:

Most developers will not need to customize the caching logic above. Remember: caching is disabled unless the developer explicitly opts in.

However, if an ambitious developer wants to customize the request cache key, they can export a `requestCacheKey` function:

```ts
export function requestCacheKey({request}: {request: ServerComponentRequest}): Request {
  // Example of constructing a unique URL to use for full-page caching.
  // Developer has access to headers, cookies, etc.
  const url = `${request.url}?accept-language=${request.headers.get(
    'accept-language',
  )}`;
  return new Request(url, request);
}
```

Again, this is only for advanced developers. This is a sharp knife, and we won't expose this as part of our starter template.
-
Finally got a chance to read through the thread, doc and catch up on the review. 😅 Some meta takeaways and questions...

The nomenclature and capabilities strike me as confusing and inconsistent.

FWIW, the confusion here (for me, at least) stems from the fact that revalidate and freshness timestamps are different things:

Could we converge on shared language and syntax across both page and subrequest caching? We should be explicit about the difference between freshness vs. revalidation.

Can we revisit "don't cache in CF by default"?

FWIW, this feels backwards. If the framework emits `Cache-Control` and other appropriate cache headers, any upstream cache (Varnish, Squid, ATS, ..., and any CDN provider worth its salt) will be able to leverage it, with better results than caching at origin. That aside, purely from a performance perspective, we do want to push towards being as close to the user as possible.

The challenge with the above is setting correct headers. `Cache-Control` is great, but it's a blunt tool, and `Vary` can be a footgun for both CDNs and browsers (e.g. browser caches don't store multiple variants, only the latest one, but that's ~mostly OK for what we're discussing here). In particular, `Vary` signals that the response should vary on the value of a particular header but doesn't say which parts of that value are salient: `Vary: User-Agent` is basically treated as "don't cache". For CF, Cache-Tags solves this, but it's (afaik?) a proprietary thing.

So, my question is: could we revisit the design here and explore how we can leverage standard `Cache-Control`, `Vary`, and `stale-while-revalidate` to get our responses closer to the user?

What's the invalidation story?

It's not covered in the RFC. If I've set page-level and subrequest-level caching policies...

My intuition is that we can probably punt on (1) to start, but we should be (2) wiping caches clean whenever a new build is pushed. This means some automation on our side to wipe our own and upstream caches, e.g. sending purge requests to CF.
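For concreteness, a standards-based header set along these lines might be (directive values are illustrative only, not a recommendation made in this thread):

```js
// One plausible header combination for the standards-based approach.
function edgeCacheHeaders() {
  return {
    // Browsers keep it for 1s; the CDN keeps it fresh for 60s (s-maxage) and
    // may serve it stale for 10 minutes while revalidating in the background.
    "Cache-Control":
      "public, max-age=1, s-maxage=60, stale-while-revalidate=600",
  };
}
```

`s-maxage` lets shared caches hold the page longer than browsers, and `stale-while-revalidate` gives the async-regeneration behavior without any origin-side machinery.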
-
Remember that you can basically treat the Cache API as a simple key-value store. You can encode your key in the URL and store the value in the response body. It doesn't have to map to an actual request and response, but the API is designed to make that easier. If there is an associated request, you can always add to the cache key by adding URL query arguments to account for the lack of the request headers being used as part of the cache key. If a lot of data is relevant for the cache key, you can always store a hash of it in the cache key to avoid key length limits.
-
Reading this for the first time. A few thoughts:
-
Update: 2021-10-26

- Use `stale-while-revalidate` for async regeneration.

Update: 2021-09-09

- Removed `CacheStore` from the RFC. I think we're better off accepting a simple `Cache` instance that adheres to the Cache API and paying the abstraction cost in Hydrogen library code instead. See this comment.

TL;DR

- Full-page caching is driven by `cache-control` headers set by the developer through a helpful API.
- Subrequest caching uses a `Cache` instance provided by the hosting layer, tucked behind a simple abstraction.

Here's what full-page caching looks like in page server components:
Pretty simple! Behind the scenes, this sets `Cache-Control` headers on the response object.

Here's what subrequest caching looks like in server components:
- For `useShopQuery`, we accept a `cache` property which instructs Hydrogen how long to cache the results. We calculate the cache key automatically based on the input.
- For `useQuery`, we accept a `cache` object with cache-control values to instruct Hydrogen how long to cache the results.

Cache primitives provided by hosting platforms
As mentioned above, Hydrogen expects a `Cache` and a `CacheStore` instance to be provided in the entrypoint script of Hydrogen.

Cache

`Cache` is an instance of the Cache API which accepts a `Request` as a key and a `Response` as the payload. It only supports GET requests and is only used for full-page caching.

This exists today in both Oxygen and Cloudflare Workers.
CacheStore

Update: This is no longer used.

`CacheStore` is a bespoke KV store inspired by react-query. This cache store is very similar to Cloudflare's KV API, with a few important (!) differences:

- `CacheStore` doesn't have to be globally replicated; it's OK for it to be limited to each datacenter/colocation.
- `CacheStore` should be able to accept many reads and many writes (Cloudflare's KV is very limited on writes: ~1 per second).

`CacheStore` does not exist today in Oxygen OR Cloudflare. We have an opportunity to build this in Oxygen, and we could build a wrapper around Cloudflare's existing `Cache` API.

Developers hosting Hydrogen elsewhere would need to implement their own versions of these primitives, backed by solutions offered by the hosting provider or by their own infrastructure, e.g. a Redis instance.
Background
First: A note on caching.
To quote Tobi:
> Ideally, Hydrogen should provide the simplest approach possible to caching. This means having smart, opinionated defaults and putting sharp knives in drawers until they're needed.
Most developers shouldn't need to think about customizing their caching concerns short of the simple APIs we provide them. If we find that developers are needing to customize lots of things about the way caching works in Hydrogen, it means we've done something wrong — either at the framework level or the platform/API level.
This RFC reflects a simple API that covers most use cases for developers.
Caching is a powerful mechanism for modern applications. From improving response times to scaling during high traffic events, cache is used almost everywhere.
For commerce applications like Hydrogen, where products may go out of stock or flash sales bring spikes in traffic, having a configurable cache is incredibly important.
This RFC describes a proposal for the cache API as it relates to Hydrogen specifically, including which inputs we expect from a hosting runtime like Oxygen, what controls we expose to the developer, and more.
For the purposes of this example, let's focus on two distinct types of caching:
Full page cache
Full page cache is very effective in returning quick responses to mostly static (read: not dynamic) content.
There are typically three places in the request flow where full-page caching can happen:

- Browser
- CDN (edge)
- Origin
The first and second layers of cache, Browser and CDN, are typically controlled using the `Cache-Control` HTTP response header.

The third layer of the cache, Origin, is controlled using whatever bespoke mechanism the origin server decides to use. This could be a generic TTL against a generic key, etc.
This RFC proposes that we leverage the first and second layer (Browser and Edge) to perform full-page caching.
This doesn't mean that we need to leave the first layer (Browser) out in the cold! We can still leverage `Cache-Control` headers to keep a version of the full-page response in the user's browser cache, for example.

Examples
This is an example of a marketing page where the developer is absolutely confident they won't need to serve any dynamic content:
However, when there's a product on the page, a developer might want to cache the page for less time to ensure the page is less stale and displays more up-to-date product information:
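Illustrative `Cache-Control` values for scenarios like these (the numbers and page names are assumptions, not the RFC's defaults):

```js
// Illustrative per-scenario Cache-Control values.
function cacheControlFor(page) {
  switch (page) {
    case "marketing": // near-static content: cache aggressively, edge included
      return "public, max-age=3600, s-maxage=86400";
    case "product": // keep product details fresher
      return "public, max-age=10, s-maxage=60";
    default: // personalized content: browser cache only, never the CDN
      return "private, max-age=10";
  }
}
```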
Additionally, if dynamic or customized data is present, the developer needs to mark the cache response as `private` to ensure caching is only applied to the Browser layer, not to the CDN layer:

Subrequest caching
While full-page cache is great and can lead to sub-100ms responses with zero API queries, commerce is a dynamic beast with lots of moving parts!
As soon as a customer adds an item to their cart, for instance, we cannot serve them a cached page because their data requirements are unique.
Short of including `cartId` etc. in the full-page cache keys, this means we should expect a lot of full-page cache misses.

To mitigate this, we leverage subrequest caching to ensure API requests made within the context of a given page request return quickly and in a scalable manner.
Backed by react-query, Hydrogen provides abstractions on the `useQuery` hook for both 1P Shopify queries and 3P fetch queries.

A couple notes here:
- `useShopQuery` is a wrapper around Hydrogen's version of `useQuery`, which is yet another wrapper around react-query's version of `useQuery`.
- We calculate cache keys for `useShopQuery` requests based on the request body, headers, etc.
- Developers supply their own `key` and `cache` properties for `useQuery` custom 3P fetch calls. This is a sharp knife which allows the most flexibility for the developer, regardless of whether they're fetching a REST API or a GraphQL API, making multiple chained requests, or performing async operations on the results, etc.

Important Notes
Why not...?
...just rely on network edge (Cloudflare) for full-page caching?
Update 10/26: We are now 😄
Rather than invent a new syntax or a new API surface for Hydrogen developers, why not lean into existing Web standards and use `Cache-Control` as the API for managing full-page cache in Hydrogen?

In this scenario, Hydrogen merely sets `Cache-Control` headers and doesn't fuss with reading from or writing to a `Cache` API.

Unfortunately, Cloudflare does not cache HTML responses by default. Sure, we could flip a switch on Oxygen hosting to enable this, but this leaves developers self-hosting Hydrogen completely on the hook for implementing their own full-page cache and fronting their site with a CDN.
Shipping Hydrogen without a mechanism for basic full-page caching seems like a bad developer experience.
...use a single `Cache` instance for both full-page caching and subrequest caching?

Update: I think we should use the same `Cache` instance, and I've updated the RFC accordingly. See Shopify/hydrogen#446 (comment)

Update 2: We're only using this for sub-requests now.

This would make the entrypoint much simpler and would provide out-of-the-box support for both Oxygen and Cloudflare Workers.

However, `Cache` does not support POST requests. Guess what all Shopify GraphQL requests are? POST requests. The workaround requires a fancy song and dance. Since we're starting fresh, we'd like to avoid this in Hydrogen + Oxygen.

...use `ttl` instead of `revalidateSeconds` for subrequest options?

Update: We're using neither. Instead, we're going for a more verbose and powerful (but potentially more confusing) API based on the cache-control header.
I prefer `revalidate` (like Next.js uses) because it indicates that the cached data will be revalidated after a number of seconds rather than purged. This allows us to fetch new data behind the scenes while serving stale data.

This is more of a semantic thing, so it's a weakly-held opinion.
...use `urql` or another existing GraphQL client that has smart caching capabilities built in?

While it's true that Hydrogen developers will be performing GraphQL requests against the Shopify Storefront API, they won't just be making GraphQL requests.

We need to provide a way for REST APIs to be queried. We also need to allow developers to perform async operations on the results of fetch calls, like making follow-up requests or interacting with a 3rd-party JS library. These all need to happen in a Suspense-capable callback function which can be rendered in React server components. This is why we're using `react-query`: it's a one-size-fits-all approach.

Comparison: Next.js
Next.js provides a really nice cache syntax to manage incremental static regeneration (ISR):
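The ISR knob referenced here is the `revalidate` field returned from `getStaticProps`; a minimal sketch (with a stand-in `fetchProducts` stub; in a real app `getStaticProps` is exported from a page module):

```js
// Next.js-style ISR sketch. `fetchProducts` is a stub for the page's data.
async function fetchProducts() {
  return [{handle: "snowboard"}];
}

async function getStaticProps() {
  return {
    props: {products: await fetchProducts()},
    // Next.js regenerates the cached page in the background at most once
    // every 60 seconds after a request comes in.
    revalidate: 60,
  };
}
```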
By allowing a developer to define a number of seconds until the data is revalidated, Next.js effectively supports full-page caching. All server-side data queries happen at once during `getStaticProps` and are cached together.

This is the level of control for both full-page cache and subrequest cache that we're hoping to achieve with Hydrogen.