
[RFC] Incremental Cache improvements #621

Open · khuezy opened this issue Nov 6, 2024 · 8 comments
@khuezy (Contributor) commented Nov 6, 2024

https://github.com/opennextjs/opennextjs-aws/blob/main/packages/open-next/src/overrides/incrementalCache/s3.ts#L43-L58

I had a conversation with the Vercel devs and they said that their cacheHandler uses lru-cache to prevent extraneous calls to S3 (or any external datastore). When I was using sst's NextjsSite, I noticed a lot of S3 hits too:

async get(key, isFetch) {
  const result = await s3Client.send(
    new GetObjectCommand({
      Bucket: CACHE_BUCKET_NAME,
      Key: buildS3Key(key, isFetch ? "fetch" : "cache"),
    }),
  );

  const cacheData = JSON.parse(
    (await result.Body?.transformToString()) ?? "{}",
  );
  return {
    value: cacheData,
    lastModified: result.LastModified?.getTime(),
  };
},

Whenever a page is reloaded, it makes this S3 call via get. Is this the correct behavior? Or should we place an lru-cache in front of S3, e.g. only hit S3 if the local entry is null or stale?

CC: @conico974 @vicb
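For reference, the idea being asked about could be sketched like this: a small TTL'd in-memory layer in front of the S3 get. The real lru-cache package would add size-bounded eviction; a plain Map with expiry is enough to show the shape. Everything here (TtlCache, cachedGet, the backend callback) is illustrative, not part of the OpenNext codebase:

```typescript
// Minimal sketch: a tiny in-memory TTL layer in front of a slow backing
// store (S3 in the real override). Names are hypothetical.
type CacheEntry<T> = { value: T; expiresAt: number };

class TtlCache<T> {
  private store = new Map<string, CacheEntry<T>>();
  constructor(private ttlMs: number) {}

  get(key: string): T | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      // Stale: evict and let the caller fall through to the backing store.
      this.store.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Wrap a backing fetch so repeated reads within the TTL skip the network.
async function cachedGet<T>(
  cache: TtlCache<T>,
  key: string,
  fetchFromBackend: (key: string) => Promise<T>,
): Promise<T> {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await fetchFromBackend(key);
  cache.set(key, value);
  return value;
}
```

As discussed below, the catch with this approach is invalidation: each lambda instance would keep its own copy, so on-demand revalidation cannot reach them.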

@conico974 (Contributor)

From what I've understood of what they're saying, and from what I've observed myself, the way it works in our case is exactly how we want it to work.

Except for PPR cases (which are special), this get will be called in two situations:

  • Either for an ISR/SSG route (only if not cached by the CDN)
  • Or for the fetch/unstable cache if used (can be the case in SSR if configured to do so)

It can become costly when you use the fetch cache heavily (especially in SSR), or with very low revalidation times across a lot of pages, but that's kind of expected.

As far as I know, they use an LRU cache only in standalone mode on a single instance.

The problem with an in-memory LRU cache is on-demand revalidation. There is simply no way to tell all the live lambdas/functions about an invalidation without some custom runtime and a continuously running lambda, and even then, is it worth it?
The only way to improve this would be to reduce correctness.

There is an alternative: a multi-tiered cache system, probably with 3 tiers:

  • A very fast in-memory cache that becomes stale after a very short amount of time (by default 0s, since it could break stuff)
  • Something in DDB, Redis, or similar where we could easily retrieve metadata (like lastUpdatedTime). This would be faster and cheaper than S3, and we could probably even store some small data there (small fetch results, for example). If the data is not stale in DDB, we would return it, in order of preference: in-memory, DDB, S3.
  • And S3 or equivalent, where we would finally retrieve the data.

But this is a lot more work, with a lot more potential points of failure (what happens if something goes wrong after you've inserted into DDB but not yet into S3?), and it is only really useful if you use a lot of fetch cache, or have very low revalidation times with a lot of CDN POPs requesting at about the same time. We also have to know whether the entry is stale, which I think we can't know on earlier versions of Next.

And to add to that, this is probably totally useless on Cloudflare if they use something like KV instead of R2/S3.

The way I see it, we could create a new IncrementalCache override, called something like MultiTieredDDBS3, that people could use instead of the default s3 one or s3-lite.
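To make the proposal concrete, the 3-tier read path could be sketched roughly as follows, assuming the metadata tier (DDB-like) is authoritative for lastModified and may hold small values inline. All names here (Tiered, tieredGet, the tier callbacks) are hypothetical; the write path and failure handling are deliberately omitted:

```typescript
// Hypothetical 3-tier read path: memory -> metadata store (DDB-like)
// -> blob store (S3-like). The metadata tier decides freshness.
interface Tiered<T> {
  memory: Map<string, { value: T; lastModified: number }>;
  // Cheap metadata lookup; small entries may be stored inline.
  metadata: (key: string) => Promise<{ lastModified: number; inline?: T } | null>;
  // Full payload, only reached when the cheaper tiers can't answer.
  blob: (key: string) => Promise<T | null>;
}

async function tieredGet<T>(t: Tiered<T>, key: string): Promise<T | null> {
  // 1. Ask the metadata tier whether the entry exists and when it changed.
  const meta = await t.metadata(key);
  if (!meta) return null;

  // 2. If the in-memory copy is at least as new, return it without I/O.
  const mem = t.memory.get(key);
  if (mem && mem.lastModified >= meta.lastModified) return mem.value;

  // 3. Small entries may live inline in the metadata tier itself.
  if (meta.inline !== undefined) {
    t.memory.set(key, { value: meta.inline, lastModified: meta.lastModified });
    return meta.inline;
  }

  // 4. Fall back to the blob store and repopulate the memory tier.
  const value = await t.blob(key);
  if (value !== null) {
    t.memory.set(key, { value, lastModified: meta.lastModified });
  }
  return value;
}
```

Note this only sketches reads; the hard part called out above (keeping DDB and S3 consistent on writes, and knowing staleness on older Next versions) is exactly what the sketch leaves out.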

@khuezy (Contributor, Author) commented Nov 6, 2024

I'm going to experiment with this on Fly machines. Even when I build static pages, reloading a static page still triggers a cache get, thus hitting S3 each time.
I've disabled dynamicIO/ppr and set export const dynamic = 'force-static'; the headers show X-Nextjs-Cache: 1.

@conico974 (Contributor)

Is it a miss on the CDN as well? If so, yeah, that's expected too.
This falls under:

> Either for an ISR/SSG route (only if not cached by the CDN)

And with the App Router you'll get a ton of CDN misses, because you get a different CDN entry for every RSC link to a page (and for the HTML, of course).
This could be partially fixed by this kind of hack: sst/sst#3805, but then parallel routes won't work.

@khuezy (Contributor, Author) commented Nov 6, 2024

No, this is just testing locally via npm run start. There is a proper s-maxage header, so it would be cached correctly on the CDN. I'll update my findings once I deploy a real app.

@conico974 (Contributor)

Yeah, I expected that as well. I wonder whether this can even happen in dev mode with cacheHandlerPath in next.config.

This prop should probably be avoided without a CDN anyway.

@khuezy (Contributor, Author) commented Nov 7, 2024

I'm starting to get the hang of the cacheHandler. According to the types, get returns Promise<CacheHandlerValue | null>. If I'm not mistaken, our implementation isn't correct: it should either return data or null, not crash, e.g.:

// I still think we should use a local cache here. We invalidate it when the age expires.
try {
  const result = await s3Client.send(
    new GetObjectCommand({
      Bucket: CACHE_BUCKET_NAME,
      Key: buildS3Key(key, isFetch ? "fetch" : "cache"),
    }),
  );

  const cacheData = JSON.parse(
    (await result.Body?.transformToString()) ?? "{}",
  );
  return {
    value: cacheData,
    lastModified: result.LastModified?.getTime(),
  };
} catch (err) {
  // It really doesn't matter what the error is: return null so that the
  // Next server can regenerate the page and call `set()`.
  return null;
}

@conico974 (Contributor)

> It should either return data or null, not crash

It is doing exactly that: both getFetchCache and getIncrementalCache are inside a try/catch and return null if an error occurs. The rest of the code should be error-free, and if something crashes before these 2 functions, something is really, really wrong.

public async get(
  key: string,
  // fetchCache is for Next 13.5 and above, kindHint is for Next 14 and above,
  // and boolean is for earlier versions
  options?:
    | boolean
    | {
        fetchCache?: boolean;
        kindHint?: "app" | "pages" | "fetch";
        tags?: string[];
        softTags?: string[];
        kind?: "FETCH";
      },
) {
  if (globalThis.disableIncrementalCache) {
    return null;
  }
  const softTags = typeof options === "object" ? options.softTags : [];
  const tags = typeof options === "object" ? options.tags : [];
  return isFetchCache(options)
    ? this.getFetchCache(key, softTags, tags)
    : this.getIncrementalCache(key);
}

> I still think we should use a local cache here. We invalidate it when the age expires.

This will cause trouble with Pages Router on-demand revalidation; it's fine without it. Maybe an option for people not using on-demand revalidation, but not something we'd want enabled by default.
And for the App Router we'd still need to reach the tag cache first, to know whether revalidateTag or revalidatePath has been called.

@khuezy (Contributor, Author) commented Nov 7, 2024

Ah OK, I was looking at the wrong file.

I forgot OpenNext has to support a bunch of old versions, so it's doing a lot of work 😬
