
Configure Cache Control #44

Open · no-reply opened this issue Feb 5, 2017 · 5 comments

no-reply (Member) commented Feb 5, 2017

From curationexperts/chf-sufia#27 (comment):

Approaches for cache control: the linked comment sketches three (roughly, a static caching strategy, HTTP Cache-Control, and a Marmotta-specific backend).

In any of these three cases, questions that need to be answered include:

  • What are the pre-warming needs?
    • Do search needs for an authority require a full pre-load of that authority?
    • Or can search be dependent on an external service?
  • What happens when the cache "invalidates"?
    • Do we clear data from the backend,
    • or simply require a re-fetch on the next retrieval, if the remote is available?
  • What is an acceptable TTL?
  • Does the client ever need to manually invalidate the cache?

I think that's the general shape of the problem.
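To make the decision points concrete, here's a rough sketch of where each question would land in a cache interface. Everything below is hypothetical (class, method, and backend names are mine, not actual qa-ldf API); it's just a way of seeing the questions as seams in the design:

```ruby
# Hypothetical sketch; none of these names are real qa-ldf API.
# Each open question above corresponds to a seam in the interface.
class AuthorityCache
  # "What is an acceptable TTL?": a tunable, defaulting high.
  DEFAULT_TTL = 14 * 24 * 60 * 60 # two weeks, in seconds

  def initialize(backend:, ttl: DEFAULT_TTL)
    @backend = backend # any store responding to read/write/delete/mark_stale
    @ttl = ttl
  end

  # "What are the pre-warming needs?": bulk-load a whole authority so
  # search can run against the cache alone.
  def prewarm(authority)
    authority.each_term { |uri, graph| @backend.write(uri, graph) }
  end

  # "What happens when the cache invalidates?": either clear the data
  # from the backend, or mark it stale and refetch on the next retrieval.
  def invalidate(uri, clear: false)
    clear ? @backend.delete(uri) : @backend.mark_stale(uri)
  end

  # "Does the client ever need to manually invalidate?": if so, it's
  # just the same hook, exposed to callers.
  alias_method :expire, :invalidate
end
```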

no-reply (Member, Author) commented Feb 5, 2017

A first hack at answering some of these:

  • A static caching strategy is probably the best option (for me, given current project constraints).
    • The architecture should allow us to move in the direction of HTTP Cache-Control if more client control is desired.
    • I'm worried about a Marmotta-specific approach and its implications for the currently swappable backends. While I'm a big 👍 on leaning on Marmotta for its caching backend, I'd prefer to see it work within the constraints of an interface that we think we can support for the general case.
  • Cache invalidation should just trigger a refetch, either on the next retrieval or in a background job (see the sketch below).
  • TTL is probably very high. Weeks?
  • I don't think the client needs to invalidate; at least not in a minimal implementation.

I don't have any answers about pre-warming. cc: @HackMasterA.
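A minimal sketch of that read path, under the assumptions above (long TTL; invalidation means refetch, not delete). `fetch_remote`, `store`, the `@backend` entry shape, and `RefetchJob` are all hypothetical stand-ins, not existing qa-ldf code:

```ruby
# Hypothetical read path: a stale entry is served as-is while a
# background job refreshes it; nothing is deleted eagerly.
def fetch(uri)
  entry = @backend.read(uri)

  if entry.nil?
    store(uri, fetch_remote(uri)) # cold miss: we have to hit the remote
  elsif entry.fetched_at < Time.now - @ttl
    RefetchJob.perform_later(uri) # stale: refresh in the background...
    entry.graph                   # ...but keep serving the cached copy
  else
    entry.graph                   # fresh hit
  end
end
```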

hackartisan commented

@no-reply your answers sound good to me.

  • +1 to backgrounding the refetch.
  • Client invalidation seems like it could be useful but not required upfront.
  • At CHF we have agreed we will rely on the external service for search; i.e., availability and retrieval time are less of a concern at cataloging time than at display time. I am a little nervous about the possibility of multi-day downtime, though, especially w/r/t LC. It would be good to spell out workarounds for getting cataloging done in that scenario: e.g., use a local authority as a temporary stopgap until the service comes back up and the URIs can be filled in / cached? (A sketch of that is below.)
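For the stopgap idea: if I'm remembering the qa gem's local-authority support correctly (worth double-checking against the qa README), standing one up is cheap. The `lc_stopgap` name and terms here are invented for illustration:

```ruby
require 'qa'

# A hand-maintained terms file at config/authorities/lc_stopgap.yml,
# in qa's local-authority YAML format, e.g.:
#
#   terms:
#     - id: n79021164
#       term: Twain, Mark, 1835-1910
#
# can then back cataloging while the remote service is down:
stopgap = Qa::Authorities::Local.subauthority_for('lc_stopgap')
stopgap.search('twain') # => [{ "id" => "n79021164", "label" => "Twain, Mark, 1835-1910" }]
```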

no-reply (Member, Author) commented

@HackMasterA:

At CHF we have agreed we will rely on the external service for search; i.e. availability and retrieval time are less of a concern at cataloging than at display. I am a little nervous about the possibility of multi-day downtime, though, especially w/r/t LC.

👍 The way I'm thinking of this is that the qa-ldf bridge will support a default search drawing only from items already in the cache. For smaller vocabularies, we would have the option to handle search entirely internally via pre-warming. For larger datasets we would lean on the external service, but have the option to provide search over the cached items during downtime.

I think for most users with a mature repository, having the capacity to search the cache would be enough to keep cataloging moving. Other workarounds could be discussed, but I'm thinking they are beyond project scope.
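As a sketch of that downtime behavior (again hypothetical; `remote_search` and `@backend.search_cached` stand in for whatever the bridge ends up exposing):

```ruby
require 'timeout'

# Hypothetical: prefer the external service for search, but fall back
# to items already in the cache when the remote is unreachable.
def search(query)
  remote_search(query)
rescue SocketError, Timeout::Error, Errno::ECONNREFUSED => e
  warn "remote authority unavailable (#{e.class}); searching cache only"
  @backend.search_cached(query)
end
```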

no-reply self-assigned this Feb 10, 2017
hackartisan commented

I think for most users with a mature repository, having the capacity to search the cache would be enough to keep cataloging moving. Other workarounds could be discussed, but I'm thinking they are beyond project scope.

I'm not sure about this assumption. For example, an archive moves from collection to collection over time, and each new collection will require a new set of vocabulary terms, even if they fall within broadly related areas. That is especially true in the realm of personal names, but for subjects as well.

But I guess if you assume that cataloging on an entirely new collection is relatively uncommon, perhaps it holds up. Maybe @catlu could weigh in.

no-reply (Member, Author) commented Feb 19, 2017

But I guess if you assume that cataloging on an entirely new collection is relatively uncommon perhaps it holds up.

Yeah, this is basically my assumption, or at least that it holds often enough to be of value. Upstream outages may be frustrating and prevent folks from working on their highest priorities, but at least they wouldn't necessarily halt work altogether.
