-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache DBFile::exists() and reduce calls to file_exists() #388
base: 1
Are you sure you want to change the base?
Cache DBFile::exists() and reduce calls to file_exists() #388
Conversation
f4fe11d
to
24cc6d6
Compare
24cc6d6
to
2d74eee
Compare
810f4a6
to
dccbc90
Compare
dccbc90
to
2b88c7c
Compare
…ts(). Other small optimisation tweaks.
2b88c7c
to
e202f65
Compare
Is this only a problem when profiling or is this a real world problem? My expectation was that an existence check would be a relatively inexpensive operation ... which is why I do it everywhere. If this is only a problem when profiling then adding |
@maxime-rainville I do have an very slow local setup, so going from ~325 to ~125 milliseconds would be an uncommonly large gain on a proper server. However this is still large boost percentage wise, and I do think that a 50-100 millisecond improvement on a proper server is not out of the question. Is it a real world problem? Well the file manager is tolerable as is now, so it's not like we have to do iterative performance improvements. At the same time, I would say the file manager is pretty slow right now so it is something where we could make UX gains by simply making it faster The cache is effectively invalidated at the end of every request i.e. it's not-persisted anywhere other than in code The cache is also opt-in of sorts where you must specifically enable it, rather than it being on by default. See the related PR silverstripe/silverstripe-asset-admin#1078 for how this is done. I've also tried to make it very clear that this cache should only ever be used in read-only operations - hence the name of the class. |
So you're seeing this as something that is only enabled on requests where we are 100% sure that we're only reading files and never updating them? That makes some sense. I'll have a closer look. |
* Used to cache results for the duration of a request during read-only file operations | ||
* Do not use this during any create, update or delete operations | ||
*/ | ||
class ReadOnlyCacheService |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should implement the Flushable
interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should officially support this API. Mark it as @internal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not flushable, there's nothing to flush. The cache is only used for a single request and then it's gone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resettable
is the appropriate interface for classes which have in-memory caches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not super sure about this one. Feels a bit hackish.
If this had massive performance impact, I'd say it's worth doing, but from what I can tell, it's leading to a small performance gain and it introduces more complexity.
If we're looking at squeezing more performance out of assets, I think we'd be better off trying to wrap FlysystemAssetStore or the underlying Filesystem object with their own caching layer. It would be a little less straight forward, but it would lead to more comprehensive results.
@@ -321,7 +321,7 @@ private function applyToFileOnFilesystem(callable $callable, ParsedFileID $parse | |||
if ($parsedFileID->getHash()) { | |||
$mainFileID = $strategy->buildFileID($strategy->stripVariant($parsedFileID)); | |||
|
|||
if (!$fs->has($mainFileID)) { | |||
if ($mainFileID !== $fileID && !$fs->has($mainFileID)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That part is low-impact, I'm happy to merge it as an incremental improvement.
@@ -189,10 +190,20 @@ private function buildCacheKey($fileID, $fs) | |||
*/ | |||
private function getTimestamp($fileID, $fs) | |||
{ | |||
$timestamp = ''; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That part is low-impact, I'm happy to merge it as an incremental improvement.
What would implementation on this potentially look like? How would you wrap FlysystemAssetStore while still retaining backwards compatibility? |
A good example of that is the Maybe we update the asset store to do something similar for file existence:
Alternatively we could do something similar directly on PublicAssetAdapter/ProtectedAssetAdapter. This would be more complex, but would also have a bigger payoff because it would also be used by all the FileIDHelperParsers. Either way, I think it makes more sense to put that caching logic in the |
On a side note, Flysystem has some caching solution https://flysystem.thephpleague.com/v1/docs/advanced/caching/ Maybe we could look at using that instead. |
That Memory caching on Flysystem looks pretty tasty, and being non-persistant there's less ways for things to go wrong. Of course we'd need a pretty solid test plan to find edge cases. Presumably if anything happens to the physical file system that doesn't go via Flysystem, and therefore the cache won't get invalidated, then we're going to have problems |
If you decide to bypass our file abstraction and access the files directly, I think you assume some degree of risk. Still, in memory per-request caching sounds like it would be pretty safe and we could leave it on all the time. FYI If your file gets updated outside of the CMS, you'll get problems with the File Hash caching. That bit us in the ass with AWS EFS because a file system change in one server would be propagated to another server without the hashing cache being invalidated. |
Re @maxime-rainville's suggestion of the Flysystem Cached Adapter - I've been looking at implementing it but noted that this doesn't appear to be supported in Flysystem v2 so maybe it's worthwhile being built in to core? |
Performance improvements, primarily to the the file manager section in the CMS
Profiled using xhprof when viewing the gallery view in the file manager. There is an excessively large number of calls to file_exists() due to excessive rechecking during graphql requests
This pull request contains:
Note: the branch name includes
1.5
though it was later rebased off1
since this requires a minor release as it adds new APIRelated PRs
Related PR asset-admin: silverstripe/silverstripe-asset-admin#1078
Travis cwp-recipie-kitchen-sink integration testing
https://travis-ci.org/github/creative-commoners/cwp-recipe-kitchen-sink/builds/675232333
Profiling results
FileManager gallery view - 50 files - 49 draft and 1 published
The way the asset abstraction works, the public store gets checked first and if that's not found the protected store gets the gets checked. Because most of the files are in the protected store, this is pretty much a worst case scenario.
The same file is checked for existence many times repeatedly, it's not very optimised. Variants are such as thumbnails are checked for existence separately from the main file.
This was done on a vagrant setup so it'll be much slower than a proper server
Times are measure in xhprof using Incl. Wall Time (microsec) - which is the total time spent in the function and all child functions. 100,000 microsec = 100 milliseconds
DBFile::exists() - 306 Calls
baseline
331,094
322,158
413,078
pull request
146,256
104,318
103,692
Calls to League\Flysystem\Adapter\Local::has - this is the main function that calls file_exists()
baseline - 4,019 calls
66,568
69,775
76,692
pull request - 1,634 calls
35,296
32,674
35,566