Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_timestamp should use the cache, not query the database #8

Open
WD-stefaang opened this issue Nov 8, 2018 · 9 comments
Open

get_timestamp should use the cache, not query the database #8

WD-stefaang opened this issue Nov 8, 2018 · 9 comments

Comments

@WD-stefaang
Copy link
Contributor

Opening and reloading a stage applies these steps with database calls:

  • resolve asset: check asset existence in db
  • fetch asset: download asset from db if NEEDS_CACHING
  • get timestamp: check asset existence again in db

Fetch asset always does an extra get_raw_timestamp check when the cache says that fetching is not necessary...
Because of this, I think that the get_timestamp call should just return the cached timestamp value without making yet another database query.

I think this line should be dropped: https://github.com/LumaPictures/usd-uri-resolver/blob/master/URIResolver/sql.cpp#L525

This reduces the database queries per unique asset from 3 to 2.

@sirpalee
Copy link
Contributor

Hey! Querying the timestamp is meant to support updating the asset in case the time stamp has been updated. Using the timestamp from the cache does not give us this feature.

However, we can add an option that disables updating the timestamp from the database and always uses the cache.

@sirpalee
Copy link
Contributor

@elrond79 @nrusch what do you think?

@WD-stefaang
Copy link
Contributor Author

I think that get_timestamp is (ab)used to check if a reload of the assets will be required on the next reload... while most of the code in SDFLayer seems to use get_timestamp to keep track of the timestamp of the current layer.
The SDFLayer._reload call seems to prefer to make use of UpdateAssetInfo.

@WD-stefaang
Copy link
Contributor Author

The timestamp recorded during the resolver.FetchToLocalResolvedPath is the correct asset timestamp to be reported in layer::_OpenLayerAndUnlockRegistry.
But for layer::_Reload you'd want the latest timestamp from the original asset, not from the localPath.

So if you won't need to reload assets in the usd runtime (usdcat, usdzip, ...), you can save a lot of database calls.

@WD-stefaang
Copy link
Contributor Author

The confusion all starts with this dubious function

AssetResolver::GetModificationTimestamp(
    const std::string& path, 
    const std::string& resolvedPath)

It would be ideal if an extra 'fetch' boolean would be passed along to indicate that you want either the timestamp from the path (from database) or from the resolvedPath (local file).
It requires some changes in sdf stage... but then both the initial stage load and reloads would run optimally without extra configuration options such as 'NO_CACHED_TIMESTAMP'.
Something for AR2.0 @spiffmon?

@WD-stefaang
Copy link
Contributor Author

I'm getting my expected result by using UpdateAssetInfo to fetch the latest date modified from the db when the cache indicates that the asset is already fetched. When the cache indicates 'NEEDS_CACHING', UpdateAssetInfo does nothing.

Stage.Open
- resolve: set entry in cache with NEEDS_CACHING
- update asset info: do nothing as asset is not cached yet
- fetch: fetch asset and save timestamp in cache
- timestamp: get timestamp from cache

Stage.Reload (no changes)
- resolve: uses cache
- update asset info: update info in cache
- timestamp: get timestamp from cache

Stage.Reload (with changes)
- resolve: uses cache
- update asset info: update info in cache, set NEEDS_CACHING
- timestamp: get timestamp from cache, is newer than layer.info
- fetch: fetch and save timestamp in cache

I ended up with an implementation where Resolve only determines the localPath (without doing any calls), UpdateAssetInfo does a HEAD request (similar to get_raw_timestamp) only if a local cache is present (to refresh the metadata) and Fetch does the heavy lifting with less pre-conditions.

On top of that, I'm adding the "IfModifiedSince" header in the Fetch call, catching the 304 Not Modified error.
In MySQL, a similar feature could be added to the query, allowing the server to decide that no data needs to be send.

So ideally, the URI resolver does a single DB call max for each time the scene is loaded. Only when the server doesn't support the 'IfModifiedSince' header (even not via a query), you may need up to 2 DB calls: one to resolve/update and one to fetch.
An option should exist to prevent checking/fetching if a local cached copy exists.

@sirpalee
Copy link
Contributor

Thanks for the detailed information, I'll dig through the details. If you have it working on your side, a PR would be useful.

@WD-stefaang
Copy link
Contributor Author

I've been working in a separate repo: https://github.com/westerndigitalcorporation/usd-s3-resolver
Primary focus was a performant S3 resolver, I bothered a bit less about 'plugability' with the URI resolver at this stage... I kept the history of this project though, so it shouldn't be too hard to merge them.

I need to add a few more performance tests and sort out the fact that reloads are slower than initial loads on S3. Once that is done, I could apply the same logic to the MySQL resolver. It looks like stage.reload is single threaded while the initial load is multithreaded... more on that later.

Unfortunately I have to move on to my next project so I can't make promises on the timeline here.

@WD-stefaang
Copy link
Contributor Author

resolver.RefreshContext(context) seems the ideal place to reset the caches. You're even able to flush only a subset of the cache.
You may need to expose the python bindings here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants