Find places where either we don't have shards available or where we have multiple shards #43693

Closed
45 tasks done
Tracked by #42920
sorbaugh opened this issue Feb 20, 2024 · 1 comment

Comments

sorbaugh commented Feb 20, 2024

Subtask to #42920

Updates

  • Waiting for CI to detect whether any other points are broken.
  • Sharding is done by Storage ID.
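To make "sharding by storage id" concrete, here is a minimal sketch. It is written in Python purely for illustration and is not the Nextcloud implementation: the modulo scheme, `SHARD_COUNT` and `shard_for_storage` are all assumptions.

```python
# Hypothetical sketch: the mapping scheme and all names are assumptions,
# not Nextcloud's actual sharding code.
SHARD_COUNT = 4  # assumed number of shards


def shard_for_storage(storage_id: int, shard_count: int = SHARD_COUNT) -> int:
    """Map a numeric storage id to a shard index using a simple modulo."""
    if storage_id < 0:
        raise ValueError("storage id must be non-negative")
    return storage_id % shard_count


# All filecache rows of one storage land on the same shard, so a query that
# filters by a single storage id only has to visit one shard.
rows = [
    {"fileid": 10, "storage": 1},
    {"fileid": 11, "storage": 1},
    {"fileid": 12, "storage": 6},
]
placement = {row["fileid"]: shard_for_storage(row["storage"]) for row in rows}
```

The point of the sketch is the consequence rather than the formula: any query that cannot name a single storage id cannot be routed to a single shard, which is what the lists below are about.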

Commit to apply to block direct oc_filecache access: 97d089d

Example PR: #45139
Roughly speaking, we can no longer access the oc_filecache table directly. Instead, we have to use the new sharding-compatible API and do the join in PHP.

PR adding the API: #44458
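A rough sketch of what "do the join in application code" means, in Python for illustration only. The data, `get_cache_entries_by_ids` and `get_shared_with` are hypothetical stand-ins and do not reflect the API added in the PR above.

```python
from typing import Dict, Iterable, List

# Hypothetical in-memory stand-ins for a non-sharded table and the shards.
SHARES = [
    {"id": 1, "file_source": 100, "share_with": "alice"},
    {"id": 2, "file_source": 200, "share_with": "alice"},
    {"id": 3, "file_source": 999, "share_with": "bob"},  # no cache entry
]
FILECACHE_SHARDS = [
    {100: {"fileid": 100, "storage": 1, "path": "files/a.txt"}},
    {200: {"fileid": 200, "storage": 2, "path": "files/b.txt"}},
]


def get_cache_entries_by_ids(file_ids: Iterable[int]) -> Dict[int, dict]:
    """Stand-in for a sharding-aware lookup: ask every shard for the ids."""
    wanted = set(file_ids)
    found: Dict[int, dict] = {}
    for shard in FILECACHE_SHARDS:
        for file_id in wanted:
            if file_id in shard:
                found[file_id] = shard[file_id]
    return found


def get_shared_with(user: str) -> List[dict]:
    # Step 1: query the non-sharded table on its own (no SQL join).
    shares = [s for s in SHARES if s["share_with"] == user]
    # Step 2: fetch the matching filecache rows through the lookup.
    entries = get_cache_entries_by_ids(s["file_source"] for s in shares)
    # Step 3: join in application code; shares without a cache entry are
    # dropped, which mirrors inner-join semantics.
    return [
        {**s, "path": entries[s["file_source"]]["path"]}
        for s in shares
        if s["file_source"] in entries
    ]
```

Here `get_shared_with("alice")` returns both shares with their paths attached, and `get_shared_with("bob")` returns an empty list because the referenced file id has no cache entry.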

Filecache queries that don't filter by a single storage id

Ignoring any queries from tests and migrations; somewhat organized by how they will be an issue for sharding.

Queries that join on the filecache by id and filter indirectly on the fileid

These probably have no way to be sharded efficiently, as the storage id can't be predicted. They would need significant rework in how these relations are handled, if they can be "fixed" at all.

  • Deck\Sharing\DeckShareProvider::getShareById: join with shares, selects by share id
  • Spreed\RoomShareProvider::getSharesByIds: join with shares
  • Spreed\RoomShareProvider::getSharedWith: join with shares
  • Deck\Sharing\DeckShareProvider::getSharedWith: join with shares
  • Deck\Sharing\DeckShareProvider::getSharedWithByType: join with shares
  • VirtualFolder\FolderConfigManager::getAllByRootIds: join with folders
  • Photos\AlbumMapper::getForAlbumIdAndUserWithFiles: joins with photos
  • Photos\AlbumMapper::getSharedAlbumsForCollaboratorWithFiles: joins with photos
  • Photos\AlbumMapper::getForAlbumIdAndFileId: joins with photos
  • Files\Config\UserMountCache::getMountsForRootId: join fileid on mount root id
  • UserInfo\UserInfoManager::getUsageInfo: joins with root id from mounts and selects by mount user
  • UserInfo\UserInfoManager::getUsedQuota: joins with root id from mounts and selects by mount point
  • Files\Config\UserMountCache::getMountsForUser: join fileid on mount root id
  • Share\Share::getItemSharedWithUser: join on shares, select by share recipient
  • Share\Share::getItems: join on shares, select by share data
  • Share\Share::getSharedWith: join on shares, select by share recipient

Queries that join on the filecache by id and filter indirectly on the storage

Should be possible to rework into a

  • Files\Config\UserMountCache::getUsedSpaceForUsers: select the home folder for all users
  • FilesExternal\Notify::getStorageIds: joins with storages from oc_mounts and filters on mount id, can probably be replaced

Queries that select from the filecache without filtering by storage, fileid or parent

These are mostly performance-"insensitive" maintenance/background queries. Depending on if/how they join other tables, they could be run across multiple/all shards, combining the results.

  • Sharing\DeleteOrphanedSharesJob: select over full table to find shares without matching filecache item
  • FilesAntivirus\BackgroundScanner::getUnscannedFiles: select items without matching av item
  • FilesAntivirus\BackgroundScanner::getToRescanFiles: select items with outdated av item
  • FilesAntivirus\BackgroundScanner::getOutdatedFiles: select items with outdated av item
  • Files\ScanFiles::getUsersToScan: select up to 1 item with size -1 from across all storages
  • Files\RepairTree::findBrokenTreeBits: searches through entire filecache
  • Files\Type\Loader::updateFileCache: select by filename LIKE and mimetype across the entire cache
  • Repair\RepairMimeTypes::updateMimetypes: select by filename LIKE and mimetype across the entire cache
  • Preview\ResetRenderedTexts::getPreviewsToDelete: filters only by path LIKE and mimetype
  • RepairShareOwnerShip::getWrongShareOwnership: select over full table, join shares and files and mounts and find shares with wrong owner
  • RepairShareOwnerShip::getWrongShareOwnershipForUser: same as above but limits by mount users
  • Files\DeleteOrphanedItems::cleanUp: select over full table to find system tags without matching filecache item
  • Files\DeleteOrphanedFiles::execute: select all items that don't have a storage in the storage table, join with storage table but search for NULL storage
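Running a maintenance query on every shard and combining the results could look roughly like the sketch below (Python, illustration only). `run_on_all_shards` and the sample data are hypothetical; the example loosely mirrors the orphaned-share check, where a share is orphaned if no shard has a matching filecache item.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, int]


def run_on_all_shards(
    shards: List[List[Row]],
    query: Callable[[List[Row]], List[int]],
) -> List[int]:
    """Run the same query against each shard and concatenate the results."""
    results: List[int] = []
    for shard in shards:
        results.extend(query(shard))
    return results


# Hypothetical data: two filecache shards, plus the file ids that shares
# point at (taken from a non-sharded table).
shards = [[{"fileid": 1}, {"fileid": 2}], [{"fileid": 3}]]
share_file_ids: Set[int] = {2, 3, 7}

# Per shard: which of the referenced file ids exist here?
existing = set(
    run_on_all_shards(
        shards,
        lambda rows: [r["fileid"] for r in rows if r["fileid"] in share_file_ids],
    )
)
# Combined: anything that no shard reported is orphaned.
orphaned = sorted(share_file_ids - existing)
```

This only works because the "no matching item" decision is taken after all shards have answered; a per-shard anti-join would wrongly report every file that lives on a different shard.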

Queries that filter by parent

In most cases the storage id should be available to add to the filter.

  • DAV\CustomPropertiesBackend::cacheDirectory
  • Deck\Sharing\DeckShareProvider::getSharesInFolder
  • GroupFolders\ACL\RuleManager::getRulesForFilesByParent
  • Comments\Manager::getNumberOfUnreadCommentsForFolder: not sure if the storage id is available; seems unused, but is exported through OCP
  • Preview\BackgroundCleanupJob::getOldPreviewLocations

Queries that filter directly on multiple file ids

  • VirtualFolder\VirtualFolderFactory::getSourceFilesFromFileIds: by array of file ids

Queries that filter by single fileid

These would need an efficient way to look up the storage id by fileid. Some already have a storage id around that can be reused.

  • Files\RepairTree::deleteById: might be easy fix, storage id should be available
  • Files\ObjectUtil::objectExistsInDB: storage id is not available
  • Files\Config\UserMountCache::getCacheInfoFromFileId
  • FilesSharing\OrphanHelper::fileExists: don't think storage id is available
  • FilesSharing\SharedMount::getNumericStorageId: not available
  • FilesSharing\ShareBackend\Folder::getParentId: not available
  • Preview\ResetRenderedTexts::getPreviewsToDelete: available
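One conceivable shape for a fileid to storage id lookup is sketched below in Python, for illustration only. The `StorageIdLookup` class, its in-memory cache and the "probe each shard until found" strategy are all assumptions; the issue does not say how such a lookup would be implemented.

```python
from typing import Dict, List, Optional


class StorageIdLookup:
    """Hypothetical fileid -> storage id lookup with a small cache."""

    def __init__(self, shards: List[Dict[int, int]]) -> None:
        # Each shard is modelled as a {fileid: storage_id} mapping.
        self._shards = shards
        self._cache: Dict[int, int] = {}
        self.shard_queries = 0  # counts how many shards were probed

    def storage_for_file(self, file_id: int) -> Optional[int]:
        # Cheap path: the storage id was already resolved earlier.
        if file_id in self._cache:
            return self._cache[file_id]
        # Expensive path: probe the shards one by one until the file is found.
        for shard in self._shards:
            self.shard_queries += 1
            storage_id = shard.get(file_id)
            if storage_id is not None:
                self._cache[file_id] = storage_id
                return storage_id
        return None


lookup = StorageIdLookup([{1: 10}, {2: 20}])
first = lookup.storage_for_file(2)   # probes both shards
second = lookup.storage_for_file(2)  # served from the cache
```

A real implementation would also have to invalidate cached entries when a file is deleted or moved; the sketch ignores that.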

sorbaugh commented Jul 3, 2024

The current approach hooks into the query builder to automatically detect whether a sharded table will be touched, which covers all of these scenarios. Still "In Progress": it remains to be confirmed whether CI detects any points that might still be broken.
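As a very rough illustration of the idea of detecting sharded tables inside a query builder (Python, not the actual Nextcloud code): the `QueryBuilderSketch` class, the `SHARDED_TABLES` set, the modulo routing and the three routing outcomes are all assumptions made for this sketch.

```python
from typing import List, Optional, Tuple

SHARDED_TABLES = {"filecache"}  # assumed set of sharded tables
SHARD_COUNT = 4                 # assumed shard count


class QueryBuilderSketch:
    """Toy query builder that records which tables a query touches."""

    def __init__(self) -> None:
        self.tables: List[str] = []
        self.storage_id: Optional[int] = None

    def from_table(self, table: str) -> "QueryBuilderSketch":
        self.tables.append(table)
        return self

    def join(self, table: str) -> "QueryBuilderSketch":
        self.tables.append(table)
        return self

    def where_storage(self, storage_id: int) -> "QueryBuilderSketch":
        self.storage_id = storage_id
        return self

    def route(self) -> Tuple[str, Optional[int]]:
        """Decide where the query has to run, based on the recorded tables."""
        if not SHARDED_TABLES.intersection(self.tables):
            return ("unsharded", None)
        if self.storage_id is not None:
            # A single storage id pins the query to exactly one shard.
            return ("single_shard", self.storage_id % SHARD_COUNT)
        # Sharded table without a usable shard key: every shard is involved.
        return ("all_shards", None)
```

For example, `QueryBuilderSketch().from_table("share").join("filecache").route()` yields `("all_shards", None)`, which corresponds to the join cases listed in the issue body that cannot be pinned to one storage.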

@skjnldsv skjnldsv modified the milestones: Nextcloud 30, Nextcloud 31 Aug 14, 2024
@sorbaugh sorbaugh modified the milestones: Nextcloud 31, Nextcloud 30 Aug 19, 2024