Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug when renaming multiple objects #1607

Draft
wants to merge 13 commits into
base: api-breakers
Choose a base branch
from
Draft

Conversation

peterjan
Copy link
Member

@peterjan peterjan commented Oct 11, 2024

This was quite the rabbit hole, I'll try and concisely describe:

  • how we use directories
  • what happens when you add an object
  • what happens when you rename an object
  • what happens when you rename multiple objects, and how it went wrong
  • what I had to do to fix it

Aside from the above I noticed MakeDirsForPath doesn't work properly handle multi-byte UTF-8 characters because we use utf8.RuneCountInString and then access the bytes of the string at certain position, not the runes.

This PR also adds a migration which queries the objects that were corrupted by a rename and recreates their directory structure and points them to the proper parent. Has to be tested still on arequipa or another node that has corrupted directory structure.

Fixes #1599


  1. Directories
    In renterd every object is linked to a directory. Important to note here is that directories are not scoped to a bucket. Objects from different buckets can point to the same directory if the path from the root is identical. They are merely used to store structure and to speed up listing queries.

  2. Adding an object
    When you add an object we pre-create the directories using MakeDirsForPath. What's important about this method is that it returns the ID of the parent directory, which we'll use when we insert the object. Another important thing to mention is that if you supply it a path that ends with a / it will return the ID of the parent folder.

  3. Renaming objects
    When we rename an object we pre-create the directories again using MakeDirsForPath. We then update existing objects and update the key of an object to whatever renamed value. Important here is that the target object can already exist, in which case a rename fails unless you rename by force.

  4. Renaming multiple objects
    Renaming multiple objects is used in the UI to rename folder names. That makes sense because you want all contents of that folder to still be in the same place after renaming the folder. This feature reuses MakeDirsForPath, and this was the problem because we use whatever parent directory ID to set as the directory ID of the objects we are renaming.

Imagine having a file (and directory structure) like /firefly/s1/ep1 and I want to rename s1 to s01. I would make directories for this new path and MakeDirsForPath would return me the ID of the firefly directory. It would then go and update the episode 1 and point it to the firefly directory -> wreaking total havoc

  1. To fix it I had to change two things:
  • I added InsertDirectoriesForRename which return a mapping of old directory IDs to renamed directory IDs
  • I added a helper method object.Directories which return directories for a path, it takes a special boolean that returns the path itself should the path be a directory (this is important when we rename multiple objects at a time)

This very quickly becomes quite complicated because the UI also creates dummy objects for directories. Typing it all out here is quite lengthy so for now I'll refer to TestRenameObjectsRegression where I showcase some tricky renames in a regression test.

@peterjan peterjan self-assigned this Oct 11, 2024
@peterjan peterjan changed the title Fix multi object renames Fix bug when renaming multiple objects Oct 15, 2024
@peterjan
Copy link
Member Author

@n8maninger I assigned you as a reviewer Nate, in this particular case though I really need Chris to take a look at it as well. It's only couple of days and I'm sure he'll want to have a pass on it before we consider hot-fixing it, especially since he recently refactored the whole thing, his understanding of the various pitfalls and caveats should be quite fresh in his memory.

// Directories returns the directories for the given path. When explicit is
// true, the returned directories do not include the path itself should it be a
// directory. The root path ('/') is always excluded.
func Directories(path string, explicit bool) (dirs []string) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChrisSchinnerl I'm not a big fan of the naming here, I went with explicit because we have this test called TestObjectsExplicitDir re-reading it now makes me think I have to inverse it actually. Holding off on that for now until we review the entire thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh I'm not a fan of having the bool. It feels a bit too forced. The way I understand it is that it's only required for InsertDirectoriesForRename. Looking at that method it seems very verbose. I'm pretty sure your first instinct was to just update the directory names instead of doing the whole mapping thing but you ran into issues due to the fact that the directories table doesn't contain the bucket.

I think it might be worth adding the bucket id to the directories table and extending the unique key to cover that.
Then simplify InsertDirectoriesForRename and finally get rid of the explicit bool since we should be able to just update all directories of the bucket that we rename within. My guess is that that's also a lot faster than looping and inserting directories.

That makes the migration more complicated but if we ever decide to use the directories table as a cache for fields like size and health of contained objects, we will need to introduce buckets to that table anyway.

"/firefly/trailer",
"/firefly/s1/ep1",
"/firefly/s1/ep2",
"/firefly/s2/ep1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming feels a bit too shady ^^

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes debugging so much easier but ok 😁 I was on the fence whether to rename it last minute myself.

// Directories returns the directories for the given path. When explicit is
// true, the returned directories do not include the path itself should it be a
// directory. The root path ('/') is always excluded.
func Directories(path string, explicit bool) (dirs []string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh I'm not a fan of having the bool. It feels a bit too forced. The way I understand it is that it's only required for InsertDirectoriesForRename. Looking at that method it seems very verbose. I'm pretty sure your first instinct was to just update the directory names instead of doing the whole mapping thing but you ran into issues due to the fact that the directories table doesn't contain the bucket.

I think it might be worth adding the bucket id to the directories table and extending the unique key to cover that.
Then simplify InsertDirectoriesForRename and finally get rid of the explicit bool since we should be able to just update all directories of the bucket that we rename within. My guess is that that's also a lot faster than looping and inserting directories.

That makes the migration more complicated but if we ever decide to use the directories table as a cache for fields like size and health of contained objects, we will need to introduce buckets to that table anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

2 participants