Generation numbers should ensure inode uniqueness across remounts #19

Open
albel727 opened this issue Oct 12, 2017 · 5 comments

@albel727

The title says it all. Per the FUSE docs, a (generation, inode) combination must never recur, even after the application restarts, because some consumers, such as NFS, rely on it being unique. Initializing generation numbers to zero, as happens now, is therefore a problem. I suggest initializing them from something like a nanosecond-precision timestamp taken when the InodeTable is created, or from a random value.
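A minimal sketch of the timestamp idea, using only the standard library. The function name `initial_generation` is hypothetical and not part of the crate; the fallback to zero on a pre-epoch clock is also an assumption.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper (not an existing API): derive a starting generation
/// from the wall clock at the moment the inode table is created.
pub fn initial_generation() -> u64 {
    match SystemTime::now().duration_since(UNIX_EPOCH) {
        // as_nanos() returns u128; the count fits in a u64 until roughly
        // the year 2554, so the cast does not lose information before then.
        Ok(elapsed) => elapsed.as_nanos() as u64,
        // Assumption: if the clock reads earlier than the epoch, fall back
        // to zero, which is no worse than the current behavior.
        Err(_) => 0,
    }
}
```

A table constructor could call this once and use the result wherever it currently starts from zero.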

@albel727
Author

PS. Maybe allow the user to set it themselves, but I think there should be a reasonable default behavior.

@wfraser wfraser added the bug label Oct 17, 2017
@wfraser
Owner

wfraser commented Oct 17, 2017

Interesting, I didn't know that about the generation numbers! It's unfortunate that InodeTable's current behavior is quite poor, given this constraint -- it wants to reuse inode numbers whenever possible and bumps the generation number for the inode every time it does so. This is correct except that it starts back over at (1,0) on remount.

A better behavior would be to monotonically increase the inode number for the next file, and only bump the generation number when it rolls over. Then on remount, the filesystem can use the same generation number and start with the last-allocated inode from last time.
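The scheme described above could be sketched roughly as follows. `InodeCounter` and its methods are hypothetical names, not the actual InodeTable API, and the sketch assumes inode 1 is reserved for the FUSE root, so allocation restarts at 2 after a rollover.

```rust
/// Hypothetical allocator state. The two fields are what would have to be
/// saved at unmount and restored at the next mount.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InodeCounter {
    pub generation: u64,
    pub last_inode: u64,
}

impl InodeCounter {
    /// Resume from the state recorded by the previous mount.
    /// For a brand-new filesystem, pass (0, 1): generation 0, root inode 1.
    pub fn resume(generation: u64, last_inode: u64) -> Self {
        InodeCounter { generation, last_inode }
    }

    /// Hand out the next (inode, generation) pair.
    pub fn allocate(&mut self) -> (u64, u64) {
        if self.last_inode == u64::MAX {
            // Inode space exhausted: bump the generation and start over
            // just past the root inode.
            self.generation = self
                .generation
                .checked_add(1)
                .expect("generation space exhausted");
            self.last_inode = 2;
        } else {
            self.last_inode += 1;
        }
        (self.last_inode, self.generation)
    }
}
```

Inode numbers are never reused within a generation here, so uniqueness across remounts depends only on the two saved fields being restored correctly.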

I'll look into making this change. Thanks for the report!

@wfraser
Owner

wfraser commented Oct 17, 2017

(for reference to myself and others, the place in the documentation that actually says this is: https://github.com/libfuse/libfuse/blob/d92bf83c152ff88c2d92bd852752d4c326004400/include/fuse_lowlevel.h#L69-L81)

@albel727
Author

> This is correct except that it starts back over at (1,0) on remount.

Yeah, that's what I referred to, though reusing inodes is ok and happens with regular filesystems too.

But the slight problem is that the generation space is limited. Despite what it says on the tin (u64 in the FUSE reply struct), the generation number is actually 32 bits in the Linux kernel, and is returned as such to userspace via the FS_IOC_GETVERSION ioctl. I'm not sure how the FUSE kernel driver resolves this contradiction internally; maybe it just truncates to the lower half of the 64-bit value, or keeps a u64<->self-generated-u32 map.
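To illustrate why truncation would matter, here is a small sketch of the first possibility only. Whether the kernel driver actually behaves this way is not established in this thread; both function names are hypothetical.

```rust
/// Hypothetical model of a driver that keeps only the low 32 bits of the
/// 64-bit generation. This is an assumption, not confirmed kernel behavior.
pub fn truncate_generation(generation: u64) -> u32 {
    generation as u32
}

/// True when two distinct 64-bit generations would become
/// indistinguishable after truncation to 32 bits.
pub fn collide_after_truncation(a: u64, b: u64) -> bool {
    truncate_generation(a) == truncate_generation(b)
}
```

Under that assumption, any two generations that differ only in their upper 32 bits would look identical to a consumer that sees the 32-bit value.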

> A better behavior would be to monotonically increase the inode number for the next file, and only bump the generation number when it rolls over. Then on remount, the filesystem can use the same generation number and start with the last-allocated inode from last time.

Which would be like extending the 64-bit inode number to a linearly traversed 32+64-bit one. That surely means it won't overflow within our lifetime, but then I'm not sure how you're going to manage inode allocations without a HashMap, or where you're going to preserve the last (generation, inode) pair. This is why I suggested giving the user a method to set the initial generation and leaving the preservation matter to them.

Maybe keep the current inode reuse behavior, but give the user just the maximum of all generations to preserve, so you can remount starting with (max_generation+1, 0). The 32-bit limitation does mean max_generation could overflow in feasible time under quick inode reuse, but that would still be an improvement over the current generation behavior, with minimal changes to the inode code.
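A rough sketch of that variant: keep reusing inodes and bumping their generation, track the highest generation handed out, and seed the next mount with that value plus one. `ReusingTable` and its methods are hypothetical, the free list is a plain `Vec`, and fresh inodes are assumed to start at 2 (after the root).

```rust
/// Hypothetical table that reuses freed inodes and tracks the highest
/// generation it has ever handed out. Not the actual InodeTable API.
pub struct ReusingTable {
    initial_generation: u32,
    max_generation: u32,
    next_fresh_inode: u64,
    free: Vec<(u64, u32)>, // (inode, generation it was last used with)
}

impl ReusingTable {
    /// `initial_generation` is 0 on first use, or the saved maximum + 1.
    pub fn new(initial_generation: u32) -> Self {
        ReusingTable {
            initial_generation,
            max_generation: initial_generation,
            next_fresh_inode: 2, // assumption: inode 1 is the root
            free: Vec::new(),
        }
    }

    /// Return an inode to the free list once the kernel has forgotten it.
    pub fn release(&mut self, inode: u64, generation: u32) {
        self.free.push((inode, generation));
    }

    /// Reuse a freed inode with a bumped generation, or mint a fresh one.
    pub fn allocate(&mut self) -> (u64, u32) {
        let (inode, generation) = match self.free.pop() {
            Some((inode, old)) => {
                (inode, old.checked_add(1).expect("32-bit generation overflow"))
            }
            None => {
                let inode = self.next_fresh_inode;
                self.next_fresh_inode += 1;
                (inode, self.initial_generation)
            }
        };
        self.max_generation = self.max_generation.max(generation);
        (inode, generation)
    }

    /// The single value the user would need to persist across remounts.
    pub fn generation_to_persist(&self) -> u32 {
        self.max_generation
    }
}
```

On remount the user would construct the table with `generation_to_persist() + 1`, so every pair handed out afterwards has a generation higher than anything used before.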

It would be nice, though, to support operation without having to preserve anything, since that demands some side storage. But the current inode reuse together with the 32-bit generation limit means that the user probably shouldn't just set the generation to a 32-bit seconds-since-epoch timestamp: the current reuse code might increment the generation of some inode much faster than once per second, so an overlap on remount can happen. (This is actually quite unlikely, as FUSE tends to keep inodes in the lookup state for quite long and forgets very lazily, but I vaguely recall there being some FUSE option for eager forgets.) The user would be better off with a random number, though overlap is still possible then.

Maybe some hybrid approach would be better: split the free inodes into actual generations, allowing them to be reused only after a second has passed. Then the user can reasonably safely use seconds since epoch as the initial generation value. In fact, it could then be provided as the default behavior.
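One possible reading of the hybrid idea, as a sketch: a freed inode becomes reusable only once the clock has moved past the second in which it was freed, so no inode's generation can advance faster than one step per second. `DelayedReuseTable` is a hypothetical name, the clock is passed in as seconds since epoch to keep the sketch testable, and it assumes the caller's clock never goes backwards.

```rust
use std::collections::VecDeque;

/// Hypothetical table that delays inode reuse by at least one clock second.
pub struct DelayedReuseTable {
    initial_generation: u32,
    next_fresh_inode: u64,
    // (inode, last generation, second in which it was freed), oldest first.
    free: VecDeque<(u64, u32, u64)>,
}

impl DelayedReuseTable {
    /// Seed the generation from seconds since epoch at mount time.
    pub fn new(mount_secs: u64) -> Self {
        DelayedReuseTable {
            initial_generation: mount_secs as u32, // fits until 2106
            next_fresh_inode: 2, // assumption: inode 1 is the root
            free: VecDeque::new(),
        }
    }

    /// Record a freed inode together with the second it was freed in.
    pub fn release(&mut self, inode: u64, generation: u32, now_secs: u64) {
        self.free.push_back((inode, generation, now_secs));
    }

    /// Reuse the oldest freed inode only if it was freed in an earlier
    /// second; otherwise mint a fresh inode number.
    pub fn allocate(&mut self, now_secs: u64) -> (u64, u32) {
        let reusable = self
            .free
            .front()
            .map_or(false, |entry| entry.2 < now_secs);
        if reusable {
            let (inode, generation, _) = self.free.pop_front().unwrap();
            return (inode, generation.wrapping_add(1));
        }
        let inode = self.next_fresh_inode;
        self.next_fresh_inode += 1;
        (inode, self.initial_generation)
    }
}
```

With this rule, a generation handed out at second `t` is never greater than `t`, so a later mount seeded with its own seconds-since-epoch value does not start below anything used earlier. A remount within the same second is still an edge case, which matches the "reasonably safely" wording above.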

@albel727
Author

PS. Even just initializing the generation to seconds since epoch and doing nothing else would already be better than the current state.
