Generation numbers should ensure inode uniqueness across remounts #19

albel727 · 2017-10-12T10:13:32Z

The title says it all. As per the FUSE docs, a (generation, inode) combination must not recur even after the application restarts, because some things, like NFS, rely on it being unique. Hence initializing generation numbers to zero, as happens now, is bad. I suggest using something like a nanosecond-precision timestamp taken at the moment of InodeTable creation, or a random value.

albel727 · 2017-10-12T10:15:12Z

PS. Maybe allow the user to set it themselves, but I think there should be a reasonable default behavior.
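A minimal sketch of the suggestion above, assuming a simplified stand-in for `InodeTable` that models only the starting generation. The names `initial_generation`, `with_initial_generation`, and `generation_seed` are hypothetical and are not part of the library's actual API.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper: derive a starting generation from the wall clock,
/// in nanoseconds since the Unix epoch. Falls back to 0 if the clock reads
/// earlier than the epoch.
pub fn initial_generation() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64) // fits in a u64 for several centuries
        .unwrap_or(0)
}

/// Hypothetical stand-in for the real table; only the seed is modeled here.
pub struct InodeTable {
    generation_seed: u64,
}

impl InodeTable {
    /// Default behavior: seed the generation from the current time.
    pub fn new() -> Self {
        Self::with_initial_generation(initial_generation())
    }

    /// Lets the caller supply (and persist) its own starting generation.
    pub fn with_initial_generation(generation_seed: u64) -> Self {
        InodeTable { generation_seed }
    }

    /// The generation that the first allocations would start from.
    pub fn generation_seed(&self) -> u64 {
        self.generation_seed
    }
}
```

A random seed would work the same way through `with_initial_generation`; it is left out here only to keep the sketch free of external crates.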

wfraser · 2017-10-17T21:45:13Z

Interesting, I didn't know that about the generation numbers! It's unfortunate that InodeTable's current behavior is quite poor, given this constraint -- it wants to reuse inode numbers whenever possible and bumps the generation number for the inode every time it does so. This is correct except that it starts back over at (1,0) on remount.

A better behavior would be to monotonically increase the inode number for the next file, and only bump the generation number when it rolls over. Then on remount, the filesystem can use the same generation number and start with the last-allocated inode from last time.

I'll look into making this change. Thanks for the report!
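A rough sketch of the monotonic scheme described above, not the actual fix. It assumes the (generation, last inode) pair is persisted somewhere between mounts, that inode 1 is the FUSE root, and that inode 0 is never handed out. `MonotonicAllocator` and its methods are hypothetical names.

```rust
/// Hypothetical allocator: inode numbers only go up, and the generation is
/// bumped only when the inode counter wraps around.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MonotonicAllocator {
    generation: u64,
    last_inode: u64,
}

impl MonotonicAllocator {
    /// Resume from a persisted (generation, last_inode) pair.
    /// A fresh filesystem would start from (0, 1), inode 1 being the root.
    pub fn resume(generation: u64, last_inode: u64) -> Self {
        MonotonicAllocator { generation, last_inode }
    }

    /// Hand out the next (generation, inode) pair.
    pub fn allocate(&mut self) -> (u64, u64) {
        if self.last_inode == u64::MAX {
            // Inode space rolled over: move to the next generation and
            // skip 0 (assumed unused) and 1 (the root).
            self.generation += 1;
            self.last_inode = 2;
        } else {
            self.last_inode += 1;
        }
        (self.generation, self.last_inode)
    }

    /// The pair to persist so that the next mount can continue from here.
    pub fn state(&self) -> (u64, u64) {
        (self.generation, self.last_inode)
    }
}
```

Resuming with `MonotonicAllocator::resume(g, i)`, where `(g, i)` came from `state()` on the previous mount, continues the sequence without repeating any pair.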

wfraser · 2017-10-17T22:17:24Z

(for reference to myself and others, the place in the documentation that actually says this is: https://github.com/libfuse/libfuse/blob/d92bf83c152ff88c2d92bd852752d4c326004400/include/fuse_lowlevel.h#L69-L81)

albel727 · 2017-10-19T06:06:52Z

> This is correct except that it starts back over at (1,0) on remount.

Yeah, that's what I referred to, though reusing inodes is ok and happens with regular filesystems too.

But the slight problem is that the generation space is limited. Despite what it says on the tin (u64 in the FUSE reply struct), the generation number is actually 32-bit in the Linux kernel, and is returned as such to userspace via the FS_IOC_GETVERSION ioctl. I'm not sure how the FUSE kernel driver resolves this contradiction internally; maybe it just truncates to the lower half of the 64-bit value, or keeps a u64<->self-generated-u32 map.

> A better behavior would be to monotonically increase the inode number for the next file, and only bump the generation number when it rolls over. Then on remount, the filesystem can use the same generation number and start with the last-allocated inode from last time.

Which would be like extending the 64-bit inode number to a linearly traversed 32+64-bit one. That surely means it won't overflow within our lifetime, but then I'm not sure how you're going to manage inode allocations without a HashMap, or where you're going to preserve the last (generation, inode) pair. This is why I suggested giving the user a method to set the initial generation and leaving the preservation matter to them.

Maybe keep the current inode reuse behavior, but provide the user with just the maximum of all generations to preserve, so that you can remount starting with (max_generation+1, 0). Though the 32-bit limitation means max_generation could overflow in feasible time due to some quick inode reuse. But that would still be an improvement over the current generation behavior, with minimal changes to the inode code.
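The idea of keeping inode reuse while tracking only the maximum generation could look roughly like this. It is a sketch under assumptions: `ReusingTable`, `release`, and `max_generation` are hypothetical names, the generation is stored as a u32 to reflect the kernel-side limit mentioned above, and inode numbering starts at 2 because 1 is the FUSE root.

```rust
/// Hypothetical table that reuses freed inode numbers, bumps their
/// generation on reuse, and tracks the highest generation handed out.
pub struct ReusingTable {
    base_generation: u32,
    max_generation: u32,
    next_inode: u64,
    free_list: Vec<(u64, u32)>, // (inode, generation it last had)
}

impl ReusingTable {
    /// On remount, pass the persisted max_generation + 1 as the base.
    pub fn new(base_generation: u32) -> Self {
        ReusingTable {
            base_generation,
            max_generation: base_generation,
            next_inode: 2,
            free_list: Vec::new(),
        }
    }

    /// Hand out a (generation, inode) pair, preferring a freed inode.
    pub fn allocate(&mut self) -> (u32, u64) {
        while let Some((inode, old_generation)) = self.free_list.pop() {
            if let Some(generation) = old_generation.checked_add(1) {
                self.max_generation = self.max_generation.max(generation);
                return (generation, inode);
            }
            // 32-bit generation space exhausted for this inode: retire it.
        }
        let inode = self.next_inode;
        self.next_inode += 1;
        (self.base_generation, inode)
    }

    /// Mark a pair as no longer in use so its inode number can be reused.
    pub fn release(&mut self, generation: u32, inode: u64) {
        self.free_list.push((inode, generation));
    }

    /// The single value the user would need to preserve across mounts.
    pub fn max_generation(&self) -> u32 {
        self.max_generation
    }
}
```

Because every pair handed out after a remount has a generation of at least the old maximum plus one, it cannot coincide with a pair from an earlier mount, as long as the persisted maximum itself has not overflowed.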

Though it would be nice to allow operation without having to preserve anything, as that demands some side storage. But the current inode reuse, together with the 32-bit generation limitation, means that the user probably shouldn't just set the generation to a 32-bit seconds-since-epoch timestamp: the current inode reuse code might increment the generation of some inode much faster than once per second, and so overlap on remount will happen. (This is quite unlikely actually, as FUSE tends to keep inodes in lookup state for quite long and forgets very lazily, but I vaguely recall there being some FUSE option to do eager forgets.) The user will be better off with a random number, though overlap is still possible then.

Maybe some hybrid approach would be better: split the free inodes into actual generations, and allow reusing them only after a second has passed. Then the user can reasonably safely use seconds since epoch as the initial generation value. In fact, it could then be provided as the default behavior.
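One possible reading of the hybrid approach, as a sketch only: each allocation is stamped with the current second as its generation, and a freed inode becomes reusable only once the clock has moved past the second in which it was freed. `TimedReuseTable`, `release`, and `epoch_secs` are hypothetical names; the clock is passed in as a parameter and assumed never to go backwards.

```rust
use std::collections::VecDeque;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical clock helper: 32-bit seconds since the Unix epoch
/// (truncates after the year 2106; 0 if the clock is before the epoch).
pub fn epoch_secs() -> u32 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs() as u32)
        .unwrap_or(0)
}

/// Hypothetical table where the generation is the second of allocation.
pub struct TimedReuseTable {
    next_inode: u64,
    free_list: VecDeque<(u64, u32)>, // (inode, second it was freed), oldest first
}

impl TimedReuseTable {
    pub fn new() -> Self {
        // Inode 1 is the FUSE root, so fresh numbers start at 2.
        TimedReuseTable { next_inode: 2, free_list: VecDeque::new() }
    }

    /// Record that an inode was freed during second `now_secs`.
    pub fn release(&mut self, inode: u64, now_secs: u32) {
        self.free_list.push_back((inode, now_secs));
    }

    /// Hand out a (generation, inode) pair for second `now_secs`.
    pub fn allocate(&mut self, now_secs: u32) -> (u32, u64) {
        if let Some(&(inode, freed_at)) = self.free_list.front() {
            if now_secs > freed_at {
                // At least one clock tick since the free: safe to reuse,
                // and the new generation is strictly greater than the old.
                self.free_list.pop_front();
                return (now_secs, inode);
            }
        }
        let inode = self.next_inode;
        self.next_inode += 1;
        (now_secs, inode)
    }
}
```

In real use, callers would pass `epoch_secs()` as `now_secs`. Within one mount no pair repeats, since a reused inode always gets a later second than it had before. Across mounts the scheme still relies on the remount not happening within the same second as the last reuse, which is in line with the "reasonably safely" caveat above.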

albel727 · 2017-10-19T06:12:49Z

PS. Even just initializing the generation to seconds since epoch and doing nothing else would already be better than the current state.

wfraser added the bug label Oct 17, 2017
