Regarding sync(2), syncfs(2) fsync(2),fdatasync(2) #136

Open
fvalasiad opened this issue Dec 7, 2022 · 0 comments

fvalasiad (Collaborator) commented Dec 7, 2022

In my understanding:

Facts

  • sync(2)
    Specified by POSIX. POSIX only requires that the writes be scheduled, though; Linux additionally waits for I/O completion before returning.

sync() causes all pending modifications to filesystem metadata and
cached file data to be written to the underlying filesystems.

  • syncfs(2)
    Not in POSIX; Linux-specific. It gives the same guarantee sync(2) does on Linux, but only for one filesystem.

syncfs() is like sync(), but synchronizes just the filesystem
containing file referred to by the open file descriptor fd.

  • fsync(2)
    Specified by POSIX. POSIX leaves the nature of the transfer implementation-defined; on Linux the system call won't return until the device reports that the transfer has completed.

  • fdatasync(2)
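    fsync(2), but lazier: it flushes the file data and skips modified metadata unless that metadata is needed for a subsequent data retrieval to be handled correctly.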
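
To make the per-file calls above concrete, here is a minimal sketch using Python's `os` wrappers around them. The helper name `write_and_sync` and its return value are made up for illustration. As far as I know, the `os` module exposes `sync`, `fsync` and `fdatasync` but has no `syncfs` wrapper, so that call is left out.

```python
import os
import tempfile

def write_and_sync(data: bytes, system_wide: bool = False) -> dict:
    """Hypothetical helper: write data to a temp file and flush it to disk."""
    used = {}
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)
        os.fsync(fd)                  # flush data and metadata of this one file
        used["fsync"] = True
        if hasattr(os, "fdatasync"):  # not available on every platform
            os.write(fd, data)
            os.fdatasync(fd)          # flush data, skip non-essential metadata
            used["fdatasync"] = True
        if system_wide and hasattr(os, "sync"):
            os.sync()                 # flush everything, on all filesystems
            used["sync"] = True
        used["size"] = os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.unlink(path)
    return used
```

`system_wide` defaults to `False` because a full `sync` can be slow on a busy machine.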

Proposal

Obviously what was mentioned above isn't the full picture, since for example it implies that syncfs(2) and fsync(2) are equivalent, which is far from true. But for our purposes, they are to be treated mostly the same.
So, assuming the above statements are correct, my proposal is:

  • sync(2)
    We rehash all open files with pending write operations and store them as new files, almost as if they had been close(2)d and open(2)ed again.

  • syncfs(2)
    We only rehash the open file identified by fd. Basically what I described for sync(2), but for a single file.

  • fsync(2)
    Same as for syncfs(2)

  • fdatasync(2)
    This one is quite tricky, since it is only guaranteed to flush the file data plus whatever metadata a subsequent read needs, not all of the file's state. This may be a problem for us, since we may not be able to create the new file with the new hash upon encountering this system call. The hash checker that I proposed be removed due to its very poor performance actually addresses this, though. What are the chances a compiler uses any of these calls?
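
The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions: `SyncTracker`, its `on_*` hooks, and the use of SHA-256 are hypothetical and not the project's actual code. fdatasync(2) is deliberately left without a hook, since its handling is the open question.

```python
import hashlib
import os
import tempfile

def hash_file(path):
    """Hash a file's current contents (SHA-256 is an assumption)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

class SyncTracker:
    """Hypothetical tracker: records a new (path, hash) entry on each sync."""

    def __init__(self):
        self.open_files = {}  # fd -> path
        self.dirty = set()    # fds with pending writes
        self.records = []     # one (path, hash) entry per recorded "file"

    def on_open(self, fd, path):
        self.open_files[fd] = path
        self.records.append((path, hash_file(path)))

    def on_write(self, fd):
        if fd in self.open_files:
            self.dirty.add(fd)

    def _rehash(self, fd):
        # Only files with pending writes produce a new record.
        if fd in self.dirty:
            path = self.open_files[fd]
            self.records.append((path, hash_file(path)))
            self.dirty.discard(fd)

    def on_sync(self):
        # sync(2): rehash every open file with pending writes.
        for fd in sorted(self.dirty):
            self._rehash(fd)

    def on_fsync(self, fd):
        # fsync(2), and syncfs(2) under this proposal: rehash just this fd.
        self._rehash(fd)
```

A tracer would call `on_write` for every write(2) it observes and `on_sync` or `on_fsync` for the matching sync call; a sync with no pending writes adds no record.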

It's also worth mentioning that the hash checker already addresses the problem this entire issue tries to solve by tracking the sync family of system calls: if a file were synced in any of the ways described above, its hash would be updated upon the next read and a new file would be created as a result.
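
For comparison, a hash checker in the sense described here might look like the sketch below. `HashChecker` and `on_read` are hypothetical names, and the content is passed in directly to keep the example small. A real checker would have to re-read and rehash the file on every read, which is where the cost comes from.

```python
import hashlib

class HashChecker:
    """Hypothetical checker: detects changed content at read time."""

    def __init__(self):
        self.last_hash = {}  # path -> hash seen on the previous read
        self.versions = []   # one (path, hash) entry per distinct version

    def on_read(self, path, content: bytes) -> bool:
        """Return True if this read revealed a new version of the file."""
        digest = hashlib.sha256(content).hexdigest()
        if self.last_hash.get(path) == digest:
            return False
        self.last_hash[path] = digest
        self.versions.append((path, digest))
        return True
```

With this approach no sync call needs to be traced at all, since a change is noticed whenever the file is next read.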

Oh the price we pay for performance.
