Do the explicit readahead and don't pull in the entire overflow file into page cache #52
What about using `mmap()` instead? I think it should be much better for large files.
The reason is that we already have a good and battle-tested streaming implementation of the overflow file, and changing that to array-like access has a relatively large cost. Other than that, according to Linus and backed by my experience (from the limited tests I did), if you're doing sequential access, you'll not be better off with `mmap()`.
Though you may be right about the difference in how the automatic prefetch is done; even then we'd still need to do a manual prefetch (and this time probably with `madvise()`).
Do you have a link to his statement?
I saw it ages ago, let me dig it up.
http://lkml.iu.edu/hypermail/linux/kernel/0004.0/0728.html

Looking into the first upside, it's completely void for us, since: a) we're going through the same regions of the file at most a few times; b) there's not much logic to avoid, since we don't do any non-sequential access, just sequential reads.

The other upside is what we need (the memory is not prefetched automatically and the pages are dropped when you don't need them), but "playing games with the virtual memory mapping is very expensive".
and a follow-up: http://lkml.iu.edu/hypermail/linux/kernel/0004.0/0775.html
The kernel will by default fill unused RAM by bringing parts of open files into the page cache. This is very problematic for the DMQ node, where the overflow files can grow to several dozen gigabytes - caching the file will put enormous pressure on the system, and it's not clear the file will even be needed in the future (imagine there are no readers and the file just grows). Even if there are some readers, they are going to read the file in something close to a sequential fashion (multiple channels multiplexed into a single file complicate this a bit, but not too much), so there's no need to cache the entire file.
Linux provides `posix_fadvise`, with the three flags that are of interest.

We could leverage `POSIX_FADV_DONTNEED`, but the issue is that either we mark the entire file as `DONTNEED` (which is then the same as `POSIX_FADV_RANDOM`), or we wait for the users to read the file so we can drop the first parts - something that is redundant, since we're already truncating the file from the beginning; and it also doesn't help us much when there are no readers and we don't have much to drop.

What we could do is disable the entire automatic readahead by advising the kernel with `POSIX_FADV_RANDOM`, and then do the `readahead` manually (issue readahead in windows of some preconfigured size - this should be tweakable via a knob in the config file). Also make sure that the kernel is dropping the new pages as soon as they are flushed out by the writeback daemon (`POSIX_FADV_RANDOM` will probably not help us there, since it should only affect prefetching; if so, look into whether you can `POSIX_FADV_DONTNEED` the new pages).

This way the reading of the file is still performed by the kernel - not blocking the application - but in a controlled manner.