Incremental etcd defrag #694
Comments
I am interested in implementing this. cc @serathius @ahrtr
@ahrtr can I take this?
/assign @Elbehery
Would a draft PR be fine? Or is a design needed beforehand?
Last Friday I had a bit of time to visualize the page usage in bbolt during an OpenShift installation; you can clearly see compactions and the resulting fragmentation in it. https://www.youtube.com/watch?v=ZM91_mndjyY Leaving this here for lack of a better ticket; maybe we can also put it up on the website to explain etcd's fragmentation.
neat work @tjungblu 👍🏽
cc @ivanvc
Resurrecting an old @ptabor idea that we had discussed some time ago but didn't have time to implement.
Problem
Goal
Proposal:
During each transaction we decide whether we want to do additional work. If the storage overhead (file size / active page size) is more than 20%, we do additional operations during the transaction, as sketched below.
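A minimal Go sketch of what that per-transaction check could look like. The size accounting is an assumption (these are not existing bbolt fields), and "more than 20%" is read here as the file size exceeding the active page size by more than the threshold:

```go
package defrag

// needsIncrementalDefrag reports whether the storage overhead exceeds the
// given threshold (e.g. 0.20 for 20% overhead). fileSize is the size of the
// database file in bytes; activeSize is the total size of in-use pages in
// bytes. Both values are assumed to come from existing accounting; they are
// not real bbolt fields.
func needsIncrementalDefrag(fileSize, activeSize int64, threshold float64) bool {
	if activeSize <= 0 {
		return false
	}
	overhead := float64(fileSize)/float64(activeSize) - 1.0
	return overhead > threshold
}
```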
The naive concept is that we look into the free-pages map and find the last allocated block. I think we currently maintain lists of free blocks, sorted from the beginning to the end of the file, for different sizes of contiguous space. We might need to maintain a link to the last used page, and maybe a few links: not only to 'the' last page, but also to the last not-too-big page.
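A hypothetical piece of bookkeeping next to the freelist (nothing like this exists in bbolt today) that keeps the tail references described above:

```go
// pgid mirrors bbolt's internal page-id type; redefined here only for the sketch.
type pgid uint64

// tailTracker keeps references to the highest allocated page run and to the
// highest "not too big" run, so a transaction can quickly pick a relocation
// candidate near the end of the file.
type tailTracker struct {
	last      pgid // highest allocated page in the file
	lastSmall pgid // highest allocated page whose run is at most maxRun pages
	maxRun    int  // cut-off for "not too big", in pages
}

// observeAllocation updates the tail references when a run of runPages pages
// is allocated starting at id.
func (t *tailTracker) observeAllocation(id pgid, runPages int) {
	if id > t.last {
		t.last = id
	}
	if runPages <= t.maxRun && id > t.lastSmall {
		t.lastSmall = id
	}
}
```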
On each transaction we rewrite the last page to the first 'suitable' position from the free list and move the reference to the last page down. The challenge is when the last page is too large and we don't have contiguous space in the 'lower' part of the file to accept it; in that case a heuristic is needed to ignore it and keep moving toward the beginning of the file (hoping that this eventually frees up enough space for the 'bigger' chunk), and after some time (a number of transactions) start the process from the end again.
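Continuing the sketch above, one possible shape of the per-transaction relocation step; all helpers here are assumed and illustrative, not real bbolt functions:

```go
// Assumed freelist/page-move primitives; signatures are illustrative and the
// implementations are omitted.
var (
	sizeOfRun             func(id pgid) int                    // pages in the run ending the file
	firstFreeRunBelow     func(limit pgid, n int) (pgid, bool) // lowest free run of n pages below limit
	moveRun               func(src, dst pgid, n int)           // copy the pages and fix parent references
	highestAllocatedBelow func(limit pgid) pgid                // new tail after the move
)

// relocateTailOnce performs one incremental-defrag step inside a write
// transaction, following the heuristic described above.
func relocateTailOnce(t *tailTracker) {
	run := sizeOfRun(t.last)
	if dst, ok := firstFreeRunBelow(t.last, run); ok {
		// Rewrite the tail run into the first suitable free slot lower in
		// the file and move the "last page" reference down.
		moveRun(t.last, dst, run)
		t.last = highestAllocatedBelow(t.last)
		return
	}
	// The tail run is too large to fit in any lower free slot: skip it and
	// continue from the last not-too-big run, hoping that smaller moves
	// eventually free enough contiguous space for the big one. A counter
	// (not shown) would restart the scan from the end after a while.
	t.last = t.lastSmall
}
```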
The assumption (though it requires study) is that even the biggest page is still relatively small compared to the whole file for k8s use-cases, so this should work reasonably well.
Additional notes:
This is a note dump of my discussion with @ptabor. The high-level idea should stand, but I expect there may already be existing continuous-defrag algorithms that are better than the naive one presented here.
Expect that this can be implemented as a configurable, optional behavior; a possible shape of such an option is sketched below.
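Purely as an illustration of the opt-in shape (none of these fields exist in bbolt or etcd today):

```go
// IncrementalDefragOptions is a hypothetical configuration block showing how
// the feature could be exposed as an opt-in knob.
type IncrementalDefragOptions struct {
	// Enabled turns on per-transaction tail-page relocation.
	Enabled bool
	// OverheadThreshold is the storage overhead above which relocation work
	// is performed, e.g. 0.20 for 20%.
	OverheadThreshold float64
}
```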