perf(clustering): store entity only once in lmdb #12036
Conversation
### Summary

Getting data from LMDB is slightly slower in some cases, as it requires two LMDB reads instead of one, but the memory usage difference is quite big:

```
Single: LMDB shared memory size: 481M
Master: LMDB shared memory size: 681M

                    min        mean       50         90         95         99       max
Single: Latencies   37.959µs   370.850µs  140.239µs  416.418µs  709.358µs  3.749ms  76.638ms
Master: Latencies   41.000µs   351.938µs  142.353µs  406.417µs  662.329µs  3.542ms  61.037ms

Process Name                   PID    Max Mem  Min Mem  Mean Mem  Median Mem  Standard Deviation
Single: nginx: worker process  87638  467M     301M     438M      449M       31M
Master: nginx: worker process  6762   439M     308M     405M      409M       29M

Process Name                   PID    Max Mem  Min Mem  Mean Mem  Median Mem  Standard Deviation
Single: nginx: priv. agent     87639  776M     736M     757M      758M       7.7M
Master: nginx: priv. agent     6763   1.6G     909M     1.4G      1.4G       148M

Total    Max Mem  Min Mem  Mean Mem  Median Mem
Single:  1.3G     1.1G     1.2G      1.2G
Master:  2.1G     1.2G     1.8G      1.8G
```

Signed-off-by: Aapo Talvensaari <[email protected]>
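The idea behind "store entity only once" can be sketched as follows. This is a minimal, hypothetical Python stand-in (not Kong's Lua code, and not the real lua-resty-lmdb API): the full entity is written under its primary key only, while every secondary lookup key stores just a pointer to that primary key. Lookups by a secondary key then cost two reads instead of one, which matches the slight latency regression and the large memory saving reported above.

```python
# Illustrative sketch: single-copy entity storage with pointer-style
# secondary keys. `db` is a plain dict standing in for the LMDB map;
# all names here are hypothetical, not Kong's.
db = {}

def put_entity(pk, entity, unique_fields):
    db[f"entity:{pk}"] = entity              # full payload stored exactly once
    for field, value in unique_fields.items():
        db[f"idx:{field}:{value}"] = pk      # pointer only, not a second copy

def get_by_unique(field, value):
    pk = db.get(f"idx:{field}:{value}")      # read 1: resolve pointer
    if pk is None:
        return None
    return db.get(f"entity:{pk}")            # read 2: fetch the entity

put_entity("a1b2", '{"name":"my-route","paths":["/v1"]}', {"name": "my-route"})
assert get_by_unique("name", "my-route") == '{"name":"my-route","paths":["/v1"]}'
```

Without the indirection, every unique field would key a full duplicate of the serialized entity, which is where the duplicated memory on master comes from.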
If I combine this with the

> 465 MB DROP in LMDB shared memory size (compared to master)
I read the code; I think that the core is … but I cannot understand why it can reduce the memory usage.
Not only globally, but also all entities keyed by unique fields, like route name, consumer username/custom_id, etc. If we used serialized C structs or something similar, we could gain even more. My hunch is that we should be able to get the shared memory size below 100 MB (with this dataset of 100,000 entities), perhaps below 50 MB. With all the optimizations we have now, we get from close to 700 MB to near 200 MB, but there is even more we can do. Currently we serialize e.g. the field names every time, too. Also, the keys we use in LMDB are quite long.
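The per-record overhead of serializing field names every time can be illustrated with a small, language-agnostic comparison (Python here, purely as an illustration; the entity and its fields are made up): a self-describing encoding like JSON repeats every field name in every record, while a fixed-layout, C-struct-style encoding stores only the values.

```python
import json
import struct

# Hypothetical 3-field entity: a self-describing encoding repeats the
# field names per record; a fixed layout ("<IHH" = uint32 + 2x uint16)
# stores only the 8 bytes of values.
entity = {"id": 12345, "port": 8443, "weight": 100}

self_describing = json.dumps(entity).encode()  # field names in every record
fixed_layout = struct.pack("<IHH", entity["id"], entity["port"], entity["weight"])

print(len(self_describing), len(fixed_layout))  # fixed layout is 8 bytes
```

Multiplied across 100,000 entities, per-record field-name overhead alone accounts for a meaningful share of the shared-memory footprint, which is why a struct-like encoding could push the size down further.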
Ah, I see that the only write action is … But the format of the cache key is still a little mysterious to me, especially …
In lua-resty-lmdb we introduce a new API …
Yes, this PR is not meant to be merged; it is just a PoC and a showcase that, with simple changes, we can get huge memory-usage benefits.
So I feel we need a data model in LMDB that works well for incremental changes (the commit above does it, but it is ugly), AND we need a data model that is efficient (this PR tries to make it a bit more efficient, but there is much more we can do). We can do this evolutionarily, too.
@chronolaw, yes, this is exactly what I am after in these conversations and discussions. Let's put our smart heads together and think. We already have evidence of what could work, but all ideas are welcome.
Checked that feature. Key names can still be shortened if we want, while still maintaining some kind of prefixes. This is a great feature. Currently keys look like this: … They could be much shorter.
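One possible shape for shorter keys, sketched in Python (hypothetical scheme, not the format Kong or lua-resty-lmdb actually uses): keep a one-byte type tag so per-type prefix scans still work, and store the entity's UUID as its 16 raw bytes instead of the 36-character text form.

```python
import uuid

# Hypothetical key-shortening scheme: 1 tag byte + 16 raw UUID bytes
# = 17 bytes per key, versus a long human-readable prefix plus a
# 36-character UUID string. The tag values below are made up.
TYPE_TAG = {"services": b"\x01", "routes": b"\x02", "consumers": b"\x03"}

def short_key(entity_type: str, pk: str) -> bytes:
    return TYPE_TAG[entity_type] + uuid.UUID(pk).bytes

k = short_key("routes", "0ee99a77-8a43-4f29-a1a6-fc7b50b09a4d")
assert len(k) == 17 and k[:1] == b"\x02"
```

Because the tag byte is a stable prefix per entity type, range queries over one type remain possible with the prefix feature discussed above.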
Aapo, we should get this patch tested by other teams (maybe just a small run to get an initial feel), like SDET, Qiqi. If it shows promise, we should certainly explore it.
Checklist

- [ ] A changelog file has been created under `changelog/unreleased/kong`, or the `skip-changelog` label added on the PR if a changelog is unnecessary. See README.md.