
perf(clustering): store entity only once in lmdb #12036

Closed

Conversation

bungle
Member

@bungle bungle commented Nov 15, 2023

Summary

Getting data from LMDB is slightly slower in some cases, as it requires two LMDB reads instead of one, but the memory usage difference is quite big:

```
Single: LMDB shared memory size: 481M
Master: LMDB shared memory size: 681M

Process Name                   PID   Max Mem  Min Mem  Mean Mem  Median Mem  Standard Deviation
Single: nginx: worker process  87638    467M     301M      438M        449M                 31M
Master: nginx: worker process   6762    439M     308M      405M        409M                 29M

Process Name                   PID    Max Mem  Min Mem  Mean Mem  Median Mem  Standard Deviation
Single: nginx: priv. agent     87639     776M     736M      757M        758M                7.7M
Master: nginx: priv. agent      6763     1.6G     909M      1.4G        1.4G                148M

Total   Max Mem  Min Mem  Mean Mem  Median Mem
Single:    1.3G     1.1G      1.2G        1.2G
Master:    2.1G     1.2G      1.8G        1.8G

                       min       mean         50         90         95       99       max
Single: Latencies 37.959µs  370.850µs  140.239µs  416.418µs  709.358µs  3.749ms  76.638ms
Master: Latencies 41.000µs  351.938µs  142.353µs  406.417µs  662.329µs  3.542ms  61.037ms
```

Checklist

  • The Pull Request has tests
  • A changelog file has been created under changelog/unreleased/kong, or the skip-changelog label has been added to the PR if a changelog is unnecessary.
  • There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE
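The single-copy scheme the summary describes can be sketched as follows. This is a hypothetical model with a plain dict standing in for LMDB, not Kong's actual code: the marshalled entity is written once under its primary cache key, while every secondary key (unique-field key, "*"-workspace key, etc.) stores only that primary key, which is why some lookups now cost two reads. The `REF` marker is invented for the sketch.

```python
# Plain-dict model of "store entity only once": secondary keys hold an
# indirection to the primary key instead of a copy of the entity.
REF = "\x01ref\x01"  # invented marker for indirection entries (not Kong's)

db = {}

def store(primary_key, marshalled, secondary_keys):
    db[primary_key] = marshalled        # full entity stored exactly once
    for key in secondary_keys:
        db[key] = REF + primary_key     # only the key, not the entity

def fetch(key):
    value = db.get(key)
    if value is not None and value.startswith(REF):
        value = db.get(value[len(REF):])  # second read follows the reference
    return value

store("routes:134346eb", '{"name":"test"}',
      ["routes||name:test", "routes:134346eb:*"])
print(fetch("routes:134346eb"))      # one read
print(fetch("routes||name:test"))    # two reads, same entity
```

The trade-off in the benchmark falls out directly: N secondary keys cost N short references instead of N full copies of the marshalled entity, at the price of one extra read on secondary-key lookups.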


Signed-off-by: Aapo Talvensaari <[email protected]>
@bungle bungle added the pr/discussion This PR is being debated. Probably just a few details. label Nov 15, 2023
@bungle
Member Author

bungle commented Nov 15, 2023

If I combine this with the streaming branch (which also removes the LMDB drop; the streaming protocol itself is not too important here and should be avoided) (#11632), here are the results:

```
LMDB shared memory size: 216M

--------------------------------------------------------------------------------------------------
- Memory Usage (RSS) Report for Data Plane Worker Process:                                       -
--------------------------------------------------------------------------------------------------
Process Name                     PID    Max Mem  Min Mem  Mean Mem  Median Mem  Standard Deviation
nginx: worker process            46694     379M     313M      351M        354M                 18M
nginx: privileged agent process  46695     252M     245M      247M        246M                1.3M
                                           631M     558M      598M        600M
```

465 MB drop in LMDB shared memory size (compared to master)
800 MB drop in privileged agent median memory usage (compared to master)
1469 MB drop in total max memory usage (compared to master)

@chronolaw
Contributor

I read the code; I think the core is global_query_cache_key (the id with *): if a key contains *, we query twice to get the real value.

But I cannot understand why it reduces the memory usage.

@bungle
Member Author

bungle commented Nov 16, 2023

> I read the code; I think the core is global_query_cache_key (the id with *): if a key contains *, we query twice to get the real value.
>
> But I cannot understand why it reduces the memory usage.

Not only the global keys, but also all lookups by unique fields, like route name, consumer username/custom_id, etc.

If we used serialized C structs or something similar, we could gain even more. My hunch is that we should be able to get the shared memory size below 100 MB (with this dataset of 100,000 entities), perhaps below 50 MB. With all the optimizations we have now, we go from close to 700 MB to near 200 MB, but there is even more we can do. Currently we serialize e.g. the field names too, every time. Also, the keys we use in LMDB are quite long.
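The field-name overhead mentioned above can be illustrated with a small comparison (a hypothetical sketch, not Kong's serializer): a map-style encoding repeats every field name in every entity, while a positional encoding stores the schema once and each entity as a bare array of values.

```python
# Illustration of per-entity field-name overhead: map-style encoding
# (field names repeated in every record) vs. positional encoding
# (schema stored once, entities encoded as arrays). The schema and
# entity below are invented examples.
import json

schema = ["id", "name", "protocols", "ws_id"]  # stored once, not per entity

entity = {
    "id": "134346eb-e158-4c39-9d98-823eed0a82e4",
    "name": "test",
    "protocols": ["http", "https"],
    "ws_id": "f082d5d1-2c12-474e-8138-bf966644a3d6",
}

map_encoded = json.dumps(entity, separators=(",", ":"))
array_encoded = json.dumps([entity[f] for f in schema], separators=(",", ":"))

# The array form omits the key names, so every entity is smaller;
# multiplied by 100,000 entities the difference adds up.
print(len(map_encoded), len(array_encoded))
```

A C-struct-style fixed layout would shave off even the JSON punctuation, which is in line with the "below 100 MB" hunch above.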

@chronolaw
Contributor

Ah, I see: the only write of the full entity is t:set(item_cache_key, item_marshalled); the other writes store only the cache key.

But the format of the cache key is still a little mysterious to me, especially ws_id == "*".

@chronolaw
Contributor

In lua-resty-lmdb we introduced a new API, prefix() (Kong/lua-resty-lmdb#48), so the keys may not need to be shortened; we can use a prefix to query keys.

@bungle
Member Author

bungle commented Nov 16, 2023

> But the format of the cache key is still a little mysterious to me, especially ws_id == "*".

Yes, this PR is not meant to be merged; it is just a PoC and a showcase that, with simple changes, we can get huge benefits in memory usage:

  1. we need to get rid of the LMDB drop (see: c90a5aa)
  2. we need a better model for the data in LMDB (this PR shows a simple change that already gives huge benefits)

So I feel we need a data model in LMDB that works well for incremental changes (the commit above does it, but it is ugly), AND we need a data model that is efficient (this PR tries to make it a bit more efficient, but there is much more we can do). We can do this evolutionarily, too.

@bungle
Member Author

bungle commented Nov 16, 2023

> In lua-resty-lmdb we introduced a new API, prefix() (Kong/lua-resty-lmdb#48), so the keys may not need to be shortened; we can use a prefix to query keys.

@chronolaw, yes, this is exactly what I am after in these conversations and discussions. Let's put our heads together and think. We already have evidence of what could work, but all ideas are welcome.

@bungle
Member Author

bungle commented Nov 16, 2023

> In lua-resty-lmdb we introduced a new API, prefix() (Kong/lua-resty-lmdb#48), so the keys may not need to be shortened; we can use a prefix to query keys.

I checked that feature. Key names can still be shortened if we want, while maintaining some kind of prefix structure. This is a great feature.

Currently the keys look like this:

```
parameters:cluster_id:::::*

routes:134346eb-e158-4c39-9d98-823eed0a82e4:::::f082d5d1-2c12-474e-8138-bf966644a3d6
routes:134346eb-e158-4c39-9d98-823eed0a82e4:::::*
routes|f082d5d1-2c12-474e-8138-bf966644a3d6|name:test
routes|f082d5d1-2c12-474e-8138-bf966644a3d6|@list
routes|*|@list

plugins:0b7bafca-2bcd-4071-b48e-256e210fe56c:::::f082d5d1-2c12-474e-8138-bf966644a3d6
plugins:0b7bafca-2bcd-4071-b48e-256e210fe56c:::::*
plugins:request-termination:::::f082d5d1-2c12-474e-8138-bf966644a3d6
plugins:0a7bafca-2bcd-4071-b48e-256e210fe56c:::::f082d5d1-2c12-474e-8138-bf966644a3d6
plugins:0a7bafca-2bcd-4071-b48e-256e210fe56c:::::*
plugins:request-termination:134346eb-e158-4c39-9d98-823eed0a82e4::::f082d5d1-2c12-474e-8138-bf966644a3d6
plugins|f082d5d1-2c12-474e-8138-bf966644a3d6|@list
plugins|f082d5d1-2c12-474e-8138-bf966644a3d6|routes|134346eb-e158-4c39-9d98-823eed0a82e4|@list
plugins|*|@list
plugins|*|routes|134346eb-e158-4c39-9d98-823eed0a82e4|@list

workspaces:f082d5d1-2c12-474e-8138-bf966644a3d6:::::
workspaces:f082d5d1-2c12-474e-8138-bf966644a3d6:::::*
workspaces:default:::::
workspaces||name:default
workspaces||@list

tags||@list
declarative_config:hash
```

They could be much shorter.
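Shortened keys would still work with prefix queries, because LMDB keeps keys sorted and a prefix scan is just a cursor walk from the first matching key. A minimal model (Python's bisect on a sorted list standing in for lua-resty-lmdb's prefix() over LMDB's sorted key space; the abbreviated "r|", "p|", "w|" prefixes are an invented scheme, not Kong's actual one):

```python
# Model of a prefix scan over a sorted key space with shortened keys.
# "r|" standing in for "routes|" etc. is a made-up abbreviation.
import bisect

keys = sorted([
    "p|f082d5d1|@list",
    "r|134346eb",
    "r|f082d5d1|n:test",
    "r|*|@list",
    "w|default",
])

def prefix_scan(prefix):
    """Return all keys starting with prefix, as an LMDB cursor would."""
    start = bisect.bisect_left(keys, prefix)   # seek to first key >= prefix
    out = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break                              # sorted order: scan can stop
        out.append(key)
    return out

print(prefix_scan("r|"))
```

Since the scan only needs the shared prefix to be byte-wise contiguous in sort order, the per-key savings from shorter type names multiply across every stored key without losing the ability to enumerate a type.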

@hbagdi
Member

hbagdi commented Nov 16, 2023

Aapo, we should get this patch tested out by other teams (maybe a small run only, to get an initial feel), like SDET or Qiqi. If this shows promise, we should certainly explore it.

@bungle bungle closed this Dec 13, 2023
Labels
core/db pr/discussion This PR is being debated. Probably just a few details. size/M