
Evaluate performance of MariaDB in host vs container mode #11566

Closed
amaltaro opened this issue Apr 19, 2023 · 15 comments

@amaltaro
Contributor

amaltaro commented Apr 19, 2023

Impact of the new feature
WMAgent

Is your feature request related to a problem? Please describe.
As part of running WMAgent in a container environment, composed with database containers as well, we need to perform load/stress tests to evaluate the performance of the MariaDB container.

Describe the solution you'd like
Come up with a reliable and meaningful setup to evaluate the performance (latency, throughput, etc.) of MariaDB in two deployment modes: deployed on the host (via RPMs) and deployed in a container.

To be provided with this issue:

  • results from the evaluation and a final decision on whether it suits WMAgent needs
  • perhaps a set of scripts to be persisted in the repository, such that we can re-use them in the future

Describe alternatives you've considered
None

Additional context
Depends on: #11313
Part of the following meta issue: #11314

@todor-ivanov
Contributor

Hi @khurtado,

I forgot to add here the two lines of extra clarification, which I promised to do during the WMCore meeting. Here they are:

  • I would not expect much of a performance penalty in the communication between the database and the WMAgent components when we go through the loopback device instead of a shared socket (as it was before), so we could be more relaxed on this part of the testing.
  • And I just resolved my own hesitation about resource availability to the Docker engine by reading the standard documentation: https://docs.docker.com/config/containers/resource_constraints/

So no extra resource constraints are applied unless they are explicitly set with the docker run command, which we are certainly not doing (we are not shooting ourselves in the foot in that regard). So I think we are good.
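For reference, a minimal sketch of how such limits would be set explicitly, with an illustrative image name and values (not something we actually do):

# resource limits apply only when flags like these are passed explicitly
docker run -d -e MARIADB_ROOT_PASSWORD=secret --memory=8g --cpus=4 mariadb:10.6
# without --memory/--cpus the container may use all of the host's RAM and CPUs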

@vkuznet
Contributor

vkuznet commented Apr 22, 2024

I think this issue requires further clarification: what do you want to measure as performance, and how? MariaDB is a database, therefore someone has to access it, and:

  • do we measure access to the database from a client?
    • what is the client?
      • is it the WMCore codebase/a component, or a shell script?
        • if it is WMCore, please point out which code/component should be used
        • if it is a script, please specify how we inject data: via the mysql command (from bash) or via Python (which Python module, which Python version)?
  • should we use the MySQL benchmark tool or something else?
  • performance may depend on the available RAM; in that case, how much RAM should we allocate for testing?
  • performance may depend on the DB size; do we benchmark an empty DB or populate it with some content?
    • if the latter, which data should be used?
  • what DB schema should be used?
  • what performance metrics do you want to see: number of reads, injections, or both?

@khurtado
Contributor

Using WMAgent as the client is tricky and doesn't allow for much flexibility while performing the tests. I was thinking of using sysbench, which is among the benchmarking tools mentioned in mariadb-tools.

In that scenario, the idea would be the following:

  • Deploy MariaDB in a container
  • Run sysbench benchmarks from the host, connecting to the Docker MariaDB
  • Deploy MariaDB the old way (through RPMs)
  • Run sysbench benchmarks from the host, connecting to the host MariaDB

Compare the two sets of results.

As for benchmarks:

All the Online Transaction Processing (OLTP) tests:
https://github.com/akopytov/sysbench/tree/master/src/lua

And likely also the file I/O, CPU and memory tests.
There would be multiple repeated tests while varying the DB size: 1G, 10G, 30G? @amaltaro Any feedback on the DB sizes we should aim for?

If there are significant differences in performance, vary the Docker resource limits (memory, CPU access) for further testing.
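
A minimal sketch of what one such run could look like (host, credentials, table counts and sizes below are purely illustrative):

# create a test schema of roughly the desired size
sysbench oltp_read_write --db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3306 \
  --mysql-user=sbtest --mysql-password=secret --mysql-db=sbtest \
  --tables=10 --table-size=1000000 prepare
# run the read/write OLTP workload for 5 minutes with 8 client threads
sysbench oltp_read_write --db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3306 \
  --mysql-user=sbtest --mysql-password=secret --mysql-db=sbtest \
  --tables=10 --table-size=1000000 --threads=8 --time=300 run
# the standalone file I/O test does not need a database at all
sysbench fileio --file-total-size=50G prepare
sysbench fileio --file-total-size=50G --file-test-mode=rndrw --time=300 run
sysbench fileio --file-total-size=50G cleanup

For the host-based (RPM) deployment, the same commands would be pointed at the host instance, e.g. via --mysql-socket instead of TCP.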

@amaltaro
Contributor Author

@khurtado Kenyi, apologies for missing this question.

Yes, I agree that we should use whatever benchmark tools are available, instead of plugging WMAgent into it (it would be too cumbersome to use WMAgent for that).

I had a look at one of the FNAL agents, and here are some database stats.
Where data actually resides:

[cmsdataops@cmsgwms-submit4 current]$ ls -l install/mysql/database/mysqld-bin.* | awk '{ sum += $5 } END{ print sum }'
385080226066
### which is around 360GB

and 51GB under the wmagent folder:

[cmsdataops@cmsgwms-submit4 current]$ du -sh install/mysql/database/wmagent (frm and ibd files)
51G	install/mysql/database/wmagent

I guess one of those contains the database indexes, while the other contains the actual data, but I am not sure which is which.
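
For the record, the actual data-plus-index size per schema can also be queried directly from the server; a sketch, assuming shell access to a mysql client with the right credentials:

# report data and index size per schema, in GB
mysql -u root -p -e "SELECT table_schema,
    ROUND(SUM(data_length + index_length)/1024/1024/1024, 1) AS size_gb
  FROM information_schema.tables GROUP BY table_schema;"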

@khurtado
Contributor

khurtado commented Apr 29, 2024

@amaltaro Thank you! Looking into the documentation, it seems the mysqld-bin.* files are the binary log files, which are readable through mariadb-binlog. The frm/ibd files are the database files, so I will likely use 5, 25, 50 and 75GB database sizes for the tests (although I will also take the host's RAM into account).

https://mariadb.com/kb/en/mariadb-binlog/
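
For example, one of those binary logs could be inspected with something like this (the file name is just illustrative):

# print the contents of a binary log file in readable form
mariadb-binlog install/mysql/database/mysqld-bin.000001 | less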

@anpicci anpicci self-assigned this Apr 29, 2024
@khurtado
Contributor

khurtado commented Apr 29, 2024

@amaltaro Should we test on a host with resources fairly similar to a production host? Or should we assume the performance difference would be similar on a testbed host when the DB size is scaled down proportionally? E.g.: vocms0265 has 6G of RAM, so I started with 10G databases (bigger than the available memory). Also @todor-ivanov, opinions on this?

I'm inclined to think that if we choose, e.g., a DB size 50-100% bigger than the available RAM, the results should extrapolate (as in, a 10G DB on a 6G-memory node should behave similarly to a 100G DB on a 60G-memory node), but I wanted to make sure we are on the same page before continuing.

(MariaDB-10.6.5) [cmst1@vocms0265:data]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7138        2355         633         384        4149        4102
Swap:             0           0           0
(MariaDB-10.6.5) [cmst1@vocms0265:data]$ exit
exit
cmst1@vocms0265:10GB-host $ free -m
              total        used        free      shared  buff/cache   available
Mem:           7138        2347         641         384        4150        4110
Swap:             0           0           0
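
Following that rule of thumb, the target DB size for a given node could be estimated with something like this (the 1.5 factor is just the midpoint of the 50-100% range):

# target DB size ~1.5x the node's total RAM
free -g | awk '/^Mem:/ {printf "target DB size: ~%dG\n", $2 * 1.5}'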

@amaltaro
Contributor Author

@khurtado Farrukh just confirmed to me that the new EL9 testbed node (cmssrv810 at FNAL) uses SSD disks for /data. However, it just occurred to me as well that you would not be able to deploy the RPM version.

If you feel you can perform this test by the end of this week, I would suggest using cmsgwms-submit3, which is a production node at FNAL and will reflect the type of resource we will have on EL9 soon. What do you think?

@todor-ivanov
Contributor

Hi @khurtado, I think that even though it is a single measurement, observing no (or almost no) difference in memory consumption between the container-deployed and host-deployed database already tells us we should feel quite safe. I do not expect any stressful situation even on a machine with a bigger hardware profile. It is up to you and @amaltaro to decide whether to go with a bigger machine for the tests of this container. To me the results would be quite satisfactory even if we get them from a testbed machine at CERN.

@khurtado
Contributor

@todor-ivanov That's a good point, thank you!
@amaltaro The problem with submit3 is that I don't see Docker installed there. I don't have an easy way to install sysbench either (i.e., no root access). Do you know who could help with that?

@khurtado
Contributor

khurtado commented May 1, 2024

@amaltaro @anpicci @todor-ivanov I have completed almost all of the benchmark tests. I am missing one test: file I/O with a total file size of 200 GB. I will update the repository with it, but I do not expect significant differences compared to the 50G tests. I still want to include it because 50G is actually "small" considering the FNAL node has over 100G of RAM.

Results can be seen here:

https://gitlab.cern.ch/dmwm/wmcore-docs/-/blob/master/docs/wmcore/MariaDB-benchmark-tests.md

@anpicci Could you complete the documentation with more detailed conclusions from the results?

We still need to add more detailed conclusions, but I do see significant performance differences in the Online Transaction Processing benchmark tests (more than 4x slower in Docker vs the host). At first, I thought it could be because the traditional host-based MariaDB instance works through a UNIX socket, whereas for the container, even though it creates a socket as well, we have to connect in TCP mode to 127.0.0.1:3306. However, I made some minor modifications to expose the container's UNIX socket to the host and ran the benchmarks in socket mode, but did not see a huge improvement.
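
For completeness, exposing the container's UNIX socket to the host can be done with a bind mount along these lines (paths, image name and credentials are illustrative, not the exact setup used):

# share the socket directory between the container and the host
# (the socket path inside the container depends on the server configuration)
docker run -d -e MARIADB_ROOT_PASSWORD=secret \
  -v /data/mysql-socket:/run/mysqld mariadb:10.6
# then point sysbench at the socket instead of TCP
sysbench oltp_read_write --db-driver=mysql \
  --mysql-socket=/data/mysql-socket/mysqld.sock \
  --mysql-user=sbtest --mysql-password=secret --mysql-db=sbtest run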

@khurtado
Contributor

khurtado commented May 1, 2024

@amaltaro If sysbench is installed on cmssrv810, we could run the OLTP benchmarks there. I don't think we can compare apples to apples in that sense, since that host has different hardware (e.g. an SSD disk), but at least we could see how much better or worse we are compared to the current host-based EL7 performance.
There are also things like the InnoDB flush method default that MariaDB 10.6.5 changed to improve performance with newer kernels (5.x?), which is likely slower on Linux 3.x kernels (CentOS7) but may be fine on Alma9 (Linux 5.14), so the hardware wouldn't be the only difference.
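
If we want to isolate that factor, the flush method in use can be checked and, if needed, overridden in the server configuration; a sketch (the fsync value is only an example override, not a recommendation):

# check which flush method the running server uses
mysql -u root -p -e "SHOW GLOBAL VARIABLES LIKE 'innodb_flush_method';"
# to override it, set it under [mysqld] in the server config and restart, e.g.:
#   innodb_flush_method = fsync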

@amaltaro
Contributor Author

amaltaro commented May 1, 2024

@khurtado Kenyi, that is a good point! Please email HyunWoo with this request.
Otherwise, we have a brand new node that Bockjoo is commissioning for us at CERN, vocms265. Feel free to install it there and run such tests.

@khurtado
Contributor

khurtado commented May 2, 2024

I just emailed Hyun Woo. vocms265 only has 6GB of memory, but it also has an SSD disk, so I feel the FNAL node is the best option, as its CPU/memory is closer to the production nodes.

EDIT:
@amaltaro @anpicci I updated the OLTP benchmark tests with Alma9 results from Docker, in both loopback and socket mode (also single-thread and multi-thread for each). Results seem better for some tests and worse for others in comparison to the RHEL7 Docker tests, which are still worse than the RHEL7 host-based version.

@anpicci
Contributor

anpicci commented May 2, 2024

FYI @amaltaro @khurtado, I am reporting the results provided by Kenyi in the official documentation.

@khurtado
Contributor

khurtado commented May 3, 2024

@anpicci Thank you for the documentation!

Since the benchmark tests are done, I am closing this ticket, but I opened a new one to follow up and see if we can improve the performance:

#11977
