
Evaluate performance of MariaDB in host vs container mode #11566

Closed
amaltaro opened this issue Apr 19, 2023 · 15 comments

@amaltaro
Contributor

amaltaro commented Apr 19, 2023

Impact of the new feature
WMAgent

Is your feature request related to a problem? Please describe.
As part of running WMAgent in a container environment, composed with database containers as well, we need to perform load/stress tests to evaluate the performance of the MariaDB container.

Describe the solution you'd like
Come up with a reliable and meaningful setup to evaluate the performance (latency, throughput, etc.) of MariaDB in two deployment modes: deployed on the host (via RPMs) and deployed in a container.

To be provided with this issue:

  • results from the evaluation and a final decision on whether it suits WMAgent needs
  • perhaps a set of scripts to be persisted in the repository, such that we can re-use them in the future

Describe alternatives you've considered
None

Additional context
Depends on: #11313
Part of the following meta issue: #11314

@todor-ivanov
Contributor

Hi @khurtado,

I forgot to add here the two lines of extra clarification, which I promised to do during the WMCore meeting. Here they are:

  • I would not expect much of a performance penalty in the communication between the database and the WMAgent components when we go through the loopback device instead of a shared socket (as it was before), so we could be more relaxed on this part of the testing.
  • And I just resolved my own hesitation about resource availability to the Docker engine by reading the standard documentation: https://docs.docker.com/config/containers/resource_constraints/

So no extra resource constraints are applied unless they are explicitly set with the docker run command, which we are certainly not doing (we are not shooting ourselves in the foot in that regard). So I think we are good.
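For reference, a minimal sketch of how such limits would be set explicitly, with an illustrative image name and values (not something we actually do):

# resource limits apply only when flags like these are passed explicitly
docker run -d -e MARIADB_ROOT_PASSWORD=secret --memory=8g --cpus=4 mariadb:10.6
# without --memory/--cpus the container may use all of the host's RAM and CPUs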

@vkuznet
Contributor

vkuznet commented Apr 22, 2024

I think this issue requires further clarification: what do you want to measure as performance, and how? MariaDB is a database, therefore someone has to access it, and:

  • do we measure access to the database from a client?
    • what is the client?
      • is it the WMCore codebase/a component, or a shell script?
        • if it is WMCore, please point out which code/component should be used
        • if it is a script, please specify how we inject data: via the mysql command (from bash) or via Python (which Python module, which Python version)?
  • should we use the MySQL benchmark tool or something else?
  • performance may depend on the available RAM; in that case, how much RAM should we allocate for testing?
  • performance may depend on the DB size; do we benchmark an empty DB or populate it with some content?
    • if the latter, which data should be used?
  • what DB schema should be used?
  • what performance metrics do you want to see: number of reads, injections, or both?

@khurtado
Contributor

Using WMAgent as the client is tricky and doesn't allow for much flexibility while performing the tests. I was thinking of using sysbench, which is among the benchmarking tools mentioned in mariadb-tools.

In that scenario, the idea would be the following:

  • Deploy MariaDB in a container
  • Run sysbench benchmarks from the host, connecting to the Docker MariaDB
  • Deploy MariaDB the old way (through RPMs)
  • Run sysbench benchmarks from the host, connecting to the host MariaDB

Compare the two sets of results.

As for benchmarks:

All the Online Transaction Processing (OLTP) tests:
https://github.com/akopytov/sysbench/tree/master/src/lua

And likely also the file I/O, CPU and memory tests.
There would be multiple repeated tests while varying the DB size: 1G, 10G, 30G? @amaltaro Any feedback on the DB sizes we should aim for?

If there are significant differences in performance, vary the Docker resource limits (memory, CPU access) for further testing.
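
A minimal sketch of what one such run could look like (host, credentials, table counts and sizes below are purely illustrative):

# create a test schema of roughly the desired size
sysbench oltp_read_write --db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3306 \
  --mysql-user=sbtest --mysql-password=secret --mysql-db=sbtest \
  --tables=10 --table-size=1000000 prepare
# run the read/write OLTP workload for 5 minutes with 8 client threads
sysbench oltp_read_write --db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3306 \
  --mysql-user=sbtest --mysql-password=secret --mysql-db=sbtest \
  --tables=10 --table-size=1000000 --threads=8 --time=300 run
# the standalone file I/O test does not need a database at all
sysbench fileio --file-total-size=50G prepare
sysbench fileio --file-total-size=50G --file-test-mode=rndrw --time=300 run
sysbench fileio --file-total-size=50G cleanup

For the host-based (RPM) deployment, the same commands would be pointed at the host instance, e.g. via --mysql-socket instead of TCP.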

@amaltaro
Contributor Author

@khurtado Kenyi, apologies for missing this question.

Yes, I agree that we should use whatever benchmark tools are available, instead of plugging WMAgent into it (it would be too cumbersome to use WMAgent for that).

I had a look at one of the FNAL agents, and here are some database stats.
Where data actually resides:

[cmsdataops@cmsgwms-submit4 current]$ ls -l install/mysql/database/mysqld-bin.* | awk '{ sum += $5 } END{ print sum }'
385080226066
### which is around 360GB

and 51GB under the wmagent folder:

[cmsdataops@cmsgwms-submit4 current]$ du -sh install/mysql/database/wmagent (frm and ibd files)
51G	install/mysql/database/wmagent

I guess one of those contains the database indexes, while the other contains the actual data, but I am not sure which is which.
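
For the record, the actual data-plus-index size per schema can also be queried directly from the server; a sketch, assuming shell access to a mysql client with the right credentials:

# report data and index size per schema, in GB
mysql -u root -p -e "SELECT table_schema,
    ROUND(SUM(data_length + index_length)/1024/1024/1024, 1) AS size_gb
  FROM information_schema.tables GROUP BY table_schema;"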

@khurtado
Contributor

khurtado commented Apr 29, 2024

@amaltaro Thank you! Looking into the documentation, it seems the mysqld-bin.* files are the binary log files, which are readable through mariadb-binlog. The frm/ibd files are the database files, so I will likely use 5, 25, 50 and 75GB database sizes for the tests (although I will also take the host's RAM into account).

https://mariadb.com/kb/en/mariadb-binlog/
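
For example, one of those binary logs could be inspected with something like this (the file name is just illustrative):

# print the contents of a binary log file in readable form
mariadb-binlog install/mysql/database/mysqld-bin.000001 | less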

@anpicci anpicci self-assigned this Apr 29, 2024
@khurtado
Contributor

khurtado commented Apr 29, 2024

@amaltaro Should we test on a host with resources fairly similar to a production host? Or should we assume the performance difference would be similar on a testbed host when the DB size is scaled down proportionally? E.g.: vocms0265 has 6G of RAM, so I started with 10G databases (bigger than the available memory). Also @todor-ivanov, opinions on this?

I'm inclined to think that if we choose, e.g., a DB size 50-100% bigger than the available RAM, the results should extrapolate (as in, a 10G DB on a 6G-memory node should behave similarly to a 100G DB on a 60G-memory node), but I wanted to make sure we are on the same page before continuing.

(MariaDB-10.6.5) [cmst1@vocms0265:data]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7138        2355         633         384        4149        4102
Swap:             0           0           0
(MariaDB-10.6.5) [cmst1@vocms0265:data]$ exit
exit
cmst1@vocms0265:10GB-host $ free -m
              total        used        free      shared  buff/cache   available
Mem:           7138        2347         641         384        4150        4110
Swap:             0           0           0
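
Following that rule of thumb, the target DB size for a given node could be estimated with something like this (the 1.5 factor is just the midpoint of the 50-100% range):

# target DB size ~1.5x the node's total RAM
free -g | awk '/^Mem:/ {printf "target DB size: ~%dG\n", $2 * 1.5}'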

@amaltaro
Contributor Author

@khurtado Farrukh just confirmed to me that the new EL9 testbed node (cmssrv810 at FNAL) uses SSD disks for /data. However, it just occurred to me as well that you would not be able to deploy the RPM version.

If you feel you can perform this test by the end of this week, I would suggest using cmsgwms-submit3, which is a production node at FNAL and will reflect the type of resource we will have on EL9 soon. What do you think?

@todor-ivanov
Contributor

Hi @khurtado, I think that even though it is a single measurement, observing no (or almost no) difference in memory consumption between the container-deployed and host-deployed database already tells us we should feel quite safe. I do not expect any stressful situation even on a machine with a bigger hardware profile. It is up to you and @amaltaro to decide whether to go with a bigger machine for the tests of this container. To me the results would be quite satisfactory even if we get them from a testbed machine at CERN.

@khurtado
Contributor

@todor-ivanov That's a good point, thank you!
@amaltaro The problem with submit3 is that I don't see Docker installed there. I don't have an easy way to install sysbench either (i.e., no root access). Do you know who could help with that?

@khurtado
Contributor

khurtado commented May 1, 2024

@amaltaro @anpicci @todor-ivanov I have completed almost all of the benchmark tests. I am missing one test: file I/O with a total file size of 200 GB. I will update the repository with it, but I do not expect significant differences compared to the 50G tests. I still want to include it because 50G is actually "small" considering the FNAL node has over 100G of RAM.

Results can be seen here:

https://gitlab.cern.ch/dmwm/wmcore-docs/-/blob/master/docs/wmcore/MariaDB-benchmark-tests.md

@anpicci Could you complete the documentation with more detailed conclusions from the results?

We still need to add more detailed conclusions, but I do see significant performance differences in the Online Transaction Processing benchmark tests (more than 4x slower in Docker vs the host). At first, I thought it could be because the traditional host-based MariaDB instance works through a UNIX socket, whereas for the container, even though it creates a socket as well, we have to connect in TCP mode to 127.0.0.1:3306. However, I made some minor modifications to expose the container's UNIX socket to the host and ran the benchmarks in socket mode, but did not see a huge improvement.
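
For completeness, exposing the container's UNIX socket to the host can be done with a bind mount along these lines (paths, image name and credentials are illustrative, not the exact setup used):

# share the socket directory between the container and the host
# (the socket path inside the container depends on the server configuration)
docker run -d -e MARIADB_ROOT_PASSWORD=secret \
  -v /data/mysql-socket:/run/mysqld mariadb:10.6
# then point sysbench at the socket instead of TCP
sysbench oltp_read_write --db-driver=mysql \
  --mysql-socket=/data/mysql-socket/mysqld.sock \
  --mysql-user=sbtest --mysql-password=secret --mysql-db=sbtest run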

@khurtado
Contributor

khurtado commented May 1, 2024

@amaltaro If sysbench is installed on cmssrv810, we could run the OLTP benchmarks there. I don't think we can compare apples to apples in that sense, since that host has different hardware (e.g. an SSD disk), but at least we could see how much better or worse we are compared to the current host-based EL7 performance.
There are also things like the InnoDB flush method default that MariaDB 10.6.5 changed to improve performance with newer kernels (5.x?), which is likely slower on Linux 3.x kernels (CentOS7) but may be fine on Alma9 (Linux 5.14), so the hardware wouldn't be the only difference.
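
If we want to isolate that factor, the flush method in use can be checked and, if needed, overridden in the server configuration; a sketch (the fsync value is only an example override, not a recommendation):

# check which flush method the running server uses
mysql -u root -p -e "SHOW GLOBAL VARIABLES LIKE 'innodb_flush_method';"
# to override it, set it under [mysqld] in the server config and restart, e.g.:
#   innodb_flush_method = fsync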

@amaltaro
Contributor Author

amaltaro commented May 1, 2024

@khurtado Kenyi, that is a good point! Please email HyunWoo with this request.
Otherwise, we have a brand new node that Bockjoo is commissioning for us at CERN, vocms265. Feel free to install it there and run such tests.

@khurtado
Contributor

khurtado commented May 2, 2024

I just emailed Hyun Woo. vocms265 only has 6GB of memory, but it also has an SSD disk, so I feel the FNAL node is the best option, as its CPU/memory is closer to the production nodes.

EDIT:
@amaltaro @anpicci I updated the OLTP benchmark tests with Alma9 results from Docker, in both loopback and socket mode (also single-thread and multi-thread for each). Results seem better for some tests and worse for others in comparison to the RHEL7 Docker tests, which are still worse than the RHEL7 host-based version.

@anpicci
Contributor

anpicci commented May 2, 2024

FYI @amaltaro @khurtado, I am reporting the results provided by Kenyi in the official documentation.

@khurtado
Contributor

khurtado commented May 3, 2024

@anpicci Thank you for the documentation!

Since the benchmark tests are done, I am closing this ticket, but I opened a new one to follow up and see if we can improve the performance:

#11977
