Evaluate performance of MariaDB in host vs container mode #11566
Hi @khurtado, I forgot to add here the two lines of extra clarification, which I promised to do during the WMCore meeting. Here they are:
So no extra resource constraints are applied unless explicitly set with the
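Assuming the clarification above refers to docker's runtime resource flags (an assumption, since the sentence is truncated), a minimal sketch of how such constraints would be set explicitly; the container name, image tag, password, and paths are illustrative:

```bash
# Minimal sketch: with no flags, docker imposes no CPU/memory limits and the
# container sees all host resources. Constraints apply only when set explicitly:
docker run -d --name mariadb \
  --memory=6g \
  --cpus=4 \
  -e MARIADB_ROOT_PASSWORD=secret \
  -v /data/mysql:/var/lib/mysql \
  mariadb:10.6
```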
I think this issue requires further clarification. What do you want to measure as performance, and how? MariaDB is a database, therefore someone has to access it, and
Using WMAgent as the client is tricky and doesn't allow for much flexibility while performing the tests. I was thinking of using sysbench, which is part of the benchmarking tools mentioned in mariadb-tools. In that scenario, the idea would be the following:
1. Run the benchmarks against the host-deployed MariaDB instance.
2. Run the same benchmarks against the containerized MariaDB instance.
3. Compare 1 and 2.

As for benchmarks:
- All the Online Transaction Processing (OLTP) tests.
- And likely the fileIO, CPU, and memory tests.

If there are significant differences in performance, vary the resource limits for docker (memory, CPU access) for further testing; a sysbench sketch of this plan is given below.
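A minimal sketch of such a sysbench run, assuming sysbench 1.x with its bundled oltp_* scripts; the credentials, database name, table count, and sizes below are placeholders:

```bash
# Placeholder credentials/sizes; run once against the host instance and once
# against the container, then compare transactions/sec and latency percentiles.
ARGS="--db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3306 \
--mysql-user=sbtest --mysql-password=secret --mysql-db=sbtest \
--tables=10 --table-size=1000000 --threads=16 --time=300"

sysbench oltp_read_write $ARGS prepare   # create and populate test tables
sysbench oltp_read_write $ARGS run       # mixed read/write OLTP workload
sysbench oltp_read_only  $ARGS run       # read-only variant
sysbench oltp_write_only $ARGS run       # write-only variant
sysbench oltp_read_write $ARGS cleanup   # drop test tables
```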
@khurtado Kenyi, apologies for missing this question. Yes, I agree that we should use whatever benchmark tools are available, instead of plugging WMAgent into it (it would be too cumbersome to use WMAgent for that). I had a look at one of the FNAL agents, and here are some database stats.
and 51GB under the
I guess one of those holds the database indexes, while the other contains the actual data. I am not sure which is which.
@amaltaro Thank you! Looking into the documentation, it seems the *-bin.* files are the binary logfiles, which are readable through mariadb-binlog. The .frm/.ibd files are the database files, so I will likely use 5, 25, 50, and 75 GB database sizes for the tests (although I will take the RAM of the host into account as well).
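For illustration, a quick way to check this on the host, assuming the default /var/lib/mysql datadir and a mysql-bin binlog basename (both are assumptions; adjust to the actual paths on the agent):

```bash
# Per-database on-disk size (.frm/.ibd files live in one directory per schema):
du -sh /var/lib/mysql/*/
# Total size of the binary logs:
du -csh /var/lib/mysql/mysql-bin.*
# Inspect a binary log (it holds replayable row/statement events, not table data):
mariadb-binlog /var/lib/mysql/mysql-bin.000001 | head -n 50
```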
@amaltaro Should we test on a host that is fairly similar in resources to a production host? Or should we assume the performance difference would be similar to that of a testbed host when the DB size is scaled down proportionally? E.g.: vocms0265 has 6 GB of RAM, so I started with 10 GB databases (bigger than the available memory). Also @todor-ivanov, opinions on this? I'm inclined to think that if we choose, e.g., a database size 50-100% bigger than the available RAM, the results should extrapolate (as in, a 10 GB DB on a 6 GB memory node should behave similarly to a 100 GB DB on a 60 GB memory node), but I wanted to make sure we are on the same page before continuing.
@khurtado Farrukh just confirmed to me that the new EL9 testbed node (cmssrv810 at FNAL) uses SSD disks for /data. However, it just occurred to me that you would not be able to deploy the RPM version there. If you feel like you can perform this test by the end of this week, I would suggest using cmsgwms-submit3, which is a production node at FNAL and will reflect the type of resource we will have on EL9 soon. What do you think?
Hi @khurtado I think, even though it is a single measurement, if you observe no (or almost no) difference between the container- and host-deployed database with regard to memory consumption, that already tells us we should feel quite safe. I do not expect any stressful situation even on a machine with a bigger hardware profile. It is up to you and @amaltaro to decide whether to go with a bigger machine for the tests of this container. To me the results would be quite satisfying even if we get them from a testbed machine at CERN.
@todor-ivanov That's a good point, thank you!
@amaltaro @anpicci @todor-ivanov I have completed almost all of the benchmark tests. I am missing one test: File I/O with a total file size of 200 GB. I will update the repository with it, but I do not expect significant differences compared to the 50 GB tests. I still want to include it because 50 GB is actually "small", given the FNAL node has over 100 GB of RAM. Results can be seen here: https://gitlab.cern.ch/dmwm/wmcore-docs/-/blob/master/docs/wmcore/MariaDB-benchmark-tests.md

@anpicci Could you complete the documentation with more detailed conclusions from the results? We still need to add those, but I do see significant performance differences in the Online Transaction Processing benchmark tests (docker is over 4x slower than the host). At first, I thought it could be because the traditional host-based MariaDB instance works through a UNIX socket, while for the container, even though it creates a socket as well, we have to connect in TCP mode to 127.0.0.1:3306. However, I made some minor modifications to expose the UNIX socket from the container to the host and reran the benchmarks in socket mode, but did not see a big improvement.
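For reference, one way to expose the container's socket is to bind-mount the socket directory to the host; the paths, names, and credentials below are illustrative and not necessarily what was used in the tests:

```bash
# Bind-mount the socket directory so the host sees the container's UNIX socket.
mkdir -p /data/mysql-run && chmod 777 /data/mysql-run   # test-only permissions
docker run -d --name mariadb \
  -e MARIADB_ROOT_PASSWORD=secret \
  -v /data/mysql-run:/run/mysqld \
  -p 127.0.0.1:3306:3306 \
  mariadb:10.6

# Then benchmark over the socket instead of TCP:
sysbench oltp_read_write \
  --db-driver=mysql \
  --mysql-socket=/data/mysql-run/mysqld.sock \
  --mysql-user=sbtest --mysql-password=secret --mysql-db=sbtest \
  --tables=10 --table-size=1000000 --threads=16 --time=300 \
  run
```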
@amaltaro If sysbench is installed on cmssrv810, we could run the OLTP benchmarks there. I don't think we can compare apples to apples in that sense, since that host has different hardware (e.g., an SSD disk), but at least we could see how much better or worse we are compared to the current host-based EL7 performance.
@khurtado Kenyi, that is a good point! Please email HyunWoo with this request.
I just emailed Hyun Woo. vocms0265 only has 6 GB of memory, but it also has an SSD disk, so I feel the FNAL node is the best option, as its CPU/memory is closer to the production nodes. EDIT:
Impact of the new feature
WMAgent
Is your feature request related to a problem? Please describe.
As part of running WMAgent in a container environment, composed with database containers as well, we need to perform load/stress tests to evaluate the performance of the MariaDB container.
Describe the solution you'd like
Come up with a reliable and meaningful setup to evaluate the performance (latency, throughput, etc.) of MariaDB in two deployment modes: host-based (RPM) deployment and container (docker) deployment.
To be provided with this issue:
Describe alternatives you've considered
None
Additional context
Depends on: #11313
Part of the following meta issue: #11314