Largest Supported Messages/Second Inputted and Optimal Worker Connector Configurations #2
Comments
Please do keep us posted on your findings. I have not spent much time running on MSK Connect yet.
And by the way, have you experimented with the "global.throttle.ms" configuration setting?
Regarding the "global.throttle.ms" setting, I've tested with leaving it empty and then attempting to set a throttle, but the data generator does not reach the levels of the throttle. Granted, I have not tried this as I ended up shifting to a different Voluble workaround, but we eventually decided to switch away as we were looking for something with significantly higher throughput, in this case simply using Voluble and setting our bootstrap servers within some of the executables in the bin folder. Regarding the connector failing, it usually took nearly thirty minutes in some instances of trying configurations with high MCUs and workers before it failed, and we were unable to fully utilize the cluster size we allocated. Will update if I have any further attempts with the MSK Data Generator |
Can anybody tell me how to run this amazon-msk-data-generator? I didn't find any main class to execute.
@tmcgrath , when you have time, please review the pull request I made 😄 . Just moving from Random to SplittableRandom helps a lot in this kind of scenario.
This has been merged. Also, agreed with the original post from @wesleytong: we need more docs or code updates for higher throughput. Under 2,000 msgs a second, as described above, isn't much. Ideas or suggestions would be much appreciated.
I have been able to push more than 200,000 msg/sec on my laptop via Java producers before (with the SplittableRandom approach). Let me dig a little more into this; I will post my results here.
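For anyone curious why that change matters, here is a minimal, single-threaded micro-benchmark sketch. It is illustrative only, not the connector's actual code; a fair comparison would use JMH and multiple threads, where contention on Random's shared seed hurts even more.

```java
import java.util.Random;
import java.util.SplittableRandom;

public class RandomVsSplittableRandom {

    public static void main(String[] args) {
        final int iterations = 50_000_000;
        long sink = 0;

        // java.util.Random updates an internal AtomicLong seed with a CAS on
        // every call, which becomes a bottleneck when generator tasks share it.
        Random random = new Random(42);
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink ^= random.nextLong();
        }
        System.out.printf("Random:           %d ms%n", (System.nanoTime() - start) / 1_000_000);

        // SplittableRandom keeps plain (non-atomic) state; each task can call
        // split() to get its own independent stream, so nothing is contended.
        SplittableRandom splittable = new SplittableRandom(42);
        long start2 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink ^= splittable.nextLong();
        }
        System.out.printf("SplittableRandom: %d ms%n", (System.nanoTime() - start2) / 1_000_000);

        // Print the accumulated value so the JIT cannot eliminate the loops.
        System.out.println("(ignore) " + sink);
    }
}
```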
I have been able to replicate the behaviour, using
I am currently trying out the MSK Data Generator as an alternative to Voluble, thanks to its direct connection to AWS MSK. However, I'm noticing that the messages generated per second are lower than what we were getting locally in our Voluble tests. We were wondering what the maximum supported messages per second is estimated to be, and whether this Data Generator is optimized similarly to Voluble.
For reference, on a cluster with the kafka.m5.2xlarge broker type and 2 brokers (across 2 AZs), we are getting the following messages generated:

Autoscaled:
- 6 workers with 1 MCU each: ~1,375 messages per second
- 3 workers with 2 MCUs each: ~1,500 messages per second
- 7 workers with 1 MCU each: ~1,464 messages per second

Provisioned:
- 6 workers with 1 MCU each: ~1,700 messages per second
Obviously, this seems quite underutilized, and we were wondering if there is something wrong with our connector configuration or general optimization. We are aiming to generate 1 million messages per second and are unsure whether that is achievable even with a kafka.m5.24xlarge broker size.
Additionally, we notice that the connector fails when we try to modify the connector capacity, MCU count per worker, or worker count. There's no documentation available regarding this optimization, and I'd love to contribute my findings to help, but I was wondering if there is any known optimal value set. I've noticed that maxing the connector capacity to that of the broker size results in the connector failing (e.g., an m5.2xlarge supports up to 32 GB and 8 vCPUs, but a configuration of 4 workers with 2 MCUs each fails, while 3 workers with 2 MCUs each works fine, where each MCU needs one vCPU and 4 GB of RAM). I've also been unable to create any connectors at all against an m5.24xlarge; the connector eventually transitions into a failed state with both provisioned and autoscaled capacity types.
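For concreteness, this is a sketch of how the provisioned capacity in question would be expressed through the MSK Connect create-connector API (field names from memory, so please double-check against the AWS documentation). With each MCU billed as 1 vCPU and 4 GB of memory, 3 workers × 2 MCUs comes to 6 vCPUs / 24 GB, whereas 4 workers × 2 MCUs comes to 8 vCPUs / 32 GB, i.e. the full m5.2xlarge:

```json
{
  "provisionedCapacity": {
    "workerCount": 3,
    "mcuCount": 2
  }
}
```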
Would love to know if there is something wrong with our optimization or our connector usage, and/or if there has been any testing done with a similar setup, as we have followed the deployment steps found within this repository.