
Maximum Supported Messages per Second and Optimal Worker/Connector Configurations #2

Open
wesleytong opened this issue Nov 2, 2021 · 8 comments

@wesleytong

I am currently trying out the MSK Data Generator as an alternative to Voluble, thanks to its direct connection to AWS MSK. However, the messages generated per second seem lower than what we were getting locally in our Voluble tests. We were wondering what the maximum supported messages/second is estimated to be, and whether this Data Generator is optimized similarly to Voluble.

For reference, on a cluster with 2 brokers of type kafka.m5.2xlarge (across 2 AZs), we are seeing the following generation rates:

Autoscaled:
- 6 workers with 1 MCU each: ~1,375 messages/second
- 3 workers with 2 MCUs each: ~1,500 messages/second
- 7 workers with 1 MCU each: ~1,464 messages/second

Provisioned:
- 6 workers with 1 MCU each: ~1,700 messages/second

Obviously, the cluster seems quite underutilized, and we were wondering whether something is wrong with the connector configuration or general tuning. We are aiming to generate 1 million messages per second and are unsure if that is achievable even with a kafka.m5.24xlarge broker size.

Additionally, the connector seems to fail when we modify the connector capacity, the MCU count per worker, or the worker count. There's no documentation available regarding this tuning; I'd love to contribute my findings, but I was wondering if there is a known optimal set of values. I've noticed that maxing the connector capacity out at the broker size causes the connector to fail (i.e., an m5.2xlarge supports up to 32 GB and 8 vCPUs, and since each MCU needs 1 vCPU and 4 GB of RAM, a configuration of 4 workers with 2 MCUs each fails, while 3 workers with 2 MCUs each works fine). I've also been unable to create any connectors at all against an m5.24xlarge; the connector eventually transitions into a failed state under both provisioned and autoscaled capacity types.

Would love to know if there is something wrong with our tuning or our connector usage, and/or if there has been any testing done with a similar setup, as we have followed the deployment steps found within this repository here.
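For concreteness, the capacity portion of our create-connector call looks roughly like this (a minimal sketch of the MSK Connect capacity specification; the counts are illustrative and the remaining create-connector arguments are omitted):

```json
{
  "capacity": {
    "provisionedCapacity": {
      "mcuCount": 2,
      "workerCount": 3
    }
  }
}
```

For the autoscaled runs, the provisionedCapacity block is replaced with an autoScaling block (minWorkerCount, maxWorkerCount, mcuCount, and scale-in/scale-out policies).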

@tmcgrath
Contributor

Please do keep us posted on your findings. I have not spent much time running on MSK Connect yet.

@tmcgrath
Contributor

And by the way, have you experimented with the "global.throttle.ms" configuration setting?
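For example, a minimal datagen connector config with the throttle set might look like this (the topic and Java Faker expressions below are illustrative, not a recommended setup):

```properties
connector.class=com.amazonaws.mskdatagen.GeneratorSourceConnector
tasks.max=2
genkp.orders.with=#{Internet.uuid}
genv.orders.product_id.with=#{number.number_between '101','200'}
global.throttle.ms=500
global.history.records.max=1000
```

If I recall correctly, global.throttle.ms inserts a pause between generated records, so leaving it unset (or setting it very low) should let the generator run as fast as it can.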

@wesleytong
Author

Regarding the "global.throttle.ms" setting, I've tested both leaving it empty and setting a throttle, but the data generator never reaches the throttled rate. Granted, I haven't explored this further, since we eventually switched to a different Voluble workaround: we were looking for significantly higher throughput, so we simply ran Voluble directly and set our bootstrap servers within some of the executables in the bin folder.

Regarding the connector failing, in some instances it took nearly thirty minutes of trying configurations with high MCU and worker counts before it failed, and we were never able to fully utilize the cluster size we allocated. Will update if I make any further attempts with the MSK Data Generator.

@NarayanYerrabachu

Can anybody tell me how to run amazon-msk-data-generator? I didn't find any main class to execute.

@Neuw84
Contributor

Neuw84 commented May 2, 2024

@tmcgrath, when you have time, please review the pull request I made 😄. Just moving away from Random to SplittableRandom helps a lot in this kind of scenario.
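To illustrate the idea outside the connector (a standalone sketch, not the generator's actual code): java.util.Random funnels every call through one atomically updated seed, so concurrent tasks contend on it, while SplittableRandom lets each task split off its own independent stream.

```java
import java.util.Random;
import java.util.SplittableRandom;

public class RngSketch {
    public static void main(String[] args) {
        // java.util.Random: thread-safe, but every nextInt() does a
        // compare-and-set on a single shared seed, so parallel callers contend.
        Random shared = new Random();
        System.out.println(shared.nextInt(100));

        // SplittableRandom: not shared; instead, split() derives an
        // independent generator per task, with no synchronization at all.
        SplittableRandom root = new SplittableRandom();
        SplittableRandom perTask = root.split();
        System.out.println(perTask.nextInt(100));
    }
}
```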

@tmcgrath
Contributor

tmcgrath commented May 7, 2024

This has been merged. Also, agreed with the original post from @wesleytong: we need more docs or code updates for higher throughput. Under 2,000 msgs a second as described above isn't much. Ideas or suggestions would be much appreciated.
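One avenue worth trying (an assumption on my part, not something tested in this repo yet): since KIP-458, Kafka Connect supports per-connector producer overrides when the worker is started with connector.client.config.override.policy=All, so batching and compression could be tuned on the datagen connector itself, e.g.:

```properties
producer.override.linger.ms=20
producer.override.batch.size=262144
producer.override.compression.type=lz4
```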

@Neuw84
Contributor

Neuw84 commented May 7, 2024

I have been able to push more than 200,000 msg/sec on my laptop via plain Java producers before (with the SplittableRandom approach). Let me dig a little more into this; I will post my results here.
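Roughly the kind of loop I mean (a minimal sketch; the bootstrap servers, topic name, record shape, and tuning values are placeholders, and it requires org.apache.kafka:kafka-clients on the classpath):

```java
import java.util.Properties;
import java.util.SplittableRandom;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerBench {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // Batch aggressively so throughput isn't bounded by per-record sends.
        props.put("linger.ms", "20");
        props.put("batch.size", "262144");

        SplittableRandom rng = new SplittableRandom();
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            long start = System.nanoTime();
            int n = 1_000_000;
            for (int i = 0; i < n; i++) {
                String value = "order-" + rng.nextInt(1_000_000);
                producer.send(new ProducerRecord<>("bench-topic", value)); // placeholder topic
            }
            producer.flush();
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d msgs in %.2fs (%.0f msg/s)%n", n, secs, n / secs);
        }
    }
}
```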

@Neuw84
Contributor

Neuw84 commented May 8, 2024

I have been able to replicate the behaviour: using global.throttle.ms=1 and tasks.max=2 I don't get much throughput. I have a docker-compose setup where I can test easily, but I will need some time.
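For reference, the relevant part of the connector config in that repro (the topic and field generators are whatever the compose setup defines, so I've left them out):

```properties
tasks.max=2
global.throttle.ms=1
```

With only two tasks, generation parallelism is capped at two loops, which may itself be the bottleneck regardless of the throttle.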
