Largest Supported Messages/Second Inputted and Optimal Worker Connector Configurations #2
Comments
Please do keep us posted on your findings. I have not spent much time running on MSK Connect yet.
And by the way, have you experimented with the "global.throttle.ms" configuration setting?
Regarding the "global.throttle.ms" setting, I've tested with leaving it empty and then attempting to set a throttle, but the data generator does not reach the levels of the throttle. Granted, I have not tried this as I ended up shifting to a different Voluble workaround, but we eventually decided to switch away as we were looking for something with significantly higher throughput, in this case simply using Voluble and setting our bootstrap servers within some of the executables in the bin folder. Regarding the connector failing, it usually took nearly thirty minutes in some instances of trying configurations with high MCUs and workers before it failed, and we were unable to fully utilize the cluster size we allocated. Will update if I have any further attempts with the MSK Data Generator |
Can anybody tell me how to run this amazon-msk-data-generator? I didn't find any main class to execute.
@tmcgrath , when you have time, please review the pull request I made 😄 . Just moving from Random to SplittableRandom helps a lot in this kind of scenario.
This has been merged. Also, agreed with the original post from @wesleytong: we need more docs or code updates for higher throughput. Under 2,000 msgs a second, as described above, isn't much. Ideas or suggestions would be much appreciated.
I have been able to push more than 200,000 msg/sec on my laptop via Java producers before (with the SplittableRandom approach). Let me dig a little more into this; I will post my results here.
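For anyone curious why that change matters, here is a minimal, single-threaded micro-benchmark sketch. It is illustrative only, not the connector's actual code; a fair comparison would use JMH and multiple threads, where contention on Random's shared seed hurts even more.

```java
import java.util.Random;
import java.util.SplittableRandom;

public class RandomVsSplittableRandom {

    public static void main(String[] args) {
        final int iterations = 50_000_000;
        long sink = 0;

        // java.util.Random updates an internal AtomicLong seed with a CAS on
        // every call, which becomes a bottleneck when generator tasks share it.
        Random random = new Random(42);
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink ^= random.nextLong();
        }
        System.out.printf("Random:           %d ms%n", (System.nanoTime() - start) / 1_000_000);

        // SplittableRandom keeps plain (non-atomic) state; each task can call
        // split() to get its own independent stream, so nothing is contended.
        SplittableRandom splittable = new SplittableRandom(42);
        long start2 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink ^= splittable.nextLong();
        }
        System.out.printf("SplittableRandom: %d ms%n", (System.nanoTime() - start2) / 1_000_000);

        // Print the accumulated value so the JIT cannot eliminate the loops.
        System.out.println("(ignore) " + sink);
    }
}
```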
I have been able to replicate the behaviour, using
I am currently trying out the MSK Data Generator as an alternative to Voluble, thanks to its direct connection to AWS MSK. However, I'm noticing that the messages generated per second are lower than what we were getting locally in our Voluble tests. We were wondering what the maximum supported messages per second is estimated to be, and whether this Data Generator is optimized similarly to Voluble.
For reference, on a cluster with the kafka.m5.2xlarge broker type and 2 brokers (across 2 AZs), we are getting the following messages generated:

Autoscaled:
- 6 workers with 1 MCU each: ~1,375 messages per second
- 3 workers with 2 MCUs each: ~1,500 messages per second
- 7 workers with 1 MCU each: ~1,464 messages per second

Provisioned:
- 6 workers with 1 MCU each: ~1,700 messages per second
Obviously, this seems quite underutilized, and we were wondering if there is something wrong with our connector configuration or general optimization. We are aiming to generate 1 million messages per second and are unsure whether that is achievable even with a kafka.m5.24xlarge broker size.
Additionally, we notice that the connector fails when we try to modify the connector capacity, MCU count per worker, or worker count. There's no documentation available regarding this optimization, and I'd love to contribute my findings to help, but I was wondering if there is any known optimal value set. I've noticed that maxing the connector capacity to that of the broker size results in the connector failing (e.g., an m5.2xlarge supports up to 32 GB and 8 vCPUs, but a configuration of 4 workers with 2 MCUs each fails, while 3 workers with 2 MCUs each works fine, where each MCU needs one vCPU and 4 GB of RAM). I've also been unable to create any connectors at all against an m5.24xlarge; the connector eventually transitions into a failed state with both provisioned and autoscaled capacity types.
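For concreteness, this is a sketch of how the provisioned capacity in question would be expressed through the MSK Connect create-connector API (field names from memory, so please double-check against the AWS documentation). With each MCU billed as 1 vCPU and 4 GB of memory, 3 workers × 2 MCUs comes to 6 vCPUs / 24 GB, whereas 4 workers × 2 MCUs comes to 8 vCPUs / 32 GB, i.e. the full m5.2xlarge:

```json
{
  "provisionedCapacity": {
    "workerCount": 3,
    "mcuCount": 2
  }
}
```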
Would love to know if there is something wrong with our optimization or our connector usage, and/or if there has been any testing done with a similar setup, as we have followed the deployment steps found within this repository.