
Load Testing using 5 replicas


Specifications:

  • Baseline of 3 worker nodes (16 GB each) provisioned on Jetstream.
  • 5 replica pods spawned for each microservice on the kubeadm-provisioned cluster.
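
For reference, the replica count per microservice can be set declaratively in each Deployment or adjusted programmatically. Below is a minimal sketch using the official Kubernetes Python client; the deployment name and namespace are placeholders, not the actual ones used in this project.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (e.g., the kubeadm admin config).
config.load_kube_config()
apps = client.AppsV1Api()

# Scale a hypothetical microservice Deployment out to 5 replicas.
apps.patch_namespaced_deployment_scale(
    name="authentication",           # placeholder deployment name
    namespace="default",             # placeholder namespace
    body={"spec": {"replicas": 5}},
)
```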

I. Using 6000 requests per microservice

  • A regular sustained load of 6000 requests is handled well over a period of 30 minutes (a load-generation sketch follows this list).
  • Load balancing keeps throughput healthy, and each pod is utilized optimally.
  • A minimal error rate of 0.02% was observed, this time for the Authentication microservice requests.
  • Overall, average throughput drops from an initial 70 req/sec to approximately 18 req/sec.
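
The page does not state which load-generation tool produced these numbers, so the snippet below is only an illustrative sketch of how such a sustained load could be driven, assuming a Locust-style setup with a hypothetical login endpoint.

```python
from locust import HttpUser, task, constant_pacing

class AuthUser(HttpUser):
    # Each simulated user issues roughly one request per second;
    # the total user count is tuned to approximate the target load.
    wait_time = constant_pacing(1)

    @task
    def login(self):
        # Hypothetical route and payload; substitute the real microservice endpoints.
        self.client.post("/auth/login", json={"username": "test", "password": "test"})
```

A run like `locust -f loadtest.py --headless -u 100 -r 10 -t 30m --host http://<gateway>` would sustain the load for the full 30-minute window.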

Aggregate highlights for Response-time per microservice:

[Image: Agg-response-times-5-120000-requests]

All Response-times per microservice over time:

[Image: Overall-response-times-5-120000-requests]

Acceptable throughput and low error rate 👍


Key takeaways

  • The system handles 6000 requests comfortably when 5 replicas are set up for each microservice.
  • The requests aren't handled concurrently; instead, there is some degree of sequential processing.
  • One potential throughput improvement is to exploit async/await at the Gateway so that incoming requests are processed concurrently (see the sketch after this list).
  • The slow throughput could stem either from the kubeadm cluster's load balancing not routing requests efficiently, or from the Gateway acting as a bottleneck.
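
As a rough illustration of the async/await idea above, the sketch below fans downstream calls out concurrently from the Gateway using asyncio and aiohttp. The service names and routes are placeholders, not the project's actual endpoints.

```python
import asyncio
import aiohttp

# Placeholder in-cluster service URLs; substitute the real service names/routes.
SERVICES = {
    "authentication": "http://authentication/health",
    "weather-reporter": "http://weather-reporter/health",
}

async def fan_out() -> dict:
    """Issue all downstream calls concurrently instead of one after another."""
    async with aiohttp.ClientSession() as session:

        async def call(name: str, url: str):
            async with session.get(url) as resp:
                return name, resp.status

        results = await asyncio.gather(*(call(n, u) for n, u in SERVICES.items()))
    return dict(results)

if __name__ == "__main__":
    print(asyncio.run(fan_out()))
```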

Load balancing in action 😃


II. Spike Testing:

  • Our system can handle a maximum consistent load of 6000 requests/minute for 30 minutes with a 0.01% error rate and a throughput of 17.9 requests/sec.

Aggregate highlights for Response-time per microservice:

[Image: Agg-response-times-5-120000-requests]

All Response-times per microservice over time:

[Image: Overall-response-times-5-120000-requests]

Key takeaways

  • We observed that our system was able to handle a load of 8600 requests/minute for a span of 60 seconds with a 0.012% error rate. When we increased the load beyond that, the error rate rose significantly.
  • From the graphs, we can infer that our Python API and registration API take the longest to execute requests. The Python API relies on built-in libraries to implement its logic, which introduces significant delays in processing requests. The registration API writes a chunk of user information to the database, which is time-consuming. In addition, the synchronous REST APIs used for inter-microservice communication add further delays and limit the system's capacity.
  • We are observing a lower throughput for the 5-replica set than for the 3-replica set. On inspection, we found that because we test with only 3 worker nodes of 16 GB each, all of our microservices together can use at most 48 GB to serve requests. Since we allocated 4 GB to one Python microservice to improve its performance and response time, the other microservices are left competing for the remaining memory, which lowers the overall throughput.
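
As an illustration of the memory-allocation point above, per-container requests and limits can be tuned so that one service does not starve the others. The following is a minimal sketch using the Kubernetes Python client; the names and values are placeholders, not the project's actual manifest.

```python
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Cap the memory-hungry Python microservice so the remaining services keep a
# fair share of the ~48 GB available across the 3 worker nodes.
patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "weather-reporter",   # placeholder container name
                        "resources": {
                            "requests": {"memory": "2Gi"},
                            "limits": {"memory": "4Gi"},
                        },
                    }
                ]
            }
        }
    }
}
apps.patch_namespaced_deployment(
    name="weather-reporter", namespace="default", body=patch
)
```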

Note how the RAM usage of the weather-reporter (a computation-heavy microservice) is split evenly across its 5 replicas.