
add workflow stress test #23

Open
Cryptophobia opened this issue Mar 22, 2018 · 2 comments

@Cryptophobia (Member)

From @bacongobbler on June 9, 2016 22:43

cross-post of deis/deis#4037

@jchauncey has found some interesting problems when running significant load through deis. I'd like to see an automated version of this test (or similar) so that we can watch deis' performance over the course of future releases. We could even run this on the various providers and compare performance. 😄

Copied from original issue: deis/jenkins-jobs#100

@Cryptophobia (Member, Author)

From @arschles on June 9, 2016 22:48

Related: deis/router#198

@Cryptophobia (Member, Author)

From @jchauncey on June 10, 2016 16:20

As it stands right now, I can push a significant number of requests through deis and not see any real degradation in performance. That being said, we need to do a few other things besides just sending a lot of requests to the router and ultimately to a simple Go app.

My thoughts on this are still kind of cloudy but here is what I had in mind:

Get the data

Have telegraf send all metrics for e2e runs to a hosted influx system where we can collect long-term, meaningful metrics. This will allow us to spot trends and new problems more efficiently.
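As a rough sketch of the shape of data we would keep, here is what writing a single e2e metric to a hosted InfluxDB 1.x /write endpoint could look like. In practice telegraf would collect and forward these points; the measurement name, tags, and URL below are placeholders, not an agreed-upon schema:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// writePoint posts one measurement to InfluxDB using the 1.x line protocol.
// Tags let us slice results by provider, release, etc. when spotting trends.
func writePoint(influxURL, db, measurement string, tags map[string]string, value float64) error {
	var b strings.Builder
	b.WriteString(measurement)
	for k, v := range tags {
		fmt.Fprintf(&b, ",%s=%s", k, v)
	}
	fmt.Fprintf(&b, " value=%f %d", value, time.Now().UnixNano())

	resp, err := http.Post(
		fmt.Sprintf("%s/write?db=%s", influxURL, db),
		"text/plain",
		strings.NewReader(b.String()),
	)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// InfluxDB 1.x returns 204 No Content on a successful write.
	if resp.StatusCode != http.StatusNoContent {
		return fmt.Errorf("influx write failed: %s", resp.Status)
	}
	return nil
}

func main() {
	// Measurement, tag, and field values here are made up for illustration.
	err := writePoint("http://influx.example.com:8086", "e2e", "e2e_run_duration",
		map[string]string{"provider": "gke"}, 412.7)
	if err != nil {
		fmt.Println("write failed:", err)
	}
}
```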

Regular e2e runs

Use the regular e2e runs to make sure we are within certain bounds performance-wise. We should eventually hook up kapacitor scripts to alert us when an e2e run is outside of those params.
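Until kapacitor alerting is wired up, a minimal sketch of the kind of bounds check an e2e run could do itself. The metric names and thresholds here are placeholders, not agreed-upon numbers:

```go
package main

import "fmt"

// bound is an inclusive acceptable range for one metric from an e2e run.
type bound struct {
	min, max float64
}

// Illustrative bounds only; the real params would come from historical data.
var bounds = map[string]bound{
	"p95_latency_ms":   {0, 250},
	"error_rate":       {0, 0.01},
	"deploy_time_secs": {0, 120},
}

// checkRun returns the metrics that fall outside their bounds so the e2e job
// can fail loudly instead of letting a regression slip through.
func checkRun(metrics map[string]float64) []string {
	var violations []string
	for name, value := range metrics {
		b, ok := bounds[name]
		if !ok {
			continue // no bound defined for this metric
		}
		if value < b.min || value > b.max {
			violations = append(violations,
				fmt.Sprintf("%s=%v outside [%v, %v]", name, value, b.min, b.max))
		}
	}
	return violations
}

func main() {
	run := map[string]float64{"p95_latency_ms": 310, "error_rate": 0.002}
	for _, v := range checkRun(run) {
		fmt.Println("out of bounds:", v)
	}
}
```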

The load test

Set up a nightly job that runs on a normal-size cluster (5 or so nodes) and deploys apps which can simulate failures (return a non-200 response code), generate arbitrarily large response bodies, and maybe make calls to other dependent services. We would then use the CLI to arbitrarily scale those apps up and down while also doing simultaneous deploys and generating traffic. This would allow us to see how the system performs while apps are under load and the operator is using the system to respond.
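A sketch of the kind of app such a job could deploy, with endpoints for forced failures, arbitrarily large bodies, and a call to a dependent service. The routes and the UPSTREAM_URL variable are assumptions for illustration, not a fixed design:

```go
package main

import (
	"io"
	"net/http"
	"os"
	"strconv"
	"strings"
)

func main() {
	// Force a non-200 response, e.g. /fail?code=503.
	http.HandleFunc("/fail", func(w http.ResponseWriter, r *http.Request) {
		code, err := strconv.Atoi(r.URL.Query().Get("code"))
		if err != nil || code < 400 || code > 599 {
			code = http.StatusInternalServerError
		}
		http.Error(w, "simulated failure", code)
	})

	// Generate an arbitrarily large response body, e.g. /large?bytes=10485760.
	http.HandleFunc("/large", func(w http.ResponseWriter, r *http.Request) {
		n, err := strconv.Atoi(r.URL.Query().Get("bytes"))
		if err != nil || n <= 0 {
			n = 1 << 20 // default to 1 MiB
		}
		io.Copy(w, strings.NewReader(strings.Repeat("x", n)))
	})

	// Call a dependent service and relay its status, if one is configured.
	http.HandleFunc("/upstream", func(w http.ResponseWriter, r *http.Request) {
		upstream := os.Getenv("UPSTREAM_URL")
		if upstream == "" {
			http.Error(w, "no upstream configured", http.StatusBadGateway)
			return
		}
		resp, err := http.Get(upstream)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		w.WriteHeader(resp.StatusCode)
		io.Copy(w, resp.Body)
	})

	http.ListenAndServe(":8080", nil)
}
```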

My main concern is making sure that, during a high-load event, the controller can still receive requests to scale up/down to meet demand.
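One way the load test could exercise exactly that: keep traffic flowing through the router while shelling out to the deis CLI to scale the app under test. The app name, process type, scale sequence, and request loop below are assumptions, not a fixed plan:

```go
package main

import (
	"fmt"
	"net/http"
	"os/exec"
	"time"
)

const app = "load-test-app" // hypothetical app name

// generateLoad hammers the app through the router while scaling happens.
func generateLoad(url string, stop <-chan struct{}) {
	for {
		select {
		case <-stop:
			return
		default:
			if resp, err := http.Get(url); err == nil {
				resp.Body.Close()
			}
		}
	}
}

func main() {
	stop := make(chan struct{})
	for i := 0; i < 20; i++ {
		go generateLoad("http://"+app+".example.com/", stop)
	}

	// Ask the controller to scale up and down while traffic keeps flowing;
	// the test fails if these scale requests start timing out or erroring.
	for _, n := range []int{10, 2, 8, 1} {
		cmd := exec.Command("deis", "scale", fmt.Sprintf("cmd=%d", n), "-a", app)
		out, err := cmd.CombinedOutput()
		fmt.Printf("scale cmd=%d: err=%v\n%s", n, err, out)
		time.Sleep(2 * time.Minute)
	}
	close(stop)
}
```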
