Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XXX feat(controller): permit setting GUNICORN_WORKERS #75

Closed

Conversation

kingdonb
Copy link
Member

permit setting GUNICORN_WORKERS and CONN_MAX_AGE from global values.yaml

(an additional pull request to the umbrella 'workflow' chart will follow)

@kingdonb
Copy link
Member Author

did I tag this correctly? it's a charts-only change...

@kingdonb
Copy link
Member Author

... it's not charts-only anymore

@Cryptophobia
Copy link
Member

Cryptophobia commented Sep 4, 2018

This looks good. Have you tested it by spinning up a controller with your changes connected to the database?

Looks good but we should test it just in case the charts broke.

You should be able to package the controller charts that have changed. Define a values.yml and pass in the GUNICORN_WORKERS and CONN_MAX_AGE defined in that file :

$ helm package .
$ helm install ${controller_charts_tgz_package_name} --namespace deis --values value.yml
$ helm fetch hephy/database
$ helm install database-v2.6.0.tgz --namespace deis

@kingdonb
Copy link
Member Author

kingdonb commented Sep 4, 2018

I will give it a shot and report back. Thanks for the guidance!

@kingdonb
Copy link
Member Author

kingdonb commented Sep 5, 2018

Hmm, that might have worked in a vacuum, but I tried upgrading my existing deployment and I ran into a number of issues.

For one thing, slugrunner-config appears to be a dependency of controller (did not know that)

Also noticed that doing a deploy this way without going through deisrel, directly from a branch of the git repo, still reaches out to quay.io/deisci/controller:canary -- that's not the right place to pull from! Oops...

I have a feeling that the patch is fine, but I have a homework assignment now and will take some time to verify it. Meanwhile I noticed this one thing missing that should really be added... ade2319

@Cryptophobia
Copy link
Member

@kingdonb It should be best done to deploy Hephy Workflow 2.19.4 on minikube and then do the helm commands above. You will also need to build a new Docker image and push it somewhere and update the charts to pull that custom image that contains the change.

Also the build is failing and I restarted it. Let's see...

@Cryptophobia
Copy link
Member

Hey @kingdonb , do you think you will have some time to take a look at this again?

@kingdonb
Copy link
Member Author

Yes, I would like to... but I don't know if I will be able to get to it before 11 days from now, when the Open Roadmap is scheduled. I don't know if someone else can test it?

I'm setting a reminder to check on this in 4 days, hopefully when that comes I'll be able to spend the time and test it. I just can't right now.

@Cryptophobia
Copy link
Member

Cryptophobia commented Oct 21, 2018

I can also try to test PR at some point if I can get to it, but I will leave it to you for now as there are some other things that are more pending.

@kingdonb
Copy link
Member Author

I'm going to get around to this, let's say by Wednesday

If I don't, then you can rib on me Thursday at the Open Roadmap meeting.

I also said I would look at teamhephy/workflow-e2e#18

@Cryptophobia
Copy link
Member

Cryptophobia commented Oct 28, 2018

Whenever you have some free time to devote to this @kingdonb . I know you have been busy. 🐝

EDIT: I am assuming this has to mostly just be tested and as long as it doesn't break any existing functionality it should be good to merge in.

@kingdonb
Copy link
Member Author

This time it passed the tests. I still haven't manually tried all four of the examples I wrote just to be sure... maybe tomorrow.

# Setting the value to "None" will ensure that connections are never timed out.
# - 0
# - 0
# - 600
# - "None"
# conn_max_age: 600

# Setting the value to "None" will ensure that connections are never timed out.
# - 0
# - 600
# - "None"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the default value of conn_max_age when no value is set in the chart? Sounds like it will be "None" by default... Is that the correct behaviour we want?

@Cryptophobia
Copy link
Member

OK, let me know when it has been tested sufficiently and can be merged.

@duanhongyi
Copy link
Member

duanhongyi commented Nov 1, 2018

Judging from the original intention of Docker, it does not recommend running multiple processes in a single container.

So I suggest using Scale instead of multi process.

@Cryptophobia @kingdonb

@Cryptophobia
Copy link
Member

Judging from the original intention of Docker, it does not recommend running multiple processes in a single container.

So I suggest using Scale instead of multi process.

@Cryptophobia @kingdonb

I think in the original issue, the problem was that the database connections were not being closed when running connections exceeded the maximum value of GUNICORN_WORKERS. The controller is the main client of the database and we should be able to fine-grain control how many worker processes it can create on systems with lots of CPUs:

There should not be possible to run more gunicorn workers than max_connections in deis-database.

Yes, scaling containers is better than scaling the processes generally with container technologies. But here in this example we are not totally in charge of how Django uses Gunicorn, and so here we are dealing with the Gunicorn Server Architecture Model. I think here we are correct in how we are fine-tuning the worker threads with these two variables. Gunicorn Server Model. This is very similar to how other process managers work - Resque-pool, Unicorn, Puma, PM2, and nginx to name a few. This concurrent multi-threading of worker processes is actually a benefit inside containers as it saves memory. If we have two threads and one master process, one thread sleeps while one thread processes a request. The general benefit is that both threads will share memory and resources passed down from the master thread. In the very least, they will share the dynamic libraries that they use as well as the memory that they reserve to process requests.

@duanhongyi
Copy link
Member

Agree with you.

Python has globally interpreted locks, so the threading model is not a good idea; gevent / event is a good solution in high IO situations. I wanted to use Gunicorn + gevent to run the controller, but then I thought it might not be necessary because the controller should not be a performance bottleneck.

@Cryptophobia
Copy link
Member

Cryptophobia commented Nov 1, 2018

Python has globally interpreted locks, so the threading model is not a good idea; gevent / event is a good solution in high IO situations.

Yes, exactly. Python GIL is part of the interpreter and there is no way around it. Controller is still IO bound and thus CPU is still not the deciding factor in how many requests can be handled without being blocked. If users with many CPU machines want to run the controller they should be able to scale the gunicorn workers and fine-tune them as this would normally be useful for them for maximizing concurrency and parallelism of these threads.

Good article on how the different types of workers behave in gunicorn.

@@ -24,8 +24,6 @@
'security.W008'
]

CONN_MAX_AGE = 60 * 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be added back. I think this is a different CONN_MAX_AGE env variable?

@Cryptophobia
Copy link
Member

Hi @kingdonb , have you tested this? What about setting the GUNICORN_WORKERS env variable or is that captured by python at runtime?

Kingdon Barrett added 4 commits November 22, 2020 21:40
and CONN_MAX_AGE from global values.yaml
The controller values.yaml should mention all of the settings that can
affect the controller in values.yaml too.
@kingdonb kingdonb force-pushed the gunicorn-workers-conn-max-age branch from a965de8 to 4f93b5d Compare November 23, 2020 02:41
@kingdonb kingdonb changed the title feat(controller): permit setting GUNICORN_WORKERS XXX feat(controller): permit setting GUNICORN_WORKERS Nov 23, 2020
@kingdonb
Copy link
Member Author

Building this image, rebased on the latest controller, gave me a headache (CrashLoopBackOff)

$ k logs -f deis-controller-85b86bfc9f-4xn6s
system information:
Django Version: 1.11.29
Python 3.6.9

Django checks:
System check identified no issues (2 silenced).

Health Checks:
Checking if database is alive
There was a problem connecting to the database
unsupported operand type(s) for +: 'float' and 'str'

I have no idea what caused this, it doesn't look like it's related to my change, but I submitted a separate PR with chart-only change that gets it done. We already had the gunicorn_workers code merged into controller python code. It only needed to be set as an environment variable from the chart/deployment spec.

The loose end in here is the CONN_MAX_AGE thing. I'll keep it open until we figure out if there's a change needed for that.

@Cryptophobia
Copy link
Member

Cryptophobia commented Nov 23, 2020

Seem to be failing at django.db.connection.cursor() in file https://github.com/teamhephy/controller/blob/master/rootfs/api/management/commands/healthchecks.py#L13 .

Do you have some type of user and environment string with a special character. We must be breaking somewhere with the connection string to the db?

kingdonb pushed a commit to kingdonb/controller that referenced this pull request Nov 27, 2020
teamhephy#75 (review)

>  I think this is a different CONN_MAX_AGE env variable?

I don't see where else this variable is used, but no other change is
present which could have caused the tests to fail ?
@kingdonb kingdonb mentioned this pull request Nov 27, 2020
@kingdonb
Copy link
Member Author

Close in favor of #139

@kingdonb kingdonb closed this Nov 27, 2020
duanhongyi added a commit to duanhongyi/controller that referenced this pull request Apr 24, 2023
chore(oauth2): update user info pipline
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants