Add helm charts to Harmony #116

Open
bmtcril opened this issue Feb 8, 2024 · 1 comment


bmtcril commented Feb 8, 2024

To support scalable deployments of the Aspects infrastructure, we would like to add the EduNEXT production helm charts to the Harmony project. Specifically these would support:

  • Adding a version of the ClickHouse Operator helm chart for running ClickHouse in a scalable clustered mode
  • Celery settings for Aspects
  • Autoscaling for Ralph and Superset

Ian2012 commented Sep 20, 2024

Autoscaling

Autoscaling can be implemented using tutor-contrib-pod-autoscaling:

from tutorpod_autoscaling.hooks import AUTOSCALING_CONFIG

@AUTOSCALING_CONFIG.add()
def _add_my_autoscaling(autoscaling_config):
    autoscaling_config["ralph"] = {
        "enable_hpa": True,
        "memory_request": "300Mi",
        "cpu_request": 0.25,
        "memory_limit": "1200Mi",
        "cpu_limit": 1,
        "min_replicas": 1,
        "max_replicas": 10,
        "avg_cpu": 300,
        "avg_memory": "",
        "enable_vpa": False,
    }
    autoscaling_config["superset"] = {
        "enable_hpa": True,
        "memory_request": "300Mi",
        "cpu_request": 0.25,
        "memory_limit": "1200Mi",
        "cpu_limit": 1,
        "min_replicas": 1,
        "max_replicas": 10,
        "avg_cpu": 300,
        "avg_memory": "",
        "enable_vpa": False,
    }
    return autoscaling_config

For the actual values, we can reference the Ralph Helm Chart and the Superset Helm Chart. We don't use the Superset workers extensively, but it would be good to have autoscaling values for them too.
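To help reason about values such as `min_replicas`, `max_replicas`, and `avg_cpu`, here is a minimal sketch of the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler. It assumes that `avg_cpu` is used as the HPA target average CPU utilization, expressed as a percentage of the pod's CPU request (so 300 with a 0.25 CPU request would mean roughly 0.75 CPU per pod); that mapping is an assumption about the plugin, not something stated above. The function name is hypothetical, and the sketch ignores the HPA tolerance band and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_utilization_pct,
                     target_utilization_pct, min_replicas, max_replicas):
    """Simplified HPA rule: ceil(current * currentMetric / targetMetric),
    clamped to the [min_replicas, max_replicas] range."""
    raw = math.ceil(
        current_replicas * current_utilization_pct / target_utilization_pct
    )
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Two pods averaging 600% of their CPU request against a 300% target.
    print(desired_replicas(2, 600, 300, min_replicas=1, max_replicas=10))
```

With the example values above (target 300, 1 to 10 replicas), two pods running at twice the target would scale out to four, and no load level would push the deployment past ten pods.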

Celery

The default Celery workers run on a process pool, which assumes that all tasks are CPU intensive. Aspects tasks, however, are mainly I/O bound: each one makes a call or set of calls to Redis (for batching) or to Ralph (which in turn calls ClickHouse), and the CPU is idle most of the time. At edunext, we have developed a Tutor Celery plugin to manage multiple Celery queues. With it, we tested switching the default LMS worker deployment to a gevent pool, which uses lightweight threads, with concurrency set to 100. This significantly improved the performance of the tasks.
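The reasoning behind a high-concurrency pool for I/O-bound tasks can be illustrated with a small stand-in. This sketch does not use Celery or gevent: it uses standard-library threads and `time.sleep` to imitate tasks that spend their time waiting on Redis or Ralph. The function names and the delay values are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(delay):
    """Stand-in for a task that mostly waits on the network."""
    time.sleep(delay)
    return delay


def run_batch(num_tasks, concurrency, delay=0.05):
    """Run num_tasks waiting tasks with a given concurrency and
    return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Consume the iterator so every task finishes before timing stops.
        list(pool.map(fake_io_task, [delay] * num_tasks))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"concurrency 4:   {run_batch(100, 4):.2f}s")
    print(f"concurrency 100: {run_batch(100, 100):.2f}s")
```

Because the tasks only wait, the batch time is dominated by how many of them can wait at once rather than by CPU capacity, which is the same effect a gevent pool aims for with greenlets.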

The plan would be:

  • Add gevent as a dependency of edx-platform.
  • Add notes on scaling and configuration for Aspects tasks, covering how to improve their performance, how to manage multiple Celery queues, and how to set up a dedicated queue for Aspects.
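As a rough sketch of what a dedicated Aspects queue could look like, the snippet below builds a routing table in the shape accepted by Celery's `task_routes` setting and the arguments for a worker that consumes only that queue with a gevent pool. It deliberately avoids importing Celery so it runs standalone. The queue name `edx.lms.core.aspects`, the task-name pattern, and the helper functions are assumptions for illustration; the Celery plugin mentioned above may configure this differently.

```python
from fnmatch import fnmatch

DEFAULT_QUEUE = "edx.lms.core.default"   # assumed default LMS queue name
ASPECTS_QUEUE = "edx.lms.core.aspects"   # hypothetical dedicated queue

# Same shape as Celery's task_routes setting: name pattern -> routing options.
# The pattern is an assumption about where the Aspects-related tasks live.
TASK_ROUTES = {
    "event_routing_backends.tasks.*": {"queue": ASPECTS_QUEUE},
}


def queue_for(task_name, routes=TASK_ROUTES, default=DEFAULT_QUEUE):
    """Return the queue a task name would be routed to (glob matching)."""
    for pattern, options in routes.items():
        if fnmatch(task_name, pattern):
            return options["queue"]
    return default


def worker_command(queue, pool="gevent", concurrency=100):
    """Build the argument list for a Celery worker bound to one queue."""
    return [
        "celery", "--app=lms.celery", "worker",
        f"--pool={pool}",
        f"--concurrency={concurrency}",
        f"--queues={queue}",
    ]


if __name__ == "__main__":
    print(queue_for("event_routing_backends.tasks.dispatch_event"))
    print(" ".join(worker_command(ASPECTS_QUEUE)))
```

Keeping the Aspects tasks on their own queue would let that worker use a gevent pool while the other LMS workers keep the default process pool for CPU-heavy tasks.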

ClickHouse

Support for the ClickHouse operator will be added to Harmony, along with examples and documentation for running it in production with Aspects.

Status: Ready for Work