Prevents the scheduler to store task's data #15

Open
rafa-be opened this issue Sep 3, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@rafa-be
Collaborator

rafa-be commented Sep 3, 2024

The scheduler currently stores each task's data until the task finishes. This roughly doubles memory usage, since the worker also stores the same data.

We would like to add a new mode to the scheduler in which it does not store the task data.

This mode introduces two issues:

  • to rebalance a task, the scheduler must first request the task data back from the worker;
  • if the worker dies, the scheduler cannot reschedule the task and should instead return a task failure exception to the client.
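One way to picture the proposed mode is as a flag on the scheduler's bookkeeping. The sketch below assumes a hypothetical `TaskStore` keyed by task ID; the class, the `store_task_data` flag and the method names are illustrative only and are not scaler's actual API. Both modes keep the task-to-worker route, but only the storing mode keeps the payload, which is exactly the copy whose absence causes the two issues above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Task:
    task_id: bytes
    payload: bytes  # serialized function and arguments (hypothetical layout)


class TaskStore:
    """Hypothetical scheduler-side bookkeeping for dispatched tasks."""

    def __init__(self, store_task_data: bool) -> None:
        self.store_task_data = store_task_data
        self._routes: Dict[bytes, str] = {}     # task_id -> worker id, always kept
        self._payloads: Dict[bytes, Task] = {}  # filled only when storing task data

    def on_dispatch(self, task: Task, worker_id: str) -> None:
        # The route is always recorded so results and failures can be matched to tasks.
        self._routes[task.task_id] = worker_id
        if self.store_task_data:
            # Keep a copy so the scheduler can reschedule without asking the worker.
            self._payloads[task.task_id] = task

    def worker_of(self, task_id: bytes) -> Optional[str]:
        return self._routes.get(task_id)

    def payload(self, task_id: bytes) -> Optional[Task]:
        # None in no-store mode: the only copy of the data is on the worker.
        return self._payloads.get(task_id)

    def on_finished(self, task_id: bytes) -> None:
        self._routes.pop(task_id, None)
        self._payloads.pop(task_id, None)

    def stored_bytes(self) -> int:
        # Payload bytes held by the scheduler in addition to the worker's copy.
        return sum(len(t.payload) for t in self._payloads.values())
```

With a 1024-byte payload dispatched to the same worker, a `TaskStore(store_task_data=True)` reports `stored_bytes() == 1024` while the no-store instance reports 0, yet both can still answer `worker_of(...)`, which is what the failure and rebalancing paths rely on.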
@rafa-be rafa-be self-assigned this Sep 3, 2024
@sharpener6
Collaborator

It should have two modes:

  • store the task in the scheduler;
  • delete the task from the scheduler's memory once it has been routed to a worker. In this mode:
    • if the worker dies, the task should be marked as failed and its result sent back with a WorkerDied exception;
    • when rebalancing tasks, the scheduler should issue a TaskCancel with retrive_task_flag=True; once it gets the task back from the worker, it can schedule it on another worker (this may add overhead if the task is large).
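The rebalancing flow described above could look roughly like the following. The message name `TaskCancel` and the flag `retrive_task_flag` are taken from the comment (including its spelling), but the `TaskCancelConfirm` reply, the toy `Worker` class and the `rebalance` helper are hypothetical, and the sketch only handles tasks still queued on the busy worker, not ones already running.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    task_id: bytes
    payload: bytes


@dataclass
class TaskCancel:
    task_id: bytes
    retrive_task_flag: bool = False  # spelling follows the issue comment


@dataclass
class TaskCancelConfirm:
    task_id: bytes
    task: Optional[Task]  # populated only if the cancel asked for the task back


class Worker:
    """Toy worker holding tasks that have not started yet."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.queued: Dict[bytes, Task] = {}

    def submit(self, task: Task) -> None:
        self.queued[task.task_id] = task

    def on_cancel(self, msg: TaskCancel) -> TaskCancelConfirm:
        task = self.queued.pop(msg.task_id)
        # Ship the (possibly large) payload back only when explicitly requested.
        return TaskCancelConfirm(msg.task_id, task if msg.retrive_task_flag else None)


def rebalance(busy: Worker, idle: Worker, task_ids: List[bytes]) -> None:
    """Move queued tasks when the scheduler holds no copy of their data."""
    for task_id in task_ids:
        confirm = busy.on_cancel(TaskCancel(task_id, retrive_task_flag=True))
        if confirm.task is None:
            raise RuntimeError(f"worker {busy.name} did not return task {task_id!r}")
        # The payload travels busy -> scheduler -> idle: the overhead noted for big tasks.
        idle.submit(confirm.task)
```

In the store-the-task mode the same move would only need a plain `TaskCancel(task_id)`, since the scheduler could resend its own copy; the flag exists so that the payload round trip is paid only when the scheduler has nothing to resend.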

@sharpener6 sharpener6 added the enhancement New feature or request label Sep 27, 2024
@rafa-be
Collaborator Author

rafa-be commented Sep 30, 2024

Isn't there some overlap with issue #23?

@sharpener6
Collaborator

sharpener6 commented Oct 10, 2024

@rafa-be

I closed #23 because I couldn't mark it as a duplicate.

Currently, the scaler scheduler keeps the task object in memory, so if a worker dies, the scheduler can allocate the task to other workers. For memory efficiency, however, we need a mode in which the scheduler does not keep the task once it has been sent to a worker. This changes the scheduler's behavior as follows:

Keep task:

  • when a task fails because its worker disconnected, the scheduler reassigns it to another worker;
  • when balancing tasks, the scheduler only needs the task IDs from the busy worker, and sends those tasks to other workers.

Do not keep task:

  • when a task fails because its worker disconnected, the scheduler simply returns a failed result to the client;
  • when balancing tasks, the scheduler has no copy of the task content, so it asks busy workers to return the task contents along with the task IDs, which lets it reschedule them on other workers.
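The worker-disconnect behavior in the two modes can be sketched as a single handler. `WorkerDied` is the exception name used earlier in this thread; the `Scheduler` dataclass, its fields and the round-robin reassignment are hypothetical simplifications, not scaler's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class WorkerDied(Exception):
    """Failure returned to the client when a task's worker disconnects."""


@dataclass
class TaskResult:
    task_id: bytes
    error: Optional[Exception] = None


@dataclass
class Scheduler:
    keep_task: bool
    routes: Dict[bytes, str] = field(default_factory=dict)      # task_id -> worker
    payloads: Dict[bytes, bytes] = field(default_factory=dict)  # only used if keep_task
    results: List[TaskResult] = field(default_factory=list)     # sent back to clients

    def dispatch(self, task_id: bytes, payload: bytes, worker: str) -> None:
        self.routes[task_id] = worker
        if self.keep_task:
            self.payloads[task_id] = payload

    def on_worker_disconnect(self, dead: str, alive: List[str]) -> None:
        # Snapshot the affected tasks first, since the loop mutates self.routes.
        lost = [task_id for task_id, worker in self.routes.items() if worker == dead]
        for i, task_id in enumerate(lost):
            if self.keep_task and alive:
                # Keep task: the scheduler still has the payload, so reassign it
                # round-robin across the surviving workers.
                self.dispatch(task_id, self.payloads[task_id], alive[i % len(alive)])
            else:
                # Do not keep task: the only copy died with the worker, so fail it.
                del self.routes[task_id]
                self.payloads.pop(task_id, None)
                self.results.append(TaskResult(task_id, WorkerDied(f"worker {dead} died")))
```

The handler is deliberately the same code path for both modes; only the availability of the payload decides between reassigning the task and returning a `WorkerDied` failure to the client.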

Projects
None yet
Development

No branches or pull requests

2 participants