[Proposal] Implementing an Automatic Re-Queuing and Status Transition Mechanism #3716

Open
flyingfang opened this issue Sep 11, 2024 · 7 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@flyingfang
Contributor

What is the problem you're trying to solve

We can implement a mechanism that supports automatically re-queuing tasks that have entered the "Inqueue" status but remain unscheduled for an extended period of time. These tasks would then be transitioned to the “Pending” status.
This will help optimize the queue’s efficiency and ensure that tasks are processed in a timely manner.

When tasks in the scheduling queue remain unscheduled for an extended period due to factors such as affinity or resource fragmentation, they occupy quota and block subsequent jobs from entering the queue.
If the system automatically releases these resources, the allocation rate of the queue improves, so resources are used more efficiently and jobs are processed in a timely manner.

Describe the solution you'd like

We can add a requeue action that implements the re-enqueue process.
The main loop of the action traverses all jobs in Pending and Inqueue status, as well as jobs with pending tasks. A registerable function, ssn.JobRequeueable, determines whether a job needs to be re-queued.
The action then traverses the jobs that need to be re-queued and calls another registerable function, ssn.JobRequeue, to perform the re-queueing.

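// Session holds the information for the current scheduling session.
type Session struct {
	UID types.UID
	// ....
	requeueableFns map[string]api.VoteFn       // per-plugin votes on whether a job should be re-queued
	jobRequeueFns  map[string]api.JobRequeueFn // per-plugin callbacks that perform the re-queue
	// ....
}

// JobRequeueFn re-queues the given job.
type JobRequeueFn func(*JobInfo) error

// VoteFn returns a plugin's vote on the given object.
type VoteFn func(interface{}) int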
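
The flow described above could look roughly like the following. This is only a sketch under stated assumptions, not the proposed implementation: the trimmed-down JobInfo and Session types, the executeRequeue and newDemoSession helpers, the demo plugin names, and the vote convention (negative vetoes, positive permits, zero abstains) are all hypothetical and introduced here for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Simplified stand-ins for the scheduler's types; all names are illustrative.
type Phase string

const (
	Pending Phase = "Pending"
	Inqueue Phase = "Inqueue"
	Running Phase = "Running"
)

type JobInfo struct {
	Name         string
	Phase        Phase
	PendingTasks int
}

type VoteFn func(interface{}) int
type JobRequeueFn func(*JobInfo) error

type Session struct {
	Jobs           map[string]*JobInfo
	requeueableFns map[string]VoteFn
	jobRequeueFns  map[string]JobRequeueFn
}

// JobRequeueable assumes that a negative vote vetoes, a positive vote permits,
// and zero abstains. Without any positive vote the job is left alone.
func (ssn *Session) JobRequeueable(job *JobInfo) bool {
	permitted := false
	for _, fn := range ssn.requeueableFns {
		v := fn(job)
		if v < 0 {
			return false
		}
		if v > 0 {
			permitted = true
		}
	}
	return permitted
}

// JobRequeue calls the registered callbacks and stops at the first error.
func (ssn *Session) JobRequeue(job *JobInfo) error {
	for name, fn := range ssn.jobRequeueFns {
		if err := fn(job); err != nil {
			return fmt.Errorf("plugin %s: %w", name, err)
		}
	}
	return nil
}

// executeRequeue walks Pending and Inqueue jobs plus jobs with pending tasks,
// and returns the sorted names of the jobs that were re-queued.
func executeRequeue(ssn *Session) []string {
	var requeued []string
	for _, job := range ssn.Jobs {
		candidate := job.Phase == Pending || job.Phase == Inqueue || job.PendingTasks > 0
		if !candidate || !ssn.JobRequeueable(job) {
			continue
		}
		if err := ssn.JobRequeue(job); err != nil {
			continue // a real action would log the error here
		}
		requeued = append(requeued, job.Name)
	}
	sort.Strings(requeued)
	return requeued
}

// newDemoSession registers one hypothetical plugin that permits re-queueing
// Inqueue jobs with pending tasks and moves them back to Pending.
func newDemoSession() *Session {
	return &Session{
		Jobs: map[string]*JobInfo{
			"stalled": {Name: "stalled", Phase: Inqueue, PendingTasks: 2},
			"waiting": {Name: "waiting", Phase: Pending},
			"running": {Name: "running", Phase: Running},
		},
		requeueableFns: map[string]VoteFn{"demo": func(obj interface{}) int {
			if job, ok := obj.(*JobInfo); ok && job.Phase == Inqueue && job.PendingTasks > 0 {
				return 1
			}
			return 0
		}},
		jobRequeueFns: map[string]JobRequeueFn{"demo": func(job *JobInfo) error {
			job.Phase = Pending
			return nil
		}},
	}
}

func main() {
	ssn := newDemoSession()
	fmt.Println(executeRequeue(ssn), ssn.Jobs["stalled"].Phase)
}
```

In this sketch the decision (JobRequeueable) and the effect (JobRequeue) are separate extension points, so a plugin can supply only a strategy, only a re-queue implementation, or both.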

Additional context

No response

@flyingfang flyingfang added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 11, 2024
@Monokaix
Member

Can you also paste the related issues?

@Monokaix
Member

It's a good idea, but my concern is that adding a new action is a little heavy, and it is somewhat cut off from the enqueue action.
How about just using jobReady or another callback func to roll back the pg status?

@flyingfang
Contributor Author

I think "requeue" and "enqueue" are two decoupled logical operations, so I didn’t consider placing them within the same action. From the current perspective, besides increasing the overhead of invoking queueOrder and jobOrder, there do not seem to be any additional costs involved. In light of this, would keeping these as separate actions provide higher flexibility?

@Monokaix
Member

It doesn’t have to be placed in the enqueue action. The func UpdateJobStatus updates the pg when a scheduling cycle ends. Can we just check whether a pg in Inqueue status is unschedulable there and roll it back to Pending, instead of doing this in a separate action? The ssn can hold the unschedulable message, and we can use it to decide whether to change the pg status.
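
A minimal sketch of this alternative, for comparison with the separate action. It does not use the real UpdateJobStatus signature: the JobInfo type, the FitError field (standing in for the unschedulable message held by the ssn), and the rollbackUnschedulable helper are hypothetical names used only for illustration.

```go
package main

import "fmt"

type Phase string

const (
	Pending Phase = "Pending"
	Inqueue Phase = "Inqueue"
)

// JobInfo is a hypothetical, trimmed-down job record.
type JobInfo struct {
	Name  string
	Phase Phase
	// FitError stands in for the unschedulable message recorded during the
	// scheduling cycle; an empty string means no failure was recorded.
	FitError string
}

// rollbackUnschedulable is meant to run where job status is updated at the
// end of a scheduling cycle. It moves Inqueue jobs that carry an
// unschedulable message back to Pending and returns how many were changed.
func rollbackUnschedulable(jobs []*JobInfo) int {
	changed := 0
	for _, job := range jobs {
		if job.Phase == Inqueue && job.FitError != "" {
			job.Phase = Pending
			changed++
		}
	}
	return changed
}

func main() {
	jobs := []*JobInfo{
		{Name: "fragmented", Phase: Inqueue, FitError: "insufficient contiguous resources"},
		{Name: "healthy", Phase: Inqueue},
	}
	n := rollbackUnschedulable(jobs)
	fmt.Println(n, jobs[0].Phase, jobs[1].Phase)
}
```

Note that this variant rolls back every unschedulable Inqueue job unconditionally, with no pluggable strategy deciding when the rollback applies.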

@flyingfang
Contributor Author

Keeping jobs in Inqueue status still makes sense in general scenarios, because the jobs have already acquired their quotas and will be scheduled once the required resources are available.
What we are proposing is merely an extension point for automatically re-enqueuing jobs according to specific strategies, while the default behavior remains to keep the jobs in Inqueue status.

@Monokaix
Member

So what's the difference between the logic of re-enqueuing jobs following specific strategies and re-enqueuing when a job is unschedulable?

@Monokaix
Member

How about discussing this in the weekly meeting?
