| RFC | 0003 |
|---|---|
| Title | Resource allocation by auction |
| Sponsor | |
| Status | Draft 💬 |
| Type | Standards |
| Created | 2019-12-13 |
| Depends on | RFC-0005 |
Fundamental to any task orchestration engine, such as Otto, is the allocation of resources for the execution of tasks. Matching tasks to resources is simple in small environments where one might have a single orchestrator and only a few resources, e.g. virtual machines, capable of executing tasks. More complex task workloads require increasingly complex approaches for efficient allocation and utilization of resources available to the task orchestration engine. This document describes the approach taken by Otto, wherein tasks are "auctioned" to resources or orchestrators which can maximize utilization (saturation of resource) while minimizing cost (time to execute task, operational expenditure of resource).
Auction-based Resource Allocation requires at minimum three system components in order to operate properly: the Eventbus, the Auctioneer, and an Orchestrator. The Auctioneer manages the bidding on tasks and is ultimately responsible for evaluating which bid has the lowest cost to execute the task. Resource cost combines the underlying operational cost of the resource, e.g. Compute/Hour, with the Orchestrator’s estimated time to execute the task.
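This RFC does not fix a formula for resource cost. As a minimal sketch, assuming the cost of a bid is the resource's hourly rate multiplied by the Orchestrator's estimated execution time, it might be computed as shown here. The `Bid` class and its field names (`client_id`, `hourly_rate`, `estimated_seconds`) are hypothetical and not part of this specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    """Hypothetical bid shape; the field names are assumptions, not part of the RFC."""

    client_id: str            # identifies the bidding Orchestrator
    hourly_rate: float        # operational cost of the resource, e.g. compute per hour
    estimated_seconds: float  # the Orchestrator's estimate of the time to execute the task

    def cost(self) -> float:
        # Assumed formula: operational cost scaled by the estimated execution time.
        return self.hourly_rate * (self.estimated_seconds / 3600.0)


# A cheap but slow resource versus an expensive but fast one.
cheap_but_slow = Bid("orchestrator-a", hourly_rate=0.10, estimated_seconds=7200)
fast_but_pricey = Bid("orchestrator-b", hourly_rate=0.50, estimated_seconds=600)

# Under this formula the lowest-cost bid is the one the Auctioneer would prefer.
lowest = min([cheap_but_slow, fast_but_pricey], key=Bid.cost)
```

Under this assumed formula a faster resource can win despite a higher hourly rate, which reflects the coupling of operational cost and execution time described above.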
This specification will not describe the actual execution of the task, which may be carried out on an Agent, at the behest of an Orchestrator, or executed by the Orchestrator itself.
Tasks in the system are also not generated by the Auctioneer, but are instead expected to be published to the Eventbus by another service.
Task auctions are meant to be fast and lightweight rather than perfectly optimal. The Auctioneer runs a quick auction for each task that it receives, and uses a configured auction duration to determine how long each auction stays open.
ℹ️
|
The Auctioneer does not generate the tasks in the system; that responsibility is outside the purview of this document. |
For example, a task `A` arrives in `tasks.for_auction`. The Auctioneer processes the task and creates its internal representation for the auction before announcing the task auction on `tasks.auction`. The various Orchestrators consuming from `tasks.auction` may then consider the contents of the auction format in order to determine whether they can/should create a bid, which they submit onto `tasks.bids`.
The Auctioneer listens on `tasks.bids` for bids on all open auctions. For the example task `A`, it would see zero or more bids. Once the configured auction duration elapses, the Auctioneer chooses the most cost-effective bid and then writes an "auction won" message into the inbox (`inbox.<clientId>`) of the Orchestrator whose bid won.
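The auction lifecycle described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Auctioneer's actual implementation: the `Auctioneer` and `Auction` classes, the message shapes (`task`, `clientId`, `cost`, `auction_won`), and the in-memory `published` list standing in for the Eventbus are all hypothetical. Only the channel names come from this document.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Auction:
    task_id: str
    ends_at: float                          # deadline on the Auctioneer's clock
    bids: List[dict] = field(default_factory=list)


class Auctioneer:
    """Hypothetical in-memory Auctioneer; `published` stands in for the Eventbus."""

    def __init__(self, duration_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.duration = duration_seconds    # the configured auction duration
        self.clock = clock
        self.open_auctions: Dict[str, Auction] = {}
        self.published: List[Tuple[str, dict]] = []   # (channel, message) pairs

    def open(self, task_id: str) -> None:
        # A task arrived on tasks.for_auction: announce it on tasks.auction.
        self.open_auctions[task_id] = Auction(task_id, self.clock() + self.duration)
        self.published.append(("tasks.auction", {"task": task_id}))

    def on_bid(self, bid: dict) -> None:
        # A bid arrived on tasks.bids: keep it only if its auction is still open.
        auction = self.open_auctions.get(bid["task"])
        if auction is not None and self.clock() < auction.ends_at:
            auction.bids.append(bid)

    def close_expired(self) -> None:
        # Close every auction whose duration has elapsed and notify the winner.
        now = self.clock()
        for task_id, auction in list(self.open_auctions.items()):
            if now < auction.ends_at:
                continue
            del self.open_auctions[task_id]
            if not auction.bids:
                continue    # zero bids: handling is not specified by this document
            winner = min(auction.bids, key=lambda b: b["cost"])
            self.published.append(
                (f"inbox.{winner['clientId']}", {"auction_won": task_id}))


# Drive the sketch with a fake clock so the example is deterministic.
now = [0.0]
auctioneer = Auctioneer(duration_seconds=5.0, clock=lambda: now[0])
auctioneer.open("A")
auctioneer.on_bid({"task": "A", "clientId": "x", "cost": 3.0})
auctioneer.on_bid({"task": "A", "clientId": "y", "cost": 1.5})
now[0] = 6.0
auctioneer.on_bid({"task": "A", "clientId": "z", "cost": 0.1})   # too late, ignored
auctioneer.close_expired()
```

Injecting the clock keeps the sketch testable without waiting for real time to pass. What should happen to an auction that receives no bids is left open here, as it is in the text.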
The Auctioneer maintains a list of currently "open" auctions for tasks, and reports via a web interface on the status of these auctions.
🔥
|
Reliability concern: What happens if an Orchestrator wins a bid, but then is unable to actually start working on the task? Should it be cancelled? How would the Auctioneer handle this? |
The format of the message announcing the task auction is described as follows:
    {
        "task" : {
            "raw" : [full task definition], (1)
            "capabilities" : { (2)
            }
        },
        "auction" : {
            "starts" : "1970-01-01", (3)
            "ends" : "1970-01-01" (4)
        }
    }
1. The full format of a task definition is not the subject of this specification.
2. Key-value listing of task capabilities requested for execution of the task.
3. The ISO-8601 formatted timestamp of when the auction was opened.
4. The ISO-8601 formatted timestamp of when the auction will close.
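As an illustration of the message format above, the following sketch builds an announcement with ISO-8601 timestamps derived from an opening time and an auction duration. The function name `build_auction_announcement`, its parameters, and the sample task and capability values are assumptions made for this example. The task definition format itself remains out of scope.

```python
import json
from datetime import datetime, timedelta, timezone


def build_auction_announcement(raw_task, capabilities, opened_at, duration):
    """Hypothetical helper that assembles the announcement for tasks.auction."""
    closes_at = opened_at + duration
    return {
        "task": {
            "raw": raw_task,               # full task definition, passed through as-is
            "capabilities": capabilities,  # key-value listing of requested capabilities
        },
        "auction": {
            "starts": opened_at.isoformat(),  # ISO-8601 timestamp of the opening
            "ends": closes_at.isoformat(),    # ISO-8601 timestamp of the close
        },
    }


opened = datetime(2019, 12, 13, 12, 0, 0, tzinfo=timezone.utc)
message = build_auction_announcement(
    raw_task={"name": "example"},    # placeholder task definition
    capabilities={"docker": True},   # hypothetical capability key
    opened_at=opened,
    duration=timedelta(seconds=5),
)
print(json.dumps(message, indent=2))
```

Using timezone-aware timestamps avoids ambiguity when the Auctioneer and the Orchestrators run on hosts in different time zones.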
The implementation and specifics of the Eventbus are not described in this document. For our purposes, however, it is important to describe the channels which are required for the resource auction to operate:
| Channel name | Stateful | Purpose |
|---|---|---|
| `tasks.for_auction` | ✓ | Tasks which have not yet been auctioned, primarily used by the Auctioneer. |
| `tasks.auction` | ✓ | Tasks which are available to be bid upon by Orchestrators. |
| `tasks.bids` | ✓ | Task bids by the various Orchestrators. |
| `inbox.<clientId>` | ✓ | Channel representing the private inbox of a given client. This channel is where awarded bids will be dispatched. |
| | x | Informational channel for tasks which are being executed. |
| | x | Informational channel for tasks which are finished executing. |
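A client of the Eventbus could keep the channel names used in this document in one place, as in the sketch below. The `Channel` class and the `inbox_channel` helper are hypothetical. Only the channels whose names appear in the text of this document are listed, so the two informational channels are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """Hypothetical description of an Eventbus channel."""

    name: str
    stateful: bool


# Channels named in this document; all of them are marked as stateful.
TASKS_FOR_AUCTION = Channel("tasks.for_auction", stateful=True)
TASKS_AUCTION = Channel("tasks.auction", stateful=True)
TASKS_BIDS = Channel("tasks.bids", stateful=True)


def inbox_channel(client_id: str) -> Channel:
    """Return the private inbox channel (inbox.<clientId>) for a client."""
    if not client_id:
        raise ValueError("client_id must not be empty")
    return Channel(f"inbox.{client_id}", stateful=True)
```

For example, `inbox_channel("orchestrator-a").name` yields `inbox.orchestrator-a`, which is where that Orchestrator would look for its "auction won" messages.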
The role of "Orchestrator" in the auction process can be served by a service whose sole responsibility is to bid and provision agents, or it could be served by an Agent itself. Standalone Orchestrators might take the form of an "EC2 Orchestrator" which can dynamically provision resources in AWS EC2. An Agent-Orchestrator, an Agent which acts as an Orchestrator, in contrast would be a long-lived resource, like the proverbial build machine under somebody’s desk.
Both forms of Orchestrators are responsible for determining their capabilities. These capabilities will help the Orchestrator determine whether or not it should bid for a certain task which is up for auction. For example, resources which are capable of running Docker containers would be able to bid on tasks which require containers. A resource which cannot provide `sudo` access or admin privileges would in contrast avoid bidding on tasks which require escalated privileges for execution.
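The bidding decision described above might look like the following sketch. The matching rules are assumptions made for illustration: boolean capabilities must be offered when requested, numeric capabilities are treated as minimums, and anything else must match exactly. The capability keys (`cores`, `memory_mb`, `docker`, `sudo`) are hypothetical names, not identifiers defined by this RFC.

```python
def should_bid(offered: dict, requested: dict) -> bool:
    """Return True if a resource's offered capabilities satisfy a task's request."""
    for name, needed in requested.items():
        have = offered.get(name)
        if isinstance(needed, bool):
            # Boolean capability: only a problem when required but not offered.
            if needed and not have:
                return False
        elif isinstance(needed, (int, float)):
            # Numeric capability: treated here as a minimum requirement.
            if not isinstance(have, (int, float)) or have < needed:
                return False
        elif have != needed:
            # Any other value must match exactly.
            return False
    return True


# A resource that can run containers but cannot grant escalated privileges.
docker_host = {"cores": 4, "memory_mb": 8192, "docker": True, "sudo": False}

can_run_container_task = should_bid(docker_host, {"docker": True, "cores": 2})
can_run_privileged_task = should_bid(docker_host, {"sudo": True})
```

Here the resource would bid on the container task but stay out of the auction for the task that requires `sudo`.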
Both forms of Orchestrators should listen to the `tasks.auction` channel in addition to their "personal" inbox channel.
| Capability | Values | Notes |
|---|---|---|
| | | Number of cores necessary to run the task |
| | | Memory necessary to run the task |
| | | The resource can run a Docker container. |
| | | The resource has a |
💡
|
Explain why the existing code base or process is inadequate to address the problem that the RFC solves. This section may also contain any historical context such as how things were done before this proposal.
|
💡
|
Explain why particular design decisions were made. Describe alternate designs that were considered and related work, e.g. how the feature is supported in other systems. Provide evidence of consensus within the community and discuss important objections or concerns raised during discussion.
|
💡
|
Describe any incompatibilities and their severity. Describe how the RFC proposes to deal with these incompatibilities. If there are no backwards compatibility concerns, this section may simply say: There are no backwards compatibility concerns related to this proposal. |
💡
|
Describe the security impact of this proposal. Outline what was done to identify and evaluate security issues, discuss potential security issues and how they are mitigated or prevented, and how the RFC interacts with existing permissions, authentication, authorization, etc. If this proposal will have no impact on security, this section may simply say: There are no security risks related to this proposal. |
💡
|
If the RFC involves any kind of behavioral change to code, give a summary of how its correctness (and, if applicable, compatibility, security, etc.) can be tested. In the preferred case that automated tests can be developed to cover all significant changes, simply give a short summary of the nature of these tests. If some or all of the changes will require human interaction to verify, explain why automated tests are considered impractical. Then summarize what kinds of test cases might be required: user scenarios with action steps and expected outcomes. Might behavior vary by platform (operating system, servlet container, web browser, etc.)? Are there foreseeable interactions between different permissible versions of components? Are any special tools, proprietary software, or online service accounts required to exercise a related code path (Active Directory server, GitHub login, etc.)? When will testing take place relative to merging code changes, and might retesting be required if other changes are made to this area in the future? If this proposal requires no testing, this section may simply say: There are no testing issues related to this proposal. |
💡
|
Link to any open source reference implementation of code changes for this proposal. The implementation need not be completed before the RFC is accepted but must be completed before the RFC is given "final" status. RFCs which will not include code changes may omit this section. |