add proposal for Locality LoadBalance #574
base: main
Welcome @derekwin! It looks like this is your first PR to kmesh-net/kmesh 🎉
Would you like to share your issue at Thursday's community meeting?

### Motivation

Currently, kmesh does not support locality topology-aware load balancing. Locality Load Balancing optimizes performance and reliability in distributed systems by directing traffic to the nearest service instances. This reduces latency, enhances availability, and lowers costs associated with cross-region data transfers. It also helps ensure compliance with data sovereignty regulations and improves overall user experience by providing faster and more reliable service responses.
Suggested change:
Currently, Kmesh does not support locality topology-aware load balancing. Locality Load Balancing optimizes performance and reliability in distributed systems by directing traffic to the nearest service instances. This reduces latency, enhances availability, and lowers costs associated with cross-region data transfers. It also helps ensure compliance with data sovereignty regulations and improves overall user experience by providing faster and more reliable service responses.
Capitalize the initial letter of "Kmesh" consistently.

#### case 1. locality failover
1. Destination Rule
Same as Istion. Parse rules specify configuration for Locality load balancing. (todo: outlier detection settings to detect and evict unhealthy hosts from the load balancing pool.)
What is "Istion"? Do you mean Istio?
/ok-to-test
Codecov Report: All modified and coverable lines are covered by tests ✅
See 29 files with indirect coverage changes.

#### Goals
Suggested change: replace `#### Goals` with `### Goals`.

1. Prioritize adding locality load balancing capabilities in the workload mode.

2. Two types of locality load balancing: locality failover and locality weighted distribution.
I am not sure how locality weighted distribution can be implemented in workload mode. The workload API does not actually support weights.
#### case 1. locality failover
1. Destination Rule
Same as Istion. Parse rules specify configuration for Locality load balancing. (todo: outlier detection settings to detect and evict unhealthy hosts from the load balancing pool.)
- Outlier detection should occur before load balancing.
This does not suit workload mode, as the workload API does not include outlier settings. It does LB based on where the endpoint resides.
Yes.
I have updated the proposal.
I propose a new implementation of the locality matching algorithm that avoids circular computations and also reduces the amount of data that needs to be stored in BPF maps. Details: https://github.com/derekwin/treemap/tree/master
if no conflict, there is no need to merge main branch.
then fix some conflicts, and then
the DCO github action failed, because it asks you to commit with your signature, which can be attached with
I would like to see more API design, rather than function implementation, in the proposal.
How do you express the priority level, and how do you match the client locality with the endpoints?

1. Prioritize adding locality load balancing capabilities in the workload mode.

2. Locality load balancing mode: locality failover.
How about strict mode?
https://pkg.go.dev/istio.io/istio/pkg/workloadapi#LoadBalancing_Scope

2. calculate locality match rank
Group endpoints by priority.
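
A minimal sketch of how a locality match rank could be computed, so that endpoints can then be grouped by it. Everything here is an assumption for illustration, not the proposal's code: the `struct locality` layout, the `lb_scope` enum (mirroring the `LoadBalancing_Scope` values), and the rule itself, which counts how many scopes of the routing preference match in order and stops at the first mismatch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scope identifiers, mirroring workloadapi LoadBalancing_Scope. */
enum lb_scope {
    SCOPE_REGION = 1,
    SCOPE_ZONE = 2,
    SCOPE_SUBZONE = 3,
    SCOPE_NODE = 4,
    SCOPE_CLUSTER = 5,
    SCOPE_NETWORK = 6,
};

/* Hypothetical locality record; field names and sizes are illustrative only. */
struct locality {
    char region[32];
    char zone[32];
    char subzone[32];
    char node[32];
    char cluster[32];
    char network[32];
};

/* Return 1 if src and dst agree on the given scope, 0 otherwise. */
static int scope_matches(const struct locality *src,
                         const struct locality *dst,
                         enum lb_scope scope)
{
    switch (scope) {
    case SCOPE_REGION:  return strcmp(src->region, dst->region) == 0;
    case SCOPE_ZONE:    return strcmp(src->zone, dst->zone) == 0;
    case SCOPE_SUBZONE: return strcmp(src->subzone, dst->subzone) == 0;
    case SCOPE_NODE:    return strcmp(src->node, dst->node) == 0;
    case SCOPE_CLUSTER: return strcmp(src->cluster, dst->cluster) == 0;
    case SCOPE_NETWORK: return strcmp(src->network, dst->network) == 0;
    }
    return 0; /* unknown scope never matches */
}

/*
 * Assumed ranking rule: walk the routing preference in order and count
 * matching scopes until the first mismatch. A higher rank is a closer match.
 */
static uint32_t locality_rank(const struct locality *src,
                              const struct locality *dst,
                              const enum lb_scope *pref,
                              uint32_t pref_len)
{
    uint32_t rank = 0;
    for (uint32_t i = 0; i < pref_len; i++) {
        if (!scope_matches(src, dst, pref[i]))
            break;
        rank++;
    }
    return rank;
}
```

With a routing preference of region, zone, subzone, an endpoint that shares the client's region and zone but not its subzone would get rank 2 under this rule, and endpoints with equal rank fall into the same priority group.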

3. choose endpoint

Randomly select one endpoint from the group with the highest rank as the service backend.
Suggested change:
Randomly select one endpoint from the group with the highest priority
Also, add more detail on what we do if all the endpoints in the high-priority group are unhealthy.
And for strict mode, how would you select the endpoint? I would like to see that.

4. maybe more? Panic threshold

When the proportion of healthy endpoints in the high-rank group falls below the panic threshold, select endpoints from the next rank group.
I do not care about this at first. First, respect the workload health status.
__u32 waypoint_addr;
__u32 waypoint_port;
// add health status: healthStatus
// add locality information
Please add what this field looks like.
Add the corresponding fields to `pkg/controller/workload/bpfcache/service.go`, and update the logic in `pkg/controller/workload/workload_processor.go`

2. Configure the locality (region, zone, subzone) and health status (HEALTHY, UNHEALTHY) of the backend. This corresponds to the message in workload.proto.
> Although the current workload API defines seven scopes, when configuring a pod's locality, only region, zone, and subzone are configured. Therefore, matching capabilities can only be realized for these three scopes.
> only region, zone, and subzone are configured

Where did you get this conclusion? At least NODE is supported now.
I misunderstood it before. I saw that in ztunnel the NODE, NETWORK, and CLUSTER information is maintained within the workload, and I am considering adding this information to the backend's BPF map later.
__u32 service[MAX_SERVICE_COUNT];
struct ip_addr wp_addr;
__u32 waypoint_port;
__u8 health_status; // workload_health_status_t: HEALTHY, UNHEALTHY
Currently we filter out unhealthy workloads.
So only healthy workloads are stored in the BPF map by the control plane. Does locality load balancing then not need to care whether a workload is healthy?
We can make it simpler; even the priority set can be calculated in user space.
Priority computation between localities happens when a new flow arrives. If the priority calculation takes place in the control plane, my understanding is that we would need to precompute all possible scenarios (we cannot do event-driven programming that interoperates with user space, right?), then hash the different situations and store them in a BPF map. The kernel would then query the map using the source and destination locality information to obtain the priority. To simplify the problem, we could enumerate combinations based on the specific values of the six routing scopes (including cases where only some of them match). This approach has two potential issues:
First, user space must enumerate all possible scenarios, which becomes particularly burdensome as the locality information gets richer, leading to exponential growth in the number of situations to store. Second, the BPF map would have to store all of these scenarios, each in the form of the prio_map as currently designed.
My concerns are mainly: 1. the eBPF instruction limit; 2. data plane sorting performance. Worth a try, though.
struct ip_addr wp_addr;
__u32 waypoint_port;
__u8 health_status; // workload_health_status_t: HEALTHY, UNHEALTHY
locality_t locality;
What is locality_t then?
I will add it in the next commit.
New proposal: locality LB with the logic in user space.
typedef struct {
    __u32 service_id; // service id
    __u32 rank; // rank
} prio_key;
What is the relationship with endpoint_map?
When do we use this map, and when do we use the other?
} prio_key;
typedef struct {
    __u32 count; // count of current prio
    __u32 uid_list[MAP_SIZE_OF_PRIO]; // workload_uid to backend
This can waste memory.
So why not add the priority to the endpoint key?
Update endpoint_key:
typedef struct {
    __u32 service_id; // service id
    __u32 priority; // priority (rank) of the group this endpoint belongs to
    __u32 backend_index; // if endpoint_count = 3, then backend_index = 0/1/2
} endpoint_key;
The new design has been added to the proposal, and the corresponding code PR is #900.
dbf6e1d to 3623f1e
typedef struct {
    __u32 service_id; // service id
    __u32 prio; // prio means rank, 6 means match all, and 0 means match nothing
With this added, how do we select an endpoint now?
For random LB mode, a workload is only added to the endpoint map with the max prio (6).
For locality LB mode, a workload is added to the endpoint map with the rank calculated by matching the Kmesh processor's locality info against the workload's locality info.
We also record, in serviceValue, the number of endpoints belonging to each prio, so that we can use the count as before.
In the BPF prog, if the service is in random LB mode, we look up endpoints with the max prio. If it is in locality LB mode, we iterate prio from the max prio down to 0; if the count for that prio is > 0, meaning there are one or more endpoints at that prio, we choose a workload index from a random integer bounded by the count, and get the endpoint with serviceId, prio, and workloadIndex.
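
The selection described above could be sketched in plain C roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `struct service_value`, its `prio_endpoint_count` array, the `lb_locality` flag, and `select_endpoint` are hypothetical names, and the random value is passed in by the caller rather than drawn from a BPF helper. It follows the ordering in this comment, where a higher prio value is a closer match.

```c
#include <stdint.h>

#define MAX_PRIO 6
#define PRIO_COUNT (MAX_PRIO + 1) /* prio values 0..6 */

/* Hypothetical service value holding the endpoint count per prio. */
struct service_value {
    uint32_t prio_endpoint_count[PRIO_COUNT];
    uint32_t lb_locality; /* assumed flag: 0 = random LB, 1 = locality LB */
};

/* Endpoint key extended with prio, as discussed in the thread. */
struct endpoint_key {
    uint32_t service_id;
    uint32_t prio;
    uint32_t backend_index; /* 0-based index within the prio group */
};

/*
 * Fill *out with the key of the chosen endpoint.
 * Returns 0 on success, -1 if no endpoint is available.
 * rand_val is a caller-supplied random number.
 */
static int select_endpoint(uint32_t service_id,
                           const struct service_value *svc,
                           uint32_t rand_val,
                           struct endpoint_key *out)
{
    /* Random LB: all endpoints live in the MAX_PRIO group. */
    int lowest = svc->lb_locality ? 0 : MAX_PRIO;

    /* Locality LB: fall back from the best-matching group downwards. */
    for (int prio = MAX_PRIO; prio >= lowest; prio--) {
        uint32_t count = svc->prio_endpoint_count[prio];
        if (count == 0)
            continue; /* empty group, try the next one */
        out->service_id = service_id;
        out->prio = (uint32_t)prio;
        out->backend_index = rand_val % count;
        return 0;
    }
    return -1;
}
```

The returned key would then be used for a single lookup in the endpoint map. In an actual BPF program the loop bound is a compile-time constant, which keeps the loop acceptable to the verifier, and the random value would come from a helper such as `bpf_get_prandom_u32()`.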
Makes sense. Note that the BPF map update will be a little bit tricky.
> 0 means match nothing

What does that mean?
The prio value ranges from 0 to 6; 0 means the lowest priority.
Hmm, I would suggest the opposite, because then we can easily search starting from the highest priority.
Ok, I have updated it.

workload.h
#define MAX_PRIO 6
IIUC, the max priority rank can be 7, with:
// Prefer traffic in the same region.
LoadBalancing_REGION LoadBalancing_Scope = 1
// Prefer traffic in the same zone.
LoadBalancing_ZONE LoadBalancing_Scope = 2
// Prefer traffic in the same subzone.
LoadBalancing_SUBZONE LoadBalancing_Scope = 3
// Prefer traffic on the same node.
LoadBalancing_NODE LoadBalancing_Scope = 4
// Prefer traffic in the same cluster.
LoadBalancing_CLUSTER LoadBalancing_Scope = 5
// Prefer traffic in the same network.
LoadBalancing_NETWORK LoadBalancing_Scope = 6
The prio value ranges from 0 to 6, so I set MAX_PRIO to 6, which is actually the 7th rank.
Signed-off-by: seclee <[email protected]> Signed-off-by: derekwin <[email protected]>
Signed-off-by: derekwin <[email protected]>
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: hzxuzhonghu
What type of PR is this?
/kind enhancement
What this PR does / why we need it:
add proposal for Locality LB