better RTT and lower cloud costs by preferencing lower latency members #15918

alam0rt · 2023-05-18T00:40:17Z

What would you like to be added?

Problem statement

etcd clients (such as kube-apiserver) will use round robin to select a member to connect to:

etcd/client/v3/internal/resolver/resolver.go

Lines 42 to 44 in 7e161d5

    
           // Build returns itself for Resolver, because it's both a builder and a resolver. 
        
           func (r *EtcdManualResolver) Build(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) { 
        
           	r.serviceConfig = cc.ParseServiceConfig(`{"loadBalancingPolicy": "round_robin"}`)

This load balancing configuration has a downside in a HA cloud environment where cross zone traffic is metered as the apiserver will possibly connect to a member in another zone, which in turn replicates to members in other zones.

The other load balancing configurations are not suitable either, for example pick-first will serially connect to each member, and as the name implies, pick the first which connects. This would require ordering the endpoints which could be done if the order of the --etcd-servers in `kube-apiserver' order were retained (will need to test).

Why is this needed?

I believe a new load balancing configuration which prioritises members with lowest latency is a sensible default option for etcd.

This should see a decrease in RTT (manually configuring our test environment saw RTT from client -> etcd cluster reduce from 1.5ms in the worst case to about 550us)

This screenshot shows a graph of RTT between each zone as I trialed the pick-first configuration. You can see that the 9 lines (one for each relationship between zones) drops to 3 (as there are only 3 zones and traffic is no longer leaving the zone), plus the large reduction in RTT from the apiserver to etcd.

This should reduce cloud costs for etcd users
- consider 1tb of traffic sent from the apiservers in a 3 member setup (1 member and 1 client in each zone)
- in this environment, each gb of cross zone traffic costs $0.10
- round robin would see 66% percent of the traffic go cross zone and thus would incur a cost
- A latency aware lb configuration (presuming same zone == less latency 100% of the time) would save your users 66 dollars per day of operating costs.

Note this is just some rough napkin maths and I could be way off, regardless, I believe this feature would be beneficial.

The text was updated successfully, but these errors were encountered:

alam0rt · 2023-05-18T06:51:32Z

Alright, so I forked etcd and switched over the default client lb config to use pick-first (alam0rt@a9fd304)

This works, and the ordering specified in --etcd-servers is respected, but now the issue is that with a 5 member etcd cluster (each zone has 2, 2, 1 members), the pick-first obviously prioritises the first member, so the other member in the zone gets a bit too much love from the corresponding apiserver.

So yeah, I feel a new lb configuration is required and we can't rely on pick first + configuration changes to ordering of members.

serathius · 2023-05-18T07:17:38Z

I would prefer to delegate loadbalancing algorithm implementation to grpc as we are already planning to migrate to it #15145

alam0rt · 2023-05-18T23:03:47Z

Ah nice, are you referring to this? https://github.com/grpc/grpc/blob/master/doc/grpc_xds_features.md

serathius · 2023-05-19T07:56:24Z

Ah nice, are you referring to this? https://github.com/grpc/grpc/blob/master/doc/grpc_xds_features.md

No, I was just saying that in the best case etcd project should not implement loadbalancing, just provide a saine default and allow users to configure grpc themselves (pass flags via options).

stale · 2023-09-17T23:04:03Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

alam0rt added the type/feature label May 18, 2023

jmhbnz added area/performance area/clientv3 labels May 18, 2023

chaochn47 mentioned this issue Jun 7, 2023

Remove the dependency on grpc-go's experimental API (e.g. resolver & balancer) #15145

Open

stale bot added the stale label Sep 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

better RTT and lower cloud costs by preferencing lower latency members #15918

better RTT and lower cloud costs by preferencing lower latency members #15918

alam0rt commented May 18, 2023 •

edited

Loading

alam0rt commented May 18, 2023 •

edited

Loading

serathius commented May 18, 2023

alam0rt commented May 18, 2023

serathius commented May 19, 2023 •

edited

Loading

stale bot commented Sep 17, 2023

better RTT and lower cloud costs by preferencing lower latency members #15918

better RTT and lower cloud costs by preferencing lower latency members #15918

Comments

alam0rt commented May 18, 2023 • edited Loading

What would you like to be added?

Problem statement

Why is this needed?

alam0rt commented May 18, 2023 • edited Loading

serathius commented May 18, 2023

alam0rt commented May 18, 2023

serathius commented May 19, 2023 • edited Loading

stale bot commented Sep 17, 2023

alam0rt commented May 18, 2023 •

edited

Loading

alam0rt commented May 18, 2023 •

edited

Loading

serathius commented May 19, 2023 •

edited

Loading