NDP overview
This page provides a brief overview of NDP, a protocol designed to offer both low latency and high throughput in Clos datacenter networks. A visual explanation of NDP can be found in this video clip, while a detailed description is available in this paper published at SIGCOMM 2017 (best paper award recipient).
The key idea behind NDP is to have switches trim packets on overload instead of dropping them; the resulting headers are queued in a separate, per-port header queue. The switch sends packets from the header and data packet queues per port using a weighted round-robin approach (10 headers for 1 data packet).
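The trimming and scheduling behaviour described above can be sketched as a toy output port. This is an illustration only, not the simulator's code: the names `Packet` and `TrimmingPort` are hypothetical, the 8-packet queue capacities are borrowed from the buffer sizes given on this page, and dropping a header when the header queue is itself full is an assumption made here to keep the sketch small.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    flow_id: int
    seq: int
    payload_len: int  # bytes of payload; 0 marks a trimmed header

    @property
    def is_header(self) -> bool:
        return self.payload_len == 0


class TrimmingPort:
    """Toy output port with a data queue and a separate header queue."""

    def __init__(self, data_capacity=8, header_capacity=8, headers_per_data=10):
        self.data_q = deque()
        self.header_q = deque()
        self.data_capacity = data_capacity
        self.header_capacity = header_capacity
        self.headers_per_data = headers_per_data
        self._headers_in_a_row = 0

    def enqueue(self, pkt):
        """Queue a packet; trim it to a header if the data queue is full."""
        if not pkt.is_header and len(self.data_q) < self.data_capacity:
            self.data_q.append(pkt)
            return "queued"
        if len(self.header_q) >= self.header_capacity:
            return "dropped"  # assumption: header queue overflow drops
        was_header = pkt.is_header
        self.header_q.append(Packet(pkt.flow_id, pkt.seq, 0))
        return "queued" if was_header else "trimmed"

    def dequeue(self):
        """Weighted round-robin: up to 10 headers, then 1 data packet."""
        send_header = bool(self.header_q) and (
            self._headers_in_a_row < self.headers_per_data or not self.data_q
        )
        if send_header:
            self._headers_in_a_row += 1
            return self.header_q.popleft()
        self._headers_in_a_row = 0
        return self.data_q.popleft() if self.data_q else None
```

With `data_capacity=2`, enqueueing three full packets queues the first two and trims the third; the trimmed header is then dequeued ahead of the data packets, which is what lets the receiver learn about the overload quickly.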
NDP switch buffers are very small: both the data and header buffers are 8 packets long (with 1.5KB packets, 8 data packets amount to 12KB). Small buffers ensure low latency for short transfers, yet NDP's protocol design also ensures near-optimal throughput for long flows at the same time.
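To make the link between buffer size and latency concrete, the following back-of-the-envelope sketch computes how long a full data queue takes to drain. It assumes the 10Gbps link speed used in the pacing example on this page and ignores any headers that are served in between; the function names are hypothetical.

```python
def buffer_bytes(packets=8, packet_bytes=1500):
    """Size of a data queue holding `packets` full-sized packets."""
    return packets * packet_bytes


def drain_time_s(packets=8, packet_bytes=1500, link_bps=10e9):
    """Time to transmit a full data queue on a link of `link_bps`."""
    return buffer_bytes(packets, packet_bytes) * 8 / link_bps


print(buffer_bytes())         # 12000 bytes, i.e. 12KB
print(drain_time_s() * 1e6)   # roughly 9.6 microseconds per hop
```

Under these assumptions, a data packet arriving at a full queue waits on the order of 10 microseconds at that hop, which is why short transfers see low latency.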
The NDP transport protocol builds on the packet trimming capability of switches. The key assumption of the transport is that it runs in a multipath network topology where the core is sufficiently provisioned to avoid persistent overload, and where the different paths between any two servers are of equal length. All Clos topologies used in practice meet these assumptions. NDP operation can be summarized as follows:
- Senders send an initial window of data at line rate, then wait for PULL packets from receivers before sending more data.
- Packets are sent on all available paths in a per-sender round-robin fashion. Switch-level per-packet load balancing also works, but requires slightly larger switch buffers (10 packets) to ensure full utilization.
- Switches may trim packets.
- The receiver generates an ACK for each data packet received and a NACK for each trimmed header; these are sent immediately to the sender. For every incoming data packet or header, the receiver also adds a PULL message to its single pull queue, which is shared by all incoming connections.
- The receiver sends out PULL packets to their respective senders at a paced rate. The pacing ensures that the resulting data packets arrive at line rate. For instance, with a 1.5KB MTU, pull packets are sent every 1.2us on a 10Gbps link (1500 bytes × 8 bits / 10Gbps).
The image above captures the interaction between the different mechanisms used by NDP.
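The receiver side of the steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: `Receiver`, `on_arrival`, and `drain` are hypothetical names, time is a plain float in seconds, and the pull queue is served in simple FIFO order.

```python
from collections import deque


def pull_interval_s(mtu_bytes=1500, link_bps=10e9):
    """Spacing between PULLs so that pulled data arrives at line rate."""
    return mtu_bytes * 8 / link_bps


class Receiver:
    """Toy receiver with one pull queue shared by all connections."""

    def __init__(self, mtu_bytes=1500, link_bps=10e9):
        self.pull_q = deque()
        self.interval = pull_interval_s(mtu_bytes, link_bps)
        self.next_free = 0.0  # earliest time the next PULL may leave

    def on_arrival(self, sender, seq, trimmed):
        """Return the immediate ACK/NACK and queue one PULL for the sender."""
        self.pull_q.append(sender)
        return ("NACK" if trimmed else "ACK", sender, seq)

    def drain(self, now):
        """Schedule every queued PULL, spaced one interval apart."""
        schedule = []
        t = max(now, self.next_free)
        while self.pull_q:
            schedule.append((t, self.pull_q.popleft()))
            t += self.interval
        self.next_free = t
        return schedule


rx = Receiver()
print(rx.on_arrival("A", 0, trimmed=False))  # ACK for a data packet
print(rx.on_arrival("B", 0, trimmed=True))   # NACK for a trimmed header
print(rx.drain(now=0.0))                     # PULLs to A then B, 1.2us apart
```

Because the queue is shared, the aggregate PULL rate, and therefore the aggregate rate of incoming data, is bounded by the receiver's link speed regardless of how many senders are active.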
To run NDP and TCP traffic simultaneously, the best approach is to split TCP and NDP into different traffic classes and use fair queueing on the switches between these two classes. TCP traffic requires larger buffers to achieve high throughput; the fair-queueing approach ensures that NDP can still use small packet buffers and achieve both low latency and high throughput. In the worst case, the latency experienced by a short NDP flow on a network where it coexists with TCP is double that of a network where NDP runs alone.