NDP Simulator
To get started with the simulator code, first clone the NDP repository. The simulator code is in the sim/ directory. The code has mostly been developed on macOS, but should compile fine on Linux and Windows too.
Running make in the sim directory will compile the TCP, NDP, htsim and network source files. The code here is not datacenter specific. The output of this phase is libhtsim.a, which is needed to build the datacenter code (the library is statically linked into all the htsim_ executables in datacenter/).
To build the output parser tool (parse_output) run make parse_output. This will create an executable with the same name.
Next, go to the datacenter folder and run make. This should compile all the datacenter topologies that htsim supports, as well as create many executables (called htsim_...) that are meant to run different experiments. An htsim_X executable results from a corresponding main_X source file that sets up and runs the experiment.
Now that we've compiled the simulator, let's get started. Type the following command:
./htsim_ndp_permutation -strat perm -nodes 16 -conns 16 -cwnd 30
This will create a FatTree topology containing 16 servers (k=4) where all links run at 10Gbps. The experiment will run a permutation traffic matrix containing 16 connections, where each server sends and receives exactly one long-running NDP connection. The initial window used by the NDP senders is set to 30 packets. Finally, packets are spread across the available paths using the perm strategy: each source sends packets over a random permutation of the paths in a round-robin manner. After all paths have been visited, the permutation is regenerated. This ensures that there is no long-lived congestion at any of the ports in the core of the network.
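To make the perm strategy concrete, here is a minimal, self-contained C++ sketch of permutation-based path spraying. It illustrates the idea only; the class and method names (PathSprayer, next_path) are ours, not htsim's actual API.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Sketch of the "perm" strategy described above: send each packet on the
// next path of a random permutation, and reshuffle once every path has
// been visited. Illustrative only; not htsim's actual implementation.
class PathSprayer {
public:
    explicit PathSprayer(size_t path_count) : paths_(path_count) {
        for (size_t i = 0; i < path_count; ++i) paths_[i] = i;
        reshuffle();
    }

    // Returns the path index to use for the next packet.
    size_t next_path() {
        if (next_ == paths_.size())  // all paths visited: regenerate
            reshuffle();
        return paths_[next_++];
    }

private:
    void reshuffle() {
        std::shuffle(paths_.begin(), paths_.end(), rng_);
        next_ = 0;
    }

    std::vector<size_t> paths_;
    size_t next_ = 0;
    std::mt19937 rng_{std::random_device{}()};
};
```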
By default, all experiments will output results in a file called logout.dat. To parse these results, run the following command:
../parse_output logout.dat -ndp -show
The output should look something like this:
9857.16 Mbps val 354 name ndp_sink_15_13(0)
9856.08 Mbps val 350 name ndp_sink_14_1(0)
9855.36 Mbps val 346 name ndp_sink_13_9(0)
9856.80 Mbps val 342 name ndp_sink_12_14(0)
9855.36 Mbps val 338 name ndp_sink_11_3(0)
9855.00 Mbps val 334 name ndp_sink_10_7(0)
9855.72 Mbps val 330 name ndp_sink_9_15(0)
9856.80 Mbps val 326 name ndp_sink_8_10(0)
9855.72 Mbps val 322 name ndp_sink_7_2(0)
9857.52 Mbps val 318 name ndp_sink_6_5(0)
9859.32 Mbps val 314 name ndp_sink_5_4(0)
9855.72 Mbps val 310 name ndp_sink_4_8(0)
9855.36 Mbps val 306 name ndp_sink_3_6(0)
9855.72 Mbps val 302 name ndp_sink_2_12(0)
9859.32 Mbps val 298 name ndp_sink_1_0(0)
9856.08 Mbps val 294 name ndp_sink_0_11(0)
Mean of lower 10pc (1 entries) is 1231875000.000000 total mean 1232055000.000000 mean2 0.000000
Each line gives the throughput of one connection, followed by details about the connection, including its id in the trace file and its name. The name also encodes the sender and receiver of the connection. The last line of the output gives the mean of the lower 10 percent of flows in bytes per second, as well as the total mean, also in bytes per second (e.g. 1232055000 bytes/s × 8 ≈ 9856 Mbps, matching the per-connection rates above).
You can play with the simulator by increasing the topology size (we ran up to 8192 nodes), changing the initial window, or adding more or fewer connections (having the number of connections equal the number of servers is the worst case from a utilization point of view). You can also play with the routing strategy (for example, rand emulates per-packet ECMP).
To run incast or other experiments, the workflow is roughly the same. Run the appropriate htsim_* executable and provide the required parameters. You will, however, have to figure out how to extract the interesting outputs yourself, as parse_output only shows average throughputs, which are not very useful in some cases (e.g. for incast).
The EXAMPLES directory of the simulator contains a number of prepackaged experiments, including ones that generate several of the figures from the NDP Sigcomm'17 paper.
To understand how you can customize existing experiments or create new ones, let us walk through one of the experiment files, namely main_ndp_permutation.cpp. We will only highlight the more interesting parts of the file.
A single topology header is included for the topology that is used in the experiment. In most experiments this will be the full-bisection Fat Tree topology (Al-Fares et al., Sigcomm 2008). A number of defines specify the default parameters used; some of these may be overridden by command-line parameters. There are defines for the number of nodes, the default per-port queue size in packets, etc.
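For illustration, such defaults look roughly like the following (the exact macro names and values are assumptions, not copied from the file):

```cpp
// Illustrative defaults; the actual names and values in
// main_ndp_permutation.cpp may differ.
#define DEFAULT_NODES 16       // number of servers in the Fat Tree
#define DEFAULT_QUEUE_SIZE 8   // per-port queue size, in packets
```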
eventlist.setEndtime(timeFromSec(0.201));
The eventlist is the class that drives the htsim simulator; it is a heap of events, each with its own time. We set the total simulation time with the setEndtime() call. For NDP experiments this time can be really small (e.g. 0.2 seconds) because NDP converges instantly; for TCP, however, we need to run experiments for at least one second, and make sure ssthresh is set correctly to achieve fast convergence.
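As a mental model, the event list is a priority queue keyed on simulated time, drained in order until the end time. Below is a simplified, self-contained sketch of that pattern; it loosely mirrors htsim's structure, but the SimpleEventList class is ours, not htsim's actual EventList.

```cpp
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

using simtime_picosec = uint64_t;

// Simplified model of an event-driven simulator core: a min-heap of
// (time, handler) pairs drained in time order until the end time.
// A sketch of the concept only, not htsim's actual EventList.
struct EventSource {
    virtual void doNextEvent(simtime_picosec now) = 0;
    virtual ~EventSource() = default;
};

class SimpleEventList {
public:
    void setEndtime(simtime_picosec t) { endtime_ = t; }

    // Schedule src to run at simulated time `when`.
    void sourceIsPending(EventSource& src, simtime_picosec when) {
        heap_.push({when, &src});
    }

    // Run the earliest pending event; returns false when no events
    // remain or the next event lies beyond the end time.
    bool doNextEvent() {
        if (heap_.empty() || heap_.top().first > endtime_) return false;
        Entry e = heap_.top();
        heap_.pop();
        now_ = e.first;
        e.second->doNextEvent(now_);
        return true;
    }

    simtime_picosec now() const { return now_; }

private:
    using Entry = std::pair<simtime_picosec, EventSource*>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap_;
    simtime_picosec now_ = 0, endtime_ = 0;
};
```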
We then create the logfile and set it to record events from simulator time zero.
logfile.setStartTime(timeFromSec(0));
The NdpSinkLoggerSampling object will iterate through all NdpSinks and log their rate every 10ms. This allows us to get throughput measurements after the experiment finishes.
NdpSinkLoggerSampling sinkLogger = NdpSinkLoggerSampling(timeFromMs(10), eventlist);
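Conceptually, such a sampling logger is just an event source that re-arms itself every sampling interval. Here is a hedged sketch built on the SimpleEventList model above; htsim's NdpSinkLoggerSampling differs in detail.

```cpp
// Sketch of a periodic sampler on top of the SimpleEventList model above;
// illustrative only, htsim's NdpSinkLoggerSampling differs in detail.
class PeriodicSampler : public EventSource {
public:
    PeriodicSampler(SimpleEventList& ev, simtime_picosec period)
        : ev_(ev), period_(period) {
        ev_.sourceIsPending(*this, period_);  // schedule the first sample
    }

    void doNextEvent(simtime_picosec now) override {
        // ... iterate over all sinks and record bytes received since the
        // last sample, from which throughput is later derived ...
        ev_.sourceIsPending(*this, now + period_);  // re-arm
    }

private:
    SimpleEventList& ev_;
    simtime_picosec period_;
};
```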
The following line creates the FatTree topology:
FatTreeTopology* top = new FatTreeTopology(no_of_nodes, queuesize, &logfile, &eventlist, ff, COMPOSITE, 0);
The COMPOSITE parameter specifies the type of queue to be used. For NDP, we use the composite queue which implements packet trimming (source code in ../compositequeue.cpp).
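The gist of packet trimming: when the data queue is full, an arriving packet is cut down to its header and queued in a strict-priority header queue instead of being dropped, so the receiver still learns of its existence. The following is a simplified sketch of that idea; the real logic in compositequeue.cpp differs in detail.

```cpp
#include <cstddef>
#include <queue>

// Simplified sketch of a trimming (composite) queue. Illustrative only;
// see compositequeue.cpp for the real implementation.
struct Packet {
    size_t size_bytes;
    bool header_only = false;
    void strip_payload(size_t header_bytes) {
        size_bytes = header_bytes;
        header_only = true;
    }
};

class TrimmingQueue {
public:
    explicit TrimmingQueue(size_t max_data_bytes)
        : max_data_bytes_(max_data_bytes) {}

    void enqueue(Packet p) {
        if (data_bytes_ + p.size_bytes <= max_data_bytes_) {
            data_bytes_ += p.size_bytes;
            data_q_.push(p);       // room for the full packet
        } else {
            p.strip_payload(64);   // trim: keep only a ~64B header
            header_q_.push(p);     // headers get strict priority
        }
    }

    // Headers are always served before full packets.
    bool dequeue(Packet& out) {
        if (!header_q_.empty()) {
            out = header_q_.front(); header_q_.pop();
            return true;
        }
        if (!data_q_.empty()) {
            out = data_q_.front(); data_q_.pop();
            data_bytes_ -= out.size_bytes;
            return true;
        }
        return false;
    }

private:
    std::queue<Packet> data_q_, header_q_;
    size_t data_bytes_ = 0;
    size_t max_data_bytes_;
};
```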
Next, we create a connection matrix and generate a permutation:
ConnectionMatrix* conns = new ConnectionMatrix(no_of_nodes);
conns->setPermutation(no_of_conns);
By using different methods in the ConnectionMatrix class you can create different traffic patterns including random (setRandom), many-to-one, all-to-all, etc.
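For example, switching the experiment to a random traffic matrix is a one-line change (setRandom is named above; check the ConnectionMatrix header for the exact signatures of the other generators):

```cpp
// Permutation: every server sends and receives exactly one connection.
conns->setPermutation(no_of_conns);

// Random pattern instead (mentioned above; verify the exact signature
// in the ConnectionMatrix header before use):
// conns->setRandom(no_of_conns);
```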
Next, we create the actual connections, iterating through the connections generated by the traffic matrix one by one. The code to create and start an NDP connection is given below:
ndpSrc = new NdpSrc(NULL, NULL, eventlist);
...
ndpSnk = new NdpSink(eventlist, 1 /*pull at line rate*/);
...
routeout = new Route(*(net_paths[src][dest]->at(choice)));
routeout->push_back(ndpSnk);
routein = new Route(*top->get_paths(dest, src)->at(choice));
routein->push_back(ndpSrc);
ndpSrc->connect(*routeout, *routein, *ndpSnk, timeFromMs(extrastarttime));
The connect call requires the outgoing and return routes (these are chosen randomly from the available ones), as well as the start time; this is normally either 0 or a small random value close to zero (to avoid phase effects). Finally, depending on the routing strategy, the NDP endpoints may need to know all available paths:
ndpSrc->set_paths(net_paths[src][dest]);
ndpSnk->set_paths(net_paths[dest][src]);
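For completeness, the random choices feeding the snippet above are typically drawn along these lines (a hedged sketch; the exact code in main_ndp_permutation.cpp may differ):

```cpp
// Hedged sketch: pick a random path index and a small random start offset
// (in ms) to avoid phase effects; not verbatim from the experiment file.
int choice = rand() % net_paths[src][dest]->size();
double extrastarttime = 50.0 * rand() / RAND_MAX;
```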
After all connections are set up, the main event loop of the simulator is run:
while (eventlist.doNextEvent()) {
}
The loop completes when there are no pending events left or when the simulation end time has been reached.
The htsim code contains implementations of TCP NewReno (not SACK) and a version of the MPTCP protocol (including various congestion control algorithms). DCTCP and PFC/DCQCN are also supported (DCQCN is, however, based on the DCTCP code and is window-based rather than rate-based). To run experiments with these you can use the appropriate experiment files, or create your own.
htsim should be easy to understand and hack, so feel free to get your hands dirty.