Numbers and algos
- http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
- http://bulk.fefe.de/scalable-networking.pdf
| Operation | Latency |
|---|---|
| L1 cache reference | 0.5 ns |
| Branch mispredict | 5 ns |
| L2 cache reference | 7 ns |
| Mutex lock/unlock | 25 ns |
| Main memory reference | 100 ns |
| Reading the first byte from a freshly mmap()ed page | 1-2 us |
| Compress 1 KB with Zippy | 3 us |
| socket() (1) | 5 to 25 us |
| Router packet forwarding | 10 us at 1 Gbps, 20 us at 10/100 Mbps |
| One-way latency through a Gbps link | 20 us |
| Fork, then read a message piped from the forked child (see the sketch after the footnotes) | 100 us |
| Context switch with little L2 cache to refill (3) | 10-100 us |
| Read 1 MB sequentially from memory | 250 us |
| Round trip within the same datacenter | 500 us |
| Context switch with a full L2 cache to refill (3) | 0.5-1 ms (max) |
| Disk seek | 10 ms |
| Read 1 MB sequentially from disk | 20 ms |
| Send packet across the US (2) | 70 ms |
| Send packet CA -> Netherlands -> CA | 150 ms |
(1) From a 900 MHz Pentium III in 2003, so most likely faster in 2012. Judging from the graph, the latency is nearly constant in the number of open sockets on Linux and BSD.
The POSIX API requires the kernel to return the lowest unused file descriptor when creating a new one, and there is no known way to do this in O(1); Linux uses bit vectors. On both Linux and BSD this scales well.
(2) From http://www.facebook.com/note.php?note_id=23844338919
(3) The context-switching overhead of refilling the L2 cache increases with the size of the cached working set, until the L2 is full and the overhead of cache misses becomes the main performance issue. www.cs.rochester.edu/u/cli/research/switch.pdf
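The lowest-free-descriptor rule from note (1) is easy to observe; a minimal sketch, using /dev/null as a convenient file to open:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* POSIX: open() must return the lowest unused file descriptor. */
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    printf("a=%d b=%d\n", a, b);   /* typically 3 and 4 */

    close(a);                      /* free the lower descriptor */
    int c = open("/dev/null", O_RDONLY);
    printf("c=%d\n", c);           /* c == a: the hole is filled first */
    return 0;
}
```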
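And to reproduce the fork-and-pipe entry from the table, a rough single-shot timing sketch (expect noise from scheduling; run it several times):

```c
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fd[2];
    char buf[1];
    struct timespec t0, t1;

    pipe(fd);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (fork() == 0) {             /* child: send one byte and exit */
        write(fd[1], "x", 1);
        _exit(0);
    }
    read(fd[0], buf, 1);           /* parent: block until the byte arrives */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    wait(NULL);

    long us = (t1.tv_sec - t0.tv_sec) * 1000000L
            + (t1.tv_nsec - t0.tv_nsec) / 1000L;
    printf("fork + piped read: %ld us\n", us);
    return 0;
}
```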
A typical workload is having two dozen processes, with one or two of them actually having something to do (they are runnable).
Linux scheduler tick: 100 Hz.
Use vmstat to get the number of context switches per second: an MP3 player switches context fewer than 50 times per second, while an HTTP server can switch context up to 12k times per second.
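vmstat's cs column comes from the kernel's cumulative context-switch counter; on Linux the same figure can be computed directly from the ctxt line of /proc/stat, as in this minimal sketch:

```c
#include <stdio.h>
#include <unistd.h>

/* Return the cumulative context-switch count from /proc/stat ("ctxt" line). */
static long long read_ctxt(void) {
    char line[256];
    long long n = -1;
    FILE *f = fopen("/proc/stat", "r");
    if (!f) return -1;
    while (fgets(line, sizeof line, f))
        if (sscanf(line, "ctxt %lld", &n) == 1)
            break;
    fclose(f);
    return n;
}

int main(void) {
    long long before = read_ctxt();
    sleep(1);
    long long after = read_ctxt();
    printf("context switches/sec: %lld\n", after - before);
    return 0;
}
```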
Creating a thread on Windows or Solaris can be slow (i.e., it's not a good idea to create them on the fly); hence, use thread pools, created at program start. Problem: when there are more requests than threads in the pool, some requests have to wait.
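A minimal sketch of the pattern with POSIX threads (worker count and queue size are arbitrary; this toy version drops requests when the queue is full rather than making them wait):

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Fixed-size thread pool: workers are created once at startup and
 * pull request IDs from a shared queue. Sizes are illustrative. */
#define WORKERS   4
#define QUEUE_LEN 64

static int queue[QUEUE_LEN];
static int head, tail, count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0)                    /* sleep until there is work */
            pthread_cond_wait(&not_empty, &lock);
        int req = queue[head];
        head = (head + 1) % QUEUE_LEN;
        count--;
        pthread_mutex_unlock(&lock);
        printf("handling request %d\n", req); /* stand-in for real work */
    }
    return NULL;
}

static void submit(int req) {
    pthread_mutex_lock(&lock);
    if (count < QUEUE_LEN) {                  /* full queue: request dropped here */
        queue[tail] = req;
        tail = (tail + 1) % QUEUE_LEN;
        count++;
        pthread_cond_signal(&not_empty);
    }
    pthread_mutex_unlock(&lock);
}

int main(void) {
    pthread_t t[WORKERS];
    for (int i = 0; i < WORKERS; i++)         /* pay thread-creation cost once */
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 10; i++)
        submit(i);
    sleep(1);                                 /* let workers drain the queue */
    return 0;
}
```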
Hard disks are fast when reading sequentially but slow when they have to move the head (i.e., seek). When unfragmented, a hard drive reads data at 12 MB/s; when fragmented, at less than 4 MB/s. With a competing download (from another disk or from loopback), reads also drop to 4 MB/s.
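A rough way to measure sequential read throughput (the file path is a placeholder; use a file larger than RAM, or drop the page cache first, so the disk is measured rather than memory):

```c
#include <stdio.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    static char buf[1 << 20];                 /* 1 MB read chunks */
    long long total = 0;
    ssize_t n;
    struct timespec t0, t1;

    int fd = open("/tmp/bigfile", O_RDONLY);  /* hypothetical test file */
    if (fd < 0) { perror("open"); return 1; }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;                           /* sequential read, start to end */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.1f MB/s\n", total / 1e6 / secs);
    close(fd);
    return 0;
}
```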
Token bucket, Nagle, leaky bucket, etc.: http://en.wikipedia.org/wiki/Category:Networking_algorithms
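As an example from that list, a token bucket in its simplest form: tokens accrue at a fixed rate up to a burst cap, and each packet spends one token. A minimal sketch (rate and burst values are arbitrary):

```c
#include <stdio.h>
#include <time.h>

/* Token bucket rate limiter: tokens accrue at `rate` per second up to
 * `burst`; an event is allowed only if a whole token is available. */
typedef struct {
    double tokens;          /* current token count */
    double rate;            /* refill rate, tokens/sec */
    double burst;           /* bucket capacity */
    struct timespec last;   /* time of last refill */
} bucket_t;

static double seconds_since(struct timespec *last) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    double dt = (now.tv_sec - last->tv_sec)
              + (now.tv_nsec - last->tv_nsec) / 1e9;
    *last = now;
    return dt;
}

static int allow(bucket_t *b) {
    b->tokens += b->rate * seconds_since(&b->last); /* refill since last check */
    if (b->tokens > b->burst)
        b->tokens = b->burst;                       /* cap at bucket size */
    if (b->tokens >= 1.0) {
        b->tokens -= 1.0;                           /* spend one token */
        return 1;
    }
    return 0;                                       /* over the rate: throttle */
}

int main(void) {
    bucket_t b = { .tokens = 5, .rate = 5, .burst = 5 };
    clock_gettime(CLOCK_MONOTONIC, &b.last);
    for (int i = 0; i < 8; i++)                     /* first 5 pass, rest throttled */
        printf("packet %d: %s\n", i, allow(&b) ? "sent" : "throttled");
    return 0;
}
```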