
Numbers and algos


Numbers

| Operation | Latency |
| --- | --- |
| L1 cache reference | 0.5 ns |
| Branch mispredict | 5 ns |
| L2 cache reference | 7 ns |
| Mutex lock/unlock | 25 ns |
| Main memory reference | 100 ns |
| Read first byte from a freshly mmap'ed page | 1-2 us |
| Compress 1 KB with Zippy | 3 us |
| socket() (1) | 5-25 us |
| Router packet forwarding | 10 us at 1 Gbps, 20 us at 10/100 Mbps |
| One-way latency through a Gbps link | 20 us |
| Fork, then read a message piped from the forked child (see the sketch below) | 100 us |
| Context switch with little L2 cache to refill (3) | 10-100 us |
| Read 1 MB sequentially from memory | 250 us |
| Round trip within the same datacenter | 500 us |
| Context switch with a full L2 cache to refill (3) | 0.5-1 ms (max) |
| Disk seek | 10 ms |
| Read 1 MB sequentially from disk | 20 ms |
| Send packet across the US (2) | 70 ms |
| Send packet CA -> Netherlands -> CA | 150 ms |
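
The fork-and-pipe entry is easy to reproduce. A minimal sketch (error handling simplified; timings vary by machine and kernel):

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <time.h>

int main(void) {
    int fds[2];
    char buf;
    struct timespec t0, t1;

    if (pipe(fds) != 0) { perror("pipe"); return 1; }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pid_t pid = fork();
    if (pid == 0) {                /* child: send one byte and exit */
        write(fds[1], "x", 1);
        _exit(0);
    }
    read(fds[0], &buf, 1);         /* parent: block until the child's byte arrives */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    waitpid(pid, NULL, 0);

    long us = (t1.tv_sec - t0.tv_sec) * 1000000L
            + (t1.tv_nsec - t0.tv_nsec) / 1000L;
    printf("fork + piped read: %ld us\n", us);
    return 0;
}
```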

(1) Measured on a 900 MHz Pentium III in 2003, so most likely faster in 2012. Judging from the graph, the latency is nearly constant in the number of open sockets on Linux and BSD.

POSIX requires the kernel to return the lowest unused file descriptor when creating a new one. There is no known way to do this in O(1); Linux uses bit vectors. On both Linux and BSD, this scales well.
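
The lowest-unused-descriptor rule is easy to observe. A minimal sketch (the AF_INET/SOCK_STREAM socket type is an arbitrary choice):

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>

int main(void) {
    int a = socket(AF_INET, SOCK_STREAM, 0);
    printf("first socket fd: %d\n", a);            /* typically 3 */
    close(0);                                      /* free the lowest descriptor */
    int b = socket(AF_INET, SOCK_STREAM, 0);
    printf("socket fd after close(0): %d\n", b);   /* 0, the lowest unused */
    return 0;
}
```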

(2) From http://www.facebook.com/note.php?note_id=23844338919

(3) The overhead of refilling the L2 cache after a context switch increases with the size of the cached working set, until L2 is full and cache misses become the main performance cost. www.cs.rochester.edu/u/cli/research/switch.pdf
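
Context-switch cost can be estimated with the classic two-process pipe ping-pong, the general approach used by the paper above and by tools like lmbench (a sketch, not the paper's exact harness; pin both processes to one CPU, e.g. with taskset, so each round trip forces two switches):

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <time.h>

#define ROUNDS 100000

int main(void) {
    int p2c[2], c2p[2];            /* parent->child and child->parent pipes */
    char b = 'x';
    if (pipe(p2c) || pipe(c2p)) return 1;

    if (fork() == 0) {             /* child: echo every byte back */
        for (int i = 0; i < ROUNDS; i++) {
            if (read(p2c[0], &b, 1) != 1) _exit(1);
            write(c2p[1], &b, 1);
        }
        _exit(0);
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ROUNDS; i++) {   /* parent: send, wait for the echo */
        write(p2c[1], &b, 1);
        if (read(c2p[0], &b, 1) != 1) return 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    wait(NULL);

    /* the estimate includes pipe read/write overhead, not just the switch */
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("~%.0f ns per switch (2 switches per round trip)\n",
           ns / (2.0 * ROUNDS));
    return 0;
}
```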

Misc

A typical workload has a couple dozen processes, only one or two of which actually have something to do (i.e. are runnable).

Linux scheduler tick: 100 Hz.

Use vmstat to get the number of context switches per second. An MP3 player switches context fewer than 50 times per second; an HTTP server can switch context up to 12,000 times per second.
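
vmstat derives its `cs` column from the cumulative `ctxt` counter in /proc/stat, so the same rate can be sampled directly. A minimal Linux-only sketch:

```c
#include <stdio.h>
#include <unistd.h>

/* read the cumulative context-switch count since boot from /proc/stat */
static long read_ctxt(void) {
    char line[256];
    long v = -1;
    FILE *f = fopen("/proc/stat", "r");
    if (!f) return -1;
    while (fgets(line, sizeof line, f))
        if (sscanf(line, "ctxt %ld", &v) == 1) break;
    fclose(f);
    return v;
}

int main(void) {
    long a = read_ctxt();
    sleep(1);                      /* one-second sampling window */
    long b = read_ctxt();
    printf("%ld context switches/sec\n", b - a);
    return 0;
}
```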

Creating a thread on Windows or Solaris can be slow (i.e. it is not a good idea to create threads on the fly), hence the use of thread pools created at program start. The drawback: when there are more requests than threads in the pool, some requests have to wait.
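
A minimal thread-pool sketch with POSIX threads (queue size, worker count, and the printf "work" are placeholders): workers are created once at startup and block on a condition variable until jobs arrive, so nothing is created per request.

```c
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4
#define NJOBS    16

static int queue[NJOBS];                 /* fixed-size job queue */
static int head = 0, tail = 0, done = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

static void *worker(void *arg) {
    long id = (long)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (head == tail && !done)    /* no job yet: this is where requests wait */
            pthread_cond_wait(&cond, &lock);
        if (head == tail && done) {      /* queue drained and shutting down */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        int job = queue[head++];
        pthread_mutex_unlock(&lock);
        printf("worker %ld handled job %d\n", id, job);  /* placeholder work */
    }
}

int main(void) {
    pthread_t tids[NWORKERS];
    for (long i = 0; i < NWORKERS; i++)  /* pool created once, at program start */
        pthread_create(&tids[i], NULL, worker, (void *)i);

    for (int j = 0; j < NJOBS; j++) {    /* enqueue incoming "requests" */
        pthread_mutex_lock(&lock);
        queue[tail++] = j;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }

    pthread_mutex_lock(&lock);           /* signal shutdown once all jobs are queued */
    done = 1;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tids[i], NULL);
    return 0;
}
```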

Hard disks are fast when reading sequentially, but slow when they have to move the head (i.e. seek). Unfragmented, a hard drive reads data at about 12 MB/s; fragmented, at less than 4 MB/s. With a competing transfer (from another disk or from loopback), reads also drop to about 4 MB/s.
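
The sequential-vs-seek gap is easy to observe by reading the same amount of data contiguously and then in randomly seeked chunks. A minimal sketch (pass a file much larger than 1 MB; on Linux, drop the page cache first, e.g. `echo 3 > /proc/sys/vm/drop_caches` as root, otherwise the runs mostly measure memory):

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define CHUNK 4096
#define TOTAL (1024 * 1024)              /* read 1 MB each way */

static double now_ms(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t.tv_sec * 1e3 + t.tv_nsec / 1e6;
}

int main(int argc, char **argv) {
    char buf[CHUNK];
    if (argc < 2) { fprintf(stderr, "usage: %s <big file>\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    off_t nchunks = lseek(fd, 0, SEEK_END) / CHUNK;
    if (nchunks < TOTAL / CHUNK) { fprintf(stderr, "file too small\n"); return 1; }

    double t = now_ms();                 /* sequential: the head barely moves */
    lseek(fd, 0, SEEK_SET);
    for (int i = 0; i < TOTAL / CHUNK; i++)
        read(fd, buf, CHUNK);
    printf("sequential 1 MB: %.1f ms\n", now_ms() - t);

    t = now_ms();                        /* random: one seek per 4 KB chunk */
    for (int i = 0; i < TOTAL / CHUNK; i++) {
        lseek(fd, (rand() % nchunks) * (off_t)CHUNK, SEEK_SET);
        read(fd, buf, CHUNK);
    }
    printf("seeky 1 MB: %.1f ms\n", now_ms() - t);

    close(fd);
    return 0;
}
```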

Algos

Token bucket, Nagle's algorithm, leaky bucket, etc.: http://en.wikipedia.org/wiki/Category:Networking_algorithms
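
As one concrete example from that list, a token bucket admits traffic at a sustained rate while allowing short bursts. A minimal sketch (the struct fields, byte units, and the rates in main are illustrative):

```c
#include <stdio.h>
#include <time.h>

typedef struct {
    double tokens;     /* current token count (bytes) */
    double rate;       /* refill rate (bytes/sec) */
    double burst;      /* bucket capacity (bytes) */
    double last;       /* time of last refill (seconds) */
} bucket_t;

static double now_sec(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t.tv_sec + t.tv_nsec / 1e9;
}

/* returns 1 if a packet of n bytes may be sent now, 0 if it must wait or drop */
static int bucket_allow(bucket_t *b, double n) {
    double t = now_sec();
    b->tokens += (t - b->last) * b->rate;    /* refill for elapsed time */
    if (b->tokens > b->burst) b->tokens = b->burst;
    b->last = t;
    if (b->tokens < n) return 0;
    b->tokens -= n;
    return 1;
}

int main(void) {
    /* 125000 bytes/sec sustained (~1 Mbps), 3000-byte burst, starts half full */
    bucket_t b = { .tokens = 1500, .rate = 125000, .burst = 3000,
                   .last = now_sec() };
    for (int i = 0; i < 5; i++)
        printf("packet %d: %s\n", i,
               bucket_allow(&b, 1500) ? "sent" : "delayed");
    return 0;
}
```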
