Numbers and algos
- http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
- http://bulk.fefe.de/scalable-networking.pdf
| Operation | Latency |
|---|---|
| L1 cache reference | 0.5 ns |
| Branch mispredict | 5 ns |
| L2 cache reference | 7 ns |
| Mutex lock/unlock | 25 ns |
| Main memory reference | 100 ns |
| Reading the first byte from a freshly mmap()ed page | 1-2 us |
| Compress 1 KB with Zippy | 3 us |
| socket() (1) | 5 to 25 us |
| Router packet forwarding | 10 us at 1 Gbps, 20 us at 10/100 Mbps |
| One-way latency through a Gbps link | 20 us |
| Fork, then read a message piped from the forked child (see the sketch after the footnotes) | 100 us |
| Context switch with little L2 cache to refill (3) | 10-100 us |
| Read 1 MB sequentially from memory | 250 us |
| Round trip within the same datacenter | 500 us |
| Context switch with a full L2 cache to refill (3) | 0.5-1 ms (max) |
| Disk seek | 10 ms |
| Read 1 MB sequentially from disk | 20 ms |
| Send packet across the US (2) | 70 ms |
| Send packet CA -> Netherlands -> CA | 150 ms |
(1) From a 900 MHz Pentium III in 2003, so most likely faster in 2012. Judging from the graph, the latency is nearly constant in the number of open sockets on Linux and BSD.
The POSIX API requires the kernel to return the lowest unused file descriptor when creating a new one, and there is no known way to do this in O(1); Linux uses bit vectors. On both Linux and BSD this scales well.
(2) From http://www.facebook.com/note.php?note_id=23844338919
(3) The context-switching overhead of refilling the L2 cache increases with the size of the cached working set, until the L2 is full and the overhead of cache misses becomes the main performance issue. www.cs.rochester.edu/u/cli/research/switch.pdf
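The lowest-free-descriptor rule from note (1) is easy to observe; a minimal sketch, using /dev/null as a convenient file to open:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* POSIX: open() must return the lowest unused file descriptor. */
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    printf("a=%d b=%d\n", a, b);   /* typically 3 and 4 */

    close(a);                      /* free the lower descriptor */
    int c = open("/dev/null", O_RDONLY);
    printf("c=%d\n", c);           /* c == a: the hole is filled first */
    return 0;
}
```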
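And to reproduce the fork-and-pipe entry from the table, a rough single-shot timing sketch (expect noise from scheduling; run it several times):

```c
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fd[2];
    char buf[1];
    struct timespec t0, t1;

    pipe(fd);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (fork() == 0) {             /* child: send one byte and exit */
        write(fd[1], "x", 1);
        _exit(0);
    }
    read(fd[0], buf, 1);           /* parent: block until the byte arrives */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    wait(NULL);

    long us = (t1.tv_sec - t0.tv_sec) * 1000000L
            + (t1.tv_nsec - t0.tv_nsec) / 1000L;
    printf("fork + piped read: %ld us\n", us);
    return 0;
}
```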
A typical workload is having two dozen processes, with one or two of them actually having something to do (they are runnable).
Linux scheduler tick: 100 Hz.
Use vmstat to get the number of context switches per second: an MP3 player switches context fewer than 50 times per second, while an HTTP server can switch context up to 12k times per second.
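vmstat's cs column comes from the kernel's cumulative context-switch counter; on Linux the same figure can be computed directly from the ctxt line of /proc/stat, as in this minimal sketch:

```c
#include <stdio.h>
#include <unistd.h>

/* Return the cumulative context-switch count from /proc/stat ("ctxt" line). */
static long long read_ctxt(void) {
    char line[256];
    long long n = -1;
    FILE *f = fopen("/proc/stat", "r");
    if (!f) return -1;
    while (fgets(line, sizeof line, f))
        if (sscanf(line, "ctxt %lld", &n) == 1)
            break;
    fclose(f);
    return n;
}

int main(void) {
    long long before = read_ctxt();
    sleep(1);
    long long after = read_ctxt();
    printf("context switches/sec: %lld\n", after - before);
    return 0;
}
```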
Creating a thread on Windows or Solaris can be slow (i.e., it's not a good idea to create them on the fly); hence, use thread pools, created at program start. Problem: when there are more requests than threads in the pool, some requests have to wait.
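A minimal sketch of the pattern with POSIX threads (worker count and queue size are arbitrary; this toy version drops requests when the queue is full rather than making them wait):

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Fixed-size thread pool: workers are created once at startup and
 * pull request IDs from a shared queue. Sizes are illustrative. */
#define WORKERS   4
#define QUEUE_LEN 64

static int queue[QUEUE_LEN];
static int head, tail, count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0)                    /* sleep until there is work */
            pthread_cond_wait(&not_empty, &lock);
        int req = queue[head];
        head = (head + 1) % QUEUE_LEN;
        count--;
        pthread_mutex_unlock(&lock);
        printf("handling request %d\n", req); /* stand-in for real work */
    }
    return NULL;
}

static void submit(int req) {
    pthread_mutex_lock(&lock);
    if (count < QUEUE_LEN) {                  /* full queue: request dropped here */
        queue[tail] = req;
        tail = (tail + 1) % QUEUE_LEN;
        count++;
        pthread_cond_signal(&not_empty);
    }
    pthread_mutex_unlock(&lock);
}

int main(void) {
    pthread_t t[WORKERS];
    for (int i = 0; i < WORKERS; i++)         /* pay thread-creation cost once */
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 10; i++)
        submit(i);
    sleep(1);                                 /* let workers drain the queue */
    return 0;
}
```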
Hard disks are fast when reading sequentially but slow when they have to move the head (i.e., seek). When unfragmented, a hard drive reads data at 12 MB/s; when fragmented, at less than 4 MB/s. With a competing download (from another disk or from loopback), reads also drop to 4 MB/s.
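A rough way to measure sequential read throughput (the file path is a placeholder; use a file larger than RAM, or drop the page cache first, so the disk is measured rather than memory):

```c
#include <stdio.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    static char buf[1 << 20];                 /* 1 MB read chunks */
    long long total = 0;
    ssize_t n;
    struct timespec t0, t1;

    int fd = open("/tmp/bigfile", O_RDONLY);  /* hypothetical test file */
    if (fd < 0) { perror("open"); return 1; }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;                           /* sequential read, start to end */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.1f MB/s\n", total / 1e6 / secs);
    close(fd);
    return 0;
}
```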
Token bucket, Nagle, leaky bucket, etc.: http://en.wikipedia.org/wiki/Category:Networking_algorithms
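As an example from that list, a token bucket in its simplest form: tokens accrue at a fixed rate up to a burst cap, and each packet spends one token. A minimal sketch (rate and burst values are arbitrary):

```c
#include <stdio.h>
#include <time.h>

/* Token bucket rate limiter: tokens accrue at `rate` per second up to
 * `burst`; an event is allowed only if a whole token is available. */
typedef struct {
    double tokens;          /* current token count */
    double rate;            /* refill rate, tokens/sec */
    double burst;           /* bucket capacity */
    struct timespec last;   /* time of last refill */
} bucket_t;

static double seconds_since(struct timespec *last) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    double dt = (now.tv_sec - last->tv_sec)
              + (now.tv_nsec - last->tv_nsec) / 1e9;
    *last = now;
    return dt;
}

static int allow(bucket_t *b) {
    b->tokens += b->rate * seconds_since(&b->last); /* refill since last check */
    if (b->tokens > b->burst)
        b->tokens = b->burst;                       /* cap at bucket size */
    if (b->tokens >= 1.0) {
        b->tokens -= 1.0;                           /* spend one token */
        return 1;
    }
    return 0;                                       /* over the rate: throttle */
}

int main(void) {
    bucket_t b = { .tokens = 5, .rate = 5, .burst = 5 };
    clock_gettime(CLOCK_MONOTONIC, &b.last);
    for (int i = 0; i < 8; i++)                     /* first 5 pass, rest throttled */
        printf("packet %d: %s\n", i, allow(&b) ? "sent" : "throttled");
    return 0;
}
```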