Chord FAQ
- Basics
- Features and functionality
- How large of a system can Chord/DHash scale to?
- Does Chord/DHash protect against malicious users?
- Does Chord/CFS support keyword search?
- Does Chord/CFS provide anonymous file access (a la Freenet)?
- Is there a read/write version of CFS?
- Since CFS breaks files into blocks, won't it lose at least one block of a big file?
- Developing and using the Chord implementation
Chord is a peer-to-peer lookup algorithm. It allows a distributed set of participants to agree on a single node as a rendezvous point for a given key, without any central coordination. In particular, it provides a distributed evaluation of the successor(ID) function: given the identifier of a key, ID, the successor function returns the address of the node whose identifier most closely follows ID in a circular identifier space. Identifiers are typically 160-bit numbers. The Chord algorithm handles adjusting this mapping as the population of nodes changes over time.
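For concreteness, here is a minimal Python sketch of the successor() mapping, assuming a small set of node identifiers known in full (the real protocol computes the same answer with O(log N) messages and only partial knowledge at each node); the addresses and key name below are made up:

```python
# Toy illustration of Chord's successor() mapping over a circular
# 160-bit identifier space (SHA-1), assuming global knowledge of all nodes.
import hashlib

def ident(name: str) -> int:
    """Map a node address or key name to a 160-bit identifier."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def successor(key_id: int, node_ids: list[int]) -> int:
    """Return the ID of the first node at or after key_id on the ring."""
    candidates = sorted(node_ids)
    for nid in candidates:
        if nid >= key_id:
            return nid
    return candidates[0]  # wrap around the circle

nodes = [ident(f"10.0.0.{i}:4000") for i in range(1, 6)]  # made-up addresses
print(hex(successor(ident("some-key"), nodes)))
```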
More details can be found in the papers listed on our publications page. Chord has been used to build a block storage infrastructure, naming services, and various file sharing systems.
Chord is sometimes referred to as a distributed hash table; however, the Chord algorithm itself does not specify any mechanism for storage of data. That is the role of DHash.
DHash is also sometimes referred to as a distributed hash table. It is a layer built on top of Chord that handles reliable storage of data blocks on participating nodes. It does this through techniques such as replication and erasure coding. The logical application interface is simply:
key = put (data)
data = get (key)
Data stored in the system is immutable and identified by its contents, which frees DHash from having to worry about the semantics of multiple writes. DHash has been used to build a backup system, various file systems (CFS and Ivy), and a Usenet News server. For details on how to write programs that use DHash, see our hacking notes.
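As a rough illustration (not the real DHash API), the put/get interface for content-hash blocks can be sketched with an in-memory table, where a block's key is simply the SHA-1 hash of its contents:

```python
# Toy in-memory sketch of the put/get interface for content-hash blocks.
import hashlib

store: dict[bytes, bytes] = {}

def put(data: bytes) -> bytes:
    key = hashlib.sha1(data).digest()  # the block's name is its content hash
    store[key] = data
    return key

def get(key: bytes) -> bytes:
    return store[key]

k = put(b"an immutable file block")
assert get(k) == b"an immutable file block"
```

Because a block's key is derived from its contents, writing the "same" key twice can only ever store identical data, which is what makes the immutability guarantee cheap.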
We have a single research implementation, for Linux and BSD systems. It is designed to help us experiment with different protocol settings to find out how best to build distributed hash tables. It is implemented in C++ using the SFS libraries.
Re-implementations of Chord in other languages include:
- Macedon (C++).
- i3/Chord (C, appears to support Windows).
- P2 (A custom declarative language).
- The Circle (Python).
- Open Chord (Java).
- Overlay Weaver (Java).
- i3 (Java, J2EE).
We have no experience using these implementations and you are encouraged to contact the individual developers if you have questions.
The following research papers are known to use the MIT Chord implementation:
- Our own research papers.
- Total Recall: Systems Support for Automated Availability Management. R. Bhagwan, K. Tati, Y. Cheng, S. Savage, G. M. Voelker. Proceedings of the First ACM/USENIX Symposium on Networked Systems Design and Implementation, March 2004.
- Handling Churn in a DHT. Sean Rhea, Dennis Geels, Timothy Roscoe, and John Kubiatowicz. Proceedings of the USENIX Annual Technical Conference, June 2004.
- Evaluating DHT Implementations in Complex Environments by Network Emulator. Daishi Kato, Toshiyuki Kamiya. Proceedings of the 6th International Workshop on Peer-to-Peer Systems, February 2007.
How large of a system can Chord/DHash scale to?
In theory, the protocols themselves scale logarithmically with the number of nodes.
In practice, however, there aren't any wide-area p2p systems currently (2005) that scale much beyond several million simultaneous users, and our implementation has never been tested with more than hundreds of participating nodes and millions of data items.
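As a back-of-the-envelope illustration of what logarithmic scaling means in practice (assuming the roughly (1/2) log2 N average lookup path length reported for Chord):

```python
# Expected lookup hops for various network sizes, assuming ~0.5*log2(N)
# hops per lookup on average.
import math

for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>10} nodes -> ~{0.5 * math.log2(n):.1f} hops per lookup")
```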
Does Chord/DHash protect against malicious users?
No and sort of. Security in distributed rendezvous protocols like Chord is still an open research question (2005), though some early results are discussed in the Proceedings of the first IPTPS.
DHash provides integrity protection of data by restricting the IDs used to be the output of a cryptographic hash of the data or a public-key signature. However, it does not protect against denial-of-service attacks in which malicious nodes interfere with routing.
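For content-hash blocks the integrity check is simple to sketch: the client re-hashes whatever bytes it fetches and compares the result with the ID it asked for (a minimal sketch, not the actual DHash client code):

```python
# Client-side integrity check for a content-hash block.
import hashlib

def verify_block(key: bytes, data: bytes) -> bool:
    """True iff data hashes back to the requested block ID."""
    return hashlib.sha1(data).digest() == key

key = hashlib.sha1(b"a block").digest()
assert verify_block(key, b"a block")
assert not verify_block(key, b"a tampered block")
```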
Does Chord/CFS support keyword search?
CFS does not support keyword search. Our investigations into keyword search have suggested that simple solutions result in poor load balance: for example, naively storing an index of all items that contain a keyword K at the successor of the hash of K. For more information, see this paper.
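A tiny sketch of that load-balance problem, using a made-up document set: every document containing keyword K adds an entry at successor(hash(K)), so whichever node is responsible for a very common word ends up holding a disproportionately large index:

```python
# Naive keyword indexing: the node responsible for hash(K) stores one
# index entry per document containing K (documents are made up).
import hashlib
from collections import Counter

docs = {
    "d1": ["the", "chord", "ring"],
    "d2": ["the", "dhash", "layer"],
    "d3": ["the", "usenet", "server"],
}

entries_per_node = Counter()
for doc, words in docs.items():
    for w in set(words):
        node = hashlib.sha1(w.encode()).hexdigest()[:8]  # stand-in for successor(hash(w))
        entries_per_node[node] += 1

print(entries_per_node.most_common(1))  # the node indexing "the" holds 3x the entries
```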
Does Chord/CFS provide anonymous file access (a la Freenet)?
No. CFS's design focuses on scalability and load balance for popular, public data. CFS does not protect the identity of the publisher or reader of data.
Is there a read/write version of CFS?
We have built a system called Ivy that is a multi-user read/write file system.
I noticed CFS divides files into small blocks. Doesn't this mean that as my file grows in size the probability that all of the blocks are present tends towards zero?
The main claim in the Chord system is that "every block survives with high probability". At first glance, this seems questionable: we prove that any particular block is lost with probability 1/N^2 (where N is the number of nodes in the system), but if there are many more than N^2 blocks, it seems that the odds of losing at least one block must be quite large. To resolve this, we need to look at how Chord stores blocks. Chord picks a single "primary node" for a block, and stores the block at that node and at the nodes immediately following it on the ring, 2 log N nodes in all. The key observation is that no matter how many blocks there are, there are only N distinct "primary nodes" and, therefore, only N distinct sets of 2 log N contiguous nodes on which a block can be stored. As long as one node in each of these contiguous groups stays alive, there will be at least one live copy of every block. Under a node failure probability of 1/2, the probability that all the nodes in a sequence of 2 log N of them fail is only 1/N^2, so the probability that any of the N contiguous groups loses all of its members is at most N * 1/N^2 = 1/N.
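The arithmetic in the argument above is easy to check numerically (assuming, as in the text, independent node failures with probability 1/2 and 2 log2 N copies per block):

```python
# Check: each replica group of 2*log2(N) nodes fails entirely with
# probability (1/2)^(2*log2(N)) = 1/N^2; a union bound over the N
# primary nodes bounds the chance of losing any block by 1/N.
import math

for n in (16, 256, 4096):
    copies = 2 * math.log2(n)
    p_group = 0.5 ** copies   # all copies in one block's group fail
    p_any = n * p_group       # union bound over N groups
    print(f"N={n:<5} per-group loss={p_group:.2e}  any-block loss <= {p_any:.2e} (1/N = {1/n:.2e})")
```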
The Chord/DHash implementation is licensed under an MIT/X11 license. This license is compatible with the GPL license used by the underlying SFSlite libraries.
The top reasons for Chord failing to build:
- You are using an old SFS snapshot. Chord will not work with SFS-0.7.2; the error message will probably have something to do with `push_back` being undeclared. Use SFSlite 0.8.17 (a recent CVS checkout of SFS from fs.net will also work). Chord is not yet compatible with the SFSlite 1.x series.
- You are using a broken compiler (or our C++ is not compliant). gcc 3.3 or later is probably a good idea.
- The latest check-in is broken. We try to minimize this, but Chord is under development.
If you do have a problem, please first peruse the (recent) archives of the chord mailing list at https://pdos.csail.mit.edu/pipermail/chord/ to see if someone else has experienced a similar problem. If you want to search for something, you can restrict Google with site:amsterdam.lcs.mit.edu, which seems to work fairly well.
If you don't find any solution, please e-mail the chord mailing list with the following information:
- Your OS (e.g. FreeBSD 5.3, Fedora Core 3, etc.)
- The version of the various (relevant) tools you are using. For example, if there's a compilation error, include your gcc version and versions of relevant libraries (e.g. SFS, gtk, BerkeleyDB). Include where you may have gotten the libraries or if you installed them yourself. If you are using anonymous CVS and can't generate the `configure` files, indicate what version of autoconf and automake you are using.
- The specific error message you are seeing, with some lines of context.
Note that the chord mailing list is moderated to reduce spam, so if you are not subscribed, your message may see a variable delay (usually less than 1 day) before it is actually posted to the list.
Chord communicates with peers using standard RPCs. The specific protocols are defined using XDR and can be viewed at https://github.com/sit/dht/tree/master/svc. However, these definitions do not specify the semantics in detail --- if you want to implement Chord, you may be better off doing it without seeking compatibility with our implementation.
Chord and DHash use a custom-built transport layer optimized for peer-to-peer communication patterns; it uses the Vivaldi network coordinate system to predict round-trip times to remote hosts and maintains a window of outstanding RPCs. This is because TCP turns out not to be a particularly good transport for peer-to-peer DHT communication patterns: TCP relies on repeated communication between a single pair of nodes to estimate the RTT and to find the proper window size. Peer-to-peer nodes tend to communicate briefly with many nodes, so TCP connection set-up is expensive, yet the exchanges are not long enough to get out of slow-start or to measure the RTT well. And if long-standing TCP connections were left open to hundreds of nodes, the kernel would run out of buffers.
Our transport layer is implemented on top of the SFS asynchronous RPC libraries over UDP.
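The "window of outstanding RPCs" idea is independent of the SFS libraries and can be sketched generically; the sketch below assumes a hypothetical asynchronous rpc_call() and a fixed window size (the real transport adapts the window and uses Vivaldi RTT predictions):

```python
# Generic sketch of capping the number of RPCs in flight to one host,
# assuming a hypothetical async rpc_call(); not the actual dhash transport.
import asyncio

WINDOW = 8  # at most 8 RPCs outstanding per destination

async def rpc_call(host: str, request: int):
    await asyncio.sleep(0.01)  # stand-in for a UDP RPC round trip
    return (host, request, "ok")

async def issue_all(host: str, requests):
    window = asyncio.Semaphore(WINDOW)
    async def one(req):
        async with window:     # wait for a free slot before sending
            return await rpc_call(host, req)
    return await asyncio.gather(*(one(r) for r in requests))

replies = asyncio.run(issue_all("10.0.0.1", range(20)))
print(len(replies), "replies")
```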
There are two basic options for embedding something like Chord in your application (see this paper for details):
- Link in the code for doing Chord routing/DHash storage.
- Talk to a separate process.
If you want the former, you can take a look at our source code and at how `lsd` (the location service daemon) works.
If you want the latter, you can look at the `.x` files in the `svc/` directory and use any Sun RPC library to talk directly to the `lsd` server. We have an example client written in Python that talks to `lsd` in the `devel/` directory of our source tree (e.g. `dbm.py`).
No. Chord servers listen for RPCs and must be able to receive incoming connections. In addition, the identifier of a node is based on its IP address, which is presumed to be unique.
`lsd` requires resources --- UDP/TCP ports, on-disk database files, and on-disk control sockets --- that cannot be shared between multiple instances of `lsd`. If you try to run multiple `lsd` processes without changing the default locations of these resources, you will get various errors, crashes, or unpredictable behavior, such as:
- Segmentation violation due to shared database files.
- Unable to bind TCP or UDP ports.
- Inability to select the desired `lsd` when using local applications such as `dbm` or `lsdctl`.
In order to avoid this, use the `start-dhash` script as documented in the howto.