
.-  Instructions for z2w
.set GIT=https://github.com/imatix/zguide
.set BRANCH=master
.set EMAIL=zeromq-dev@lists.zeromq.org
.set LIST=http://lists.zeromq.org/mailman/listinfo/zeromq-dev

.output chapter1.wd
**By Pieter Hintjens <ph@imatix.com>, CEO iMatix Corporation.**

With thanks to Bill Desmarais, Brian Dorsey, CAF, Daniel Lin, Eric Desgranges, Gonzalo Diethelm, Guido Goldstein, Hunter Ford, Kamil Shakirov, Martin Sustrik, Mike Castleman, Naveen Chawla, Nicola Peduzzi, Oliver Smith, Olivier Chamoux, Peter Alexander, Pierre Rouleau, Randy Dryburgh, John Unwin, Alex Thomas, Mihail Minkov, Jeremy Avnet, Michael Compton, Kamil Kisiel, Mark Kharitonov, Guillaume Aubert, Ian Barber, Mike Sheridan, Faruk Akgul, Oleg Sidorov, Lev Givon, Allister MacLeod, Alexander D'Archangel, Andreas Hoelzlwimmer, Han Holl, Robert G. Jakabosky, Felipe Cruz, Marcus McCurdy, Mikhail Kulemin, Dr. Gergő Érdi, Pavel Zhukov, Alexander Else, Giovanni Ruggiero, Rick "Technoweenie", Daniel Lundin, Dave Hoover, and Zed Shaw for their contributions, and to Stathis Sideris for [http://www.ditaa.org Ditaa].

Please use the [$(GIT)/issues issue tracker] for all comments and errata. This version covers the latest stable release of 0MQ and was published on &date("ddd d mmmm, yyyy").

The Guide is mainly [/page:all in C], but also in [/php:all PHP] and [/lua:all Lua].

++ Chapter One - Basic Stuff

+++ Fixing the World

How to explain 0MQ? Some of us start by saying all the wonderful things it does. //It's sockets on steroids. It's like mailboxes with routing. It's fast!//  Others try to share their moment of enlightenment, that zap-pow-kaboom satori paradigm-shift moment when it all became obvious. //Things just become simpler. Complexity goes away. It opens the mind.//  Others try to explain by comparison. //It's smaller, simpler, but still looks familiar.//  Personally, I like to remember why we made 0MQ at all, because that's most likely where you, the reader, still are today.

Programming is a science dressed up as art, because most of us don't understand the physics of software, and it's rarely if ever taught. The physics of software is not algorithms, data structures, languages and abstractions. These are just tools we make, use, throw away. The real physics of software is the physics of people.

Specifically, our limitations when it comes to complexity, and our desire to work together to solve large problems in pieces. This is the science of programming: make building blocks that people can understand and use //easily//, and people will work together to solve the very largest problems.

We live in a connected world, and modern software has to navigate this world. So the building blocks for tomorrow's very largest solutions are connected and massively parallel. It's not enough for code to be "strong and silent" any more. Code has to talk to code. Code has to be chatty, sociable, well-connected. Code has to run like the human brain, trillions of individual neurons firing off messages to each other, a massively parallel network with no central control, no single point of failure, yet able to solve immensely difficult problems. And it's no accident that the future of code looks like the human brain, because the endpoints of every network are, at some level, human brains.

If you've done any work with threads, protocols, or networks, you'll realize this is pretty much impossible. It's a dream. Even connecting a few programs across a few sockets is plain nasty, when you start to handle real-life situations. Trillions? The cost would be unimaginable. Connecting computers is so difficult that software and services to do this are a multi-billion dollar business.

So we live in a world where the wiring is years ahead of our ability to use it. We had a software crisis in the 1980s, when people like Fred Brooks believed [http://en.wikipedia.org/wiki/No_Silver_Bullet there was no "Silver Bullet"]. Free and open source software solved that crisis, enabling us to share knowledge efficiently. Today we face another software crisis, but it's one we don't talk about much. Only the largest, richest firms can afford to create connected applications. There is a cloud, but it's proprietary. Our data, our knowledge is disappearing from our personal computers into clouds that we cannot access, cannot compete with. Who owns our social networks? It is like the mainframe-PC revolution in reverse.

We can leave the political philosophy for another book. The point is that while the Internet offers the potential of massively connected code, the reality is that this is out of reach for most of us, and so, large interesting problems (in health, education, economics, transport, and so on) remain unsolved because there is no way to connect the code, and thus no way to connect the brains that could work together to solve these problems.

There have been many attempts to solve the challenge of connected software. There are thousands of IETF specifications, each solving part of the puzzle. For application developers, HTTP is perhaps the one solution to have been simple enough to work, but it arguably makes the problem worse, by encouraging developers and architects to think in terms of big servers and thin, stupid clients.

So today people are still connecting applications using raw UDP and TCP, proprietary protocols, HTTP, WebSockets. It remains painful, slow, hard to scale, and essentially centralized. Distributed p2p architectures are mostly for play, not work. How many applications use Skype or Bittorrent to exchange data?

Which brings us back to the science of programming. To fix the world, we needed to do two things. One, to solve the general problem of "how to connect any code to any code, anywhere". Two, to wrap that up in the simplest possible building blocks that people could understand and use //easily//.

It sounds ridiculously simple. And maybe it is. That's kind of the whole point.

+++ 0MQ in a Hundred Words

0MQ (ZeroMQ, 0\MQ, zmq) looks like an embeddable networking library but acts like a concurrency framework. It gives you sockets that carry whole messages across various transports like in-process, inter-process, TCP, and multicast. You can connect sockets N-to-N with patterns like fanout, pub-sub, task distribution, and request-reply. It's fast enough to be the fabric for clustered products. Its asynchronous I/O model gives you scalable multicore applications, built as asynchronous message-processing tasks. It has a score of language APIs and runs on most operating systems. 0MQ is from [http://www.imatix.com iMatix] and is LGPL open source.

+++ Some Assumptions

We assume you are using the latest stable release of 0MQ. We assume you are using a Linux box or something similar. We assume you can read C code, more or less, since that's the default language for the examples. We assume that when we write constants like PUSH or SUBSCRIBE you can imagine they are really called ZMQ_PUSH or ZMQ_SUBSCRIBE if the programming language needs it.

+++ Getting the Examples

The Guide examples live in the Guide's [https://github.com/imatix/zguide git repository]. The simplest way to get all the examples is to clone this repository:

[[code]]
git clone git://github.com/imatix/zguide.git
[[/code]]

And then browse the examples subdirectory. You'll find examples by language. If there are examples missing in a language you use, you're encouraged to [http://zguide.zeromq.org/main:translate submit a translation]. This is how the Guide became so useful, thanks to the work of many people.

All examples are licensed under MIT/X11, unless otherwise specified in the source code.

+++ Ask and Ye Shall Receive

So let's start with some code. We start of course with a Hello World example. We'll make a client and a server. The client sends "Hello" to the server, which replies with "World". Here's the server in C, which opens a 0MQ socket on port 5555, reads requests on it, and replies with "World" to each request:

[[code type="example" title="Hello World server" name="hwserver" language="C"]]
[[/code]]

[[code type="textdiagram"]]
          +------------+
          |            |
          |   Client   |
          |            |
          +------------+
          |    REQ     |
          \---+--------/
              |    ^
              |    |
         "Hello"  "World"
              |    |
              v    |
          /--------+---\
          |    REP     |
          +------------+
          |            |
          |   Server   |
          |            |
          +------------+


     Figure # - Request-Reply
[[/code]]

The REQ-REP socket pair is lockstep. The client does zmq_send[3] and then zmq_recv[3], in a loop (or once if that's all it needs). Doing any other sequence (e.g. sending two messages in a row) will cause an error. Similarly the service does zmq_recv[3] and then zmq_send[3] in that order, and as often as it needs to.
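
One way to picture the lockstep rule is as a two-state machine. What follows is a toy model in plain C, with no 0MQ calls at all; the names {{req_model_t}}, {{req_model_send}} and {{req_model_recv}} are invented for this sketch, and the -1 return only stands in for the error a real socket would report (EFSM, if I read the man pages correctly):

[[code language="C"]]
#include <assert.h>
#include <stdbool.h>

//  Toy model of a REQ socket: it may only alternate send, recv, send...
//  These names are hypothetical and are not part of the 0MQ API.
typedef struct {
    bool awaiting_reply;        //  True after a send, until the reply is read
} req_model_t;

//  Returns 0 if a send is allowed now, -1 if a reply is still pending
static int
req_model_send (req_model_t *self) {
    assert (self);
    if (self->awaiting_reply)
        return -1;              //  Two sends in a row: out of step
    self->awaiting_reply = true;
    return 0;
}

//  Returns 0 if a recv is allowed now, -1 if nothing was sent first
static int
req_model_recv (req_model_t *self) {
    assert (self);
    if (!self->awaiting_reply)
        return -1;              //  Recv without a request: out of step
    self->awaiting_reply = false;
    return 0;
}
[[/code]]

Starting from a fresh model, a send succeeds, a second send in a row fails, and the matching recv puts the model back in step. A REP socket would be the mirror image, starting in the "expecting a request" state.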

0MQ uses C as its reference language and this is the main language we'll use for examples. If you're reading this on-line, the link below the example takes you to translations into other programming languages. Let's compare the same server in C++:

[[code type="example" title="Hello World server" name="hwserver" language="C++"]]
[[/code]]

You can see that the 0MQ API is similar in C and C++. In a language like PHP, we can hide even more and the code becomes even easier to read:

[[code type="example" title="Hello World server" name="hwserver" language="PHP"]]
[[/code]]

Here's the client code (click the link below the source to look at, or contribute a translation in your favorite programming language):

[[code type="example" title="Hello World client" name="hwclient"]]
[[/code]]

Now this looks too simple to be realistic, but a 0MQ socket is what you get when you take a normal TCP socket, inject it with a mix of radioactive isotopes stolen from a secret Soviet atomic research project, bombard it with 1950-era cosmic rays, and put it into the hands of a drug-addled comic book author with a badly-disguised fetish for bulging muscles clad in spandex. Yes, 0MQ sockets are the world-saving superheroes of the networking world.

[[code type="textdiagram"]]
 +------------+        +------------+
 |            |        |            | Zap!
 | TCP socket +------->| 0MQ socket |
 |            | BOOM!  |     cC00   |  POW!!
 +------------+        +------------+
   ^    ^    ^
   |    |    |
   |    |    +---------+
   |    |              |
   |    +----------+   |
  Illegal          |   |
  radioisotopes    |   |
  from secret      |   |
  Soviet atomic    | Spandex
  city             |
               Cosmic rays


    Figure # - A terrible accident...
[[/code]]

You could literally throw thousands of clients at this server, all at once, and it would continue to work happily and quickly. For fun, try starting the client and //then// starting the server, see how it all still works, then think for a second what this means.

Let me explain briefly what these two programs are actually doing. They create a 0MQ context to work with, and a socket. Don't worry what the words mean. You'll pick it up. The server binds its REP (reply) socket to port 5555. The server waits for a request, in a loop, and responds each time with a reply. The client sends a request and reads the reply back from the server.

There is a lot happening behind the scenes but what matters to us programmers is how short and sweet the code is, and how often it doesn't crash, even under heavy load. This is the request-reply pattern, probably the simplest way to use 0MQ. It maps to RPC and the classic client-server model.

+++ A Minor Note on Strings

0MQ doesn't know anything about the data you send except its size in bytes. That means you are responsible for formatting it safely so that applications can read it back. Doing this for objects and complex data types is a job for specialized libraries like Protocol Buffers. But even for strings you need to take care.

In C and some other languages, strings are terminated with a null byte. We could send a string like "Hello" with that extra null byte:

[[code language="C"]]
zmq_msg_init_data (&request, "Hello", 6, NULL, NULL);
[[/code]]

However if you send a string from another language it probably will not include that null byte. For example, when we send that same string in Python, we do this:

[[code language="Python"]]
socket.send ("Hello")
[[/code]]

Then what goes onto the wire is:

[[code type="textdiagram"]]
+-----+    +-----+-----+-----+-----+-----+
|  5  |    |  H  |  e  |  l  |  l  |  o  |
+-----+    +-----+-----+-----+-----+-----+


          Figure # - A 0MQ string
[[/code]]

And if you read this from a C program, you will get something that looks like a string, and might by accident act like a string (if by luck the five bytes find themselves followed by an innocently lurking null), but isn't a proper string. Which means that if your client and server don't agree on the string format, you will get weird results.

When you receive string data from 0MQ, in C, you simply cannot trust that it's safely terminated. Every single time you read a string you should allocate a new buffer with space for an extra byte, copy the string, and terminate it properly with a null.

So let's establish the rule that **0MQ strings are length-specified, and are sent on the wire //without// a trailing null**. In the simplest case (and we'll do this in our examples) a 0MQ string maps neatly to a 0MQ message frame, which looks like the above figure, a length and some bytes.
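
To see the rule in isolation, here is a small sketch in plain C that packs a C string into "a length and some bytes" and unpacks it again. The {{frame_t}} type and both function names are hypothetical, made up for this illustration; real code would use zmq_msg_t, and the real wire format carries more than this simplified picture shows:

[[code language="C"]]
#include <assert.h>
#include <stdlib.h>
#include <string.h>

//  A simplified picture of one frame: a length and some bytes.
//  This struct is hypothetical; real code would use zmq_msg_t.
typedef struct {
    size_t size;                //  Number of bytes in data
    char *data;                 //  Not null-terminated
} frame_t;

//  Pack a C string into a frame, leaving out the trailing null
static frame_t
frame_from_string (const char *string) {
    assert (string);
    frame_t frame;
    frame.size = strlen (string);
    frame.data = malloc (frame.size ? frame.size : 1);
    assert (frame.data);
    memcpy (frame.data, string, frame.size);
    return frame;
}

//  Unpack a frame into a fresh C string; caller frees the result
static char *
frame_to_string (const frame_t *frame) {
    assert (frame && frame->data);
    char *string = malloc (frame->size + 1);
    assert (string);
    memcpy (string, frame->data, frame->size);
    string [frame->size] = 0;   //  Add the null that was never sent
    return string;
}
[[/code]]

Packing "Hello" gives a frame of size 5, not 6, and unpacking it gives back a string that strlen and printf can handle safely. The caller frees both the frame data and the unpacked string.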

Here is what we need to do, in C, to receive a 0MQ string and deliver it to the application as a valid C string:

[[code language="C"]]
//  Receive 0MQ string from socket and convert into C string.
//  Caller must free the returned string. Returns NULL if the
//  receive or the allocation failed.
static char *
s_recv (void *socket) {
    zmq_msg_t message;
    zmq_msg_init (&message);
    if (zmq_recv (socket, &message, 0)) {
        zmq_msg_close (&message);
        return (NULL);
    }
    int size = zmq_msg_size (&message);
    char *string = malloc (size + 1);
    if (string) {
        memcpy (string, zmq_msg_data (&message), size);
        string [size] = 0;
    }
    zmq_msg_close (&message);
    return (string);
}
[[/code]]

This makes a very handy helper function and in the spirit of making things we can reuse profitably, let's write a similar 's_send' function that sends strings in the correct 0MQ format, and package this into a header file we can reuse.

The result is {{zhelpers.h}}, which lets us write sweeter and shorter 0MQ applications in C. It is a fairly long source, and only fun for C developers, so [https://github.com/imatix/zguide/blob/master/examples/C/zhelpers.h read it at leisure].

+++ Version Reporting

0MQ does come in several versions and quite often, if you hit a problem, it'll be something that's been fixed in a later version. So it's a useful trick to know //exactly// what version of 0MQ you're actually linking with. Here is a tiny program that does that:

[[code type="example" title="0MQ version reporting" name="version"]]
[[/code]]

+++ Getting the Message Out

The second classic pattern is one-way data distribution, in which a server pushes updates to a set of clients. Let's see an example that pushes out weather updates consisting of a zipcode, temperature, and relative humidity. We'll generate random values, just like the real weather stations do.

Here's the server. We'll use port 5556 for this application:

[[code type="example" title="Weather update server" name="wuserver"]]
[[/code]]

There's no start and no end to this stream of updates; it's like a never-ending broadcast.

[[code type="textdiagram"]]
                 +-------------+
                 |             |
                 |  Publisher  |
                 |             |
                 +-------------+
                 |     PUB     |
                 \-------------/
                      bind
                        |
                        |
                     updates
                        |
        +---------------+---------------+
        |               |               |
     updates         updates         updates
        |               |               |
        |               |               |
        v               v               v
     connect         connect         connect
  /------------\  /------------\  /------------\
  |    SUB     |  |    SUB     |  |    SUB     |
  +------------+  +------------+  +------------+
  |            |  |            |  |            |
  | Subscriber |  | Subscriber |  | Subscriber |
  |            |  |            |  |            |
  +------------+  +------------+  +------------+


           Figure # - Publish-Subscribe
[[/code]]

Here is the client application, which listens to the stream of updates and grabs anything to do with a specified zipcode, by default New York City because that's a great place to start any adventure:

[[code type="example" title="Weather update client" name="wuclient"]]
[[/code]]

Note that when you use a SUB socket you **must** set a subscription using zmq_setsockopt[3] and SUBSCRIBE, as in this code. If you don't set any subscription, you won't get any messages. It's a common mistake for beginners. The subscriber can set many subscriptions, which are added together. That is, if an update matches ANY subscription, the subscriber receives it. The subscriber can also unsubscribe specific subscriptions. Subscriptions are length-specified blobs. See zmq_setsockopt[3] for how this works.
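
As I understand the SUBSCRIBE option, a subscription matches when the message starts with the subscription's bytes, and an empty subscription matches everything. The following plain C sketch models that matching rule, including the "ANY subscription" part. It is not libzmq's code; the {{subscription_t}} type and both function names are assumptions made for this illustration:

[[code language="C"]]
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

//  One subscription: a length-specified blob (hypothetical type)
typedef struct {
    const void *prefix;
    size_t size;
} subscription_t;

//  True if the message starts with the subscription's bytes.
//  A zero-length subscription matches every message.
static bool
subscription_matches (const subscription_t *sub,
                      const void *message, size_t message_size) {
    assert (sub && message);
    if (sub->size == 0)
        return true;
    if (sub->size > message_size)
        return false;
    return memcmp (message, sub->prefix, sub->size) == 0;
}

//  True if the message matches ANY of the subscriptions
static bool
subscriber_wants (const subscription_t *subs, size_t count,
                  const void *message, size_t message_size) {
    size_t index;
    for (index = 0; index < count; index++)
        if (subscription_matches (&subs [index], message, message_size))
            return true;
    return false;               //  No subscriptions means no messages
}
[[/code]]

With subscriptions "10001 " and "902", an update that begins "10001 " or "90210 " is wanted and one that begins "20500 " is not. With a count of zero nothing is wanted at all, which is the beginner's mistake described above.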

The PUB-SUB socket pair is asynchronous. The client does zmq_recv[3], in a loop (or once if that's all it needs). Trying to send a message to a SUB socket will cause an error. Similarly the service does zmq_send[3] as often as it needs to, but must not do zmq_recv[3] on a PUB socket.

There is one important thing to know about PUB-SUB sockets: you do not know precisely when a subscriber starts to get messages. Even if you start a subscriber, wait a while, and then start the publisher, **the subscriber will always miss the first messages that the publisher sends**. This is because as the subscriber connects to the publisher (something that takes a small but non-zero time), the publisher may already be sending messages out.

This "slow joiner" symptom hits enough people, often enough, that I'm going to explain it in detail. Remember that 0MQ does asynchronous I/O, i.e. in the background. Say you have two nodes doing this, in this order:

* Subscriber connects to an endpoint and receives and counts messages.
* Publisher binds to an endpoint and immediately sends 1,000 messages.

Then the subscriber will most likely not receive anything. You'll blink, check that you set a correct filter, and try again, and the subscriber will still not receive anything.

Making a TCP connection involves to-and-fro handshaking that takes several milliseconds depending on your network and the number of hops between peers. In that time, 0MQ can send very many messages. For the sake of argument, assume it takes 5 msecs to establish a connection, and that same link can handle 1M messages per second. During the 5 msecs that the subscriber is connecting to the publisher, it takes the publisher only 1 msec to send out those 1K messages.
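
The arithmetic is worth checking for your own network. Here is a tiny sketch that does so; the two function names are made up for this example, and the 5 msec and 1M messages per second figures are the same for-the-sake-of-argument assumptions as in the text, not measurements:

[[code language="C"]]
#include <assert.h>

//  Messages a publisher could push out during a given number of msecs,
//  at a given rate in messages per second (all figures are assumptions)
static long
messages_sent_during (long msecs, long rate_per_sec) {
    assert (msecs >= 0 && rate_per_sec >= 0);
    return rate_per_sec * msecs / 1000;
}

//  Msecs needed to send a batch at the given rate, rounded up
static long
msecs_to_send (long messages, long rate_per_sec) {
    assert (messages >= 0 && rate_per_sec >= 1);
    return (messages * 1000 + rate_per_sec - 1) / rate_per_sec;
}
[[/code]]

With these figures, the publisher could send 5,000 messages in the 5 msecs the connection takes, and the 1,000-message batch is gone after 1 msec, well before the subscriber is connected.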

In Chapter Two I'll explain how to synchronize a publisher and subscribers so that you don't start to publish data until the subscriber(s) really are connected and ready. There is a simple and stupid way to delay the publisher, which is to sleep. I'd never do this in a real application though, it is extremely fragile as well as inelegant and slow. Use sleeps to prove to yourself what's happening, and then wait for Chapter 2 to see how to do this right.

The alternative to synchronization is to simply assume that the published data stream is infinite and has no start, and no end. This is how we built our weather client example.

So the client subscribes to its chosen zip-code and collects a thousand updates for that zip-code. That means about ten million updates from the server, if zip-codes are randomly distributed. You can start the client, and then the server, and the client will keep working. You can stop and restart the server as often as you like, and the client will keep working. When the client has collected its thousand updates, it calculates the average, prints it, and exits.

Some points about the publish-subscribe pattern:

* A subscriber can in fact connect to more than one publisher, using one 'connect' call each time. Data will then arrive and be interleaved so that no single publisher drowns out the others.

* If a publisher has no connected subscribers, then it will simply drop all messages.

* If you're using TCP, and a subscriber is slow, messages will queue up on the publisher. We'll look at how to protect publishers against this, using the "high-water mark" later.

* In the current versions of 0MQ, filtering happens at the subscriber side, not the publisher side. This means, over TCP, that a publisher will send all messages to all subscribers, which will then drop messages they don't want.

This is how long it takes to receive and filter 10M messages on my box, which is an Intel 4 core Q8300, fast but nothing special:

[[code]]
ph@ws200901:~/work/git/0MQGuide/examples/c$ time wuclient
Collecting updates from weather server...
Average temperature for zipcode '10001 ' was 18F

real    0m5.939s
user    0m1.590s
sys     0m2.290s
[[/code]]

+++ Divide and Conquer

As a final example (you are surely getting tired of juicy code and want to delve back into philological discussions about comparative abstractive norms), let's do a little supercomputing. Then coffee. Our supercomputing application is a fairly typical parallel processing model:

* We have a ventilator that produces tasks that can be done in parallel.
* We have a set of workers that process tasks.
* We have a sink that collects results back from the worker processes.

In reality, workers run on superfast boxes, perhaps using GPUs (graphic processing units) to do the hard maths. Here is the ventilator. It generates 100 tasks, each a message telling the worker to sleep for some number of milliseconds:

[[code type="example" title="Parallel task ventilator" name="taskvent"]]
[[/code]]

[[code type="textdiagram"]]
                  +-------------+
                  |             |
                  |  Ventilator |
                  |             |
                  +-------------+
                  |    PUSH     |
                  \------+------/
                         |
                       tasks
                         |
         +---------------+---------------+
         |               |               |
       task            task             task
         |               |               |
         v               v               v
   /------------\  /------------\  /------------\
   |    PULL    |  |    PULL    |  |    PULL    |
   +------------+  +------------+  +------------+
   |            |  |            |  |            |
   |   Worker   |  |   Worker   |  |   Worker   |
   |            |  |            |  |            |
   +------------+  +------------+  +------------+
   |    PUSH    |  |    PUSH    |  |    PUSH    |
   \-----+------/  \-----+------/  \-----+------/
         |               |               |
       result          result          result
         |               |               |
         +---------------+---------------+
                         |
                      results
                         |
                         v
                  /-------------\
                  |    PULL     |
                  +-------------+
                  |             |
                  |    Sink     |
                  |             |
                  +-------------+


           Figure # - Parallel Pipeline
[[/code]]

Here is the worker application. It receives a message, sleeps for that number of milliseconds, then signals that it's finished:

[[code type="example" title="Parallel task worker" name="taskwork"]]
[[/code]]

Here is the sink application. It collects the 100 results, then calculates how long the overall processing took, so we can confirm that the workers really were running in parallel, if there is more than one of them:

[[code type="example" title="Parallel task sink" name="tasksink"]]
[[/code]]

The average cost of a batch is 5 seconds. When we start 1, 2, 4 workers we get results like this from the sink:

[[code]]
#   1 worker
Total elapsed time: 5034 msec
#   2 workers
Total elapsed time: 2421 msec
#   4 workers
Total elapsed time: 1018 msec
[[/code]]

Let's look at some aspects of this code in more detail:

* The workers connect upstream to the ventilator, and downstream to the sink. This means you can add workers arbitrarily. If the workers bound to their endpoints, you would need (a) more endpoints and (b) to modify the ventilator and/or the sink each time you added a worker. We say that the ventilator and sink are 'stable' parts of our architecture and the workers are 'dynamic' parts of it.

* We have to synchronize the start of the batch with all workers being up and running. This is a fairly common gotcha in 0MQ and there is no easy solution. The 'connect' method takes a certain time. So when a set of workers connect to the ventilator, the first one to successfully connect will get a whole load of messages in that short time while the others are also connecting. If you don't synchronize the start of the batch somehow, the system won't run in parallel at all. Try removing the wait, and see.

* The ventilator's PUSH socket distributes tasks to workers (assuming they are all connected //before// the batch starts going out) evenly. This is called //load-balancing// and it's something we'll look at again in more detail.

* The sink's PULL socket collects results from workers evenly. This is called //fair-queuing//:

[[code type="textdiagram"]]
  +---------+   +---------+   +---------+
  |  PUSH   |   |  PUSH   |   |  PUSH   |
  \----+----/   \----+----/   \----+----/
       |             |             |
   R1, R2, R3       R4           R5, R6
       |             |             |
       +-------------+-------------+
                     |
               fair-queuing
           R1, R4, R5, R2, R6, R3
                     |
                     v
              /-------------\
              |     PULL    |
              +-------------+


          Figure # - Fair queuing
[[/code]]
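
The ordering in the figure can be reproduced with a few lines of plain C. This is only a model of the idea, taking one result from each non-empty peer in turn; the {{peer_queue_t}} type and {{fair_queue}} function are hypothetical, and a real PULL socket fair-queues messages as they arrive, so the actual order depends on timing:

[[code language="C"]]
#include <assert.h>
#include <stddef.h>

//  Hypothetical model of one inbound queue per connected PUSH peer
typedef struct {
    const char **items;         //  Pending results, oldest first
    size_t count;               //  How many items there are
    size_t next;                //  Index of the next one to take
} peer_queue_t;

//  Take one result from each non-empty peer in turn until all are
//  drained; writes results into output and returns how many it wrote
static size_t
fair_queue (peer_queue_t *peers, size_t peer_count,
            const char **output, size_t output_max) {
    assert (peers && output);
    size_t written = 0;
    int took_one = 1;
    while (took_one) {
        took_one = 0;
        size_t peer;
        for (peer = 0; peer < peer_count; peer++) {
            peer_queue_t *queue = &peers [peer];
            if (queue->next < queue->count && written < output_max) {
                output [written++] = queue->items [queue->next++];
                took_one = 1;
            }
        }
    }
    return written;
}
[[/code]]

Given three peers holding R1, R2, R3, then R4, then R5, R6, this yields R1, R4, R5, R2, R6, R3, the same order as in the figure. No peer can starve the others, however many results it has queued.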

The pipeline pattern also exhibits the "slow joiner" syndrome, leading to accusations that PUSH sockets don't load balance properly. If you are using PUSH and PULL, and one of your workers gets way more messages than the others, it's because that PULL socket has joined faster than the others, and grabs a lot of messages before the others manage to connect.
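
To convince yourself that this is a timing effect and not a broken load balancer, it helps to model it. The sketch below deals tasks round-robin to whichever workers have connected so far. It is a toy in plain C, not how libzmq is implemented, and the {{distribute_tasks}} function and its {{joined_at}} parameter are invented for this example:

[[code language="C"]]
#include <assert.h>
#include <stddef.h>

//  Toy model: deal out tasks round-robin to the workers connected so
//  far. joined_at [w] is how many tasks had already gone out when
//  worker w finished connecting; received [w] is filled with the count
//  of tasks each worker got. The first worker is assumed to join at
//  zero, so tasks sent while nobody is connected are not modelled.
static void
distribute_tasks (size_t tasks, const size_t *joined_at,
                  size_t *received, size_t workers) {
    assert (joined_at && received && workers > 0);
    assert (joined_at [0] == 0);
    size_t worker;
    for (worker = 0; worker < workers; worker++)
        received [worker] = 0;

    size_t next = 0;            //  Round-robin position
    size_t task;
    for (task = 0; task < tasks; task++) {
        //  Skip workers that have not connected yet
        while (joined_at [next] > task)
            next = (next + 1) % workers;
        received [next]++;
        next = (next + 1) % workers;
    }
}
[[/code]]

With 100 tasks and three workers that are all connected before the batch starts, the split is 34, 33, 33. If the second and third workers only finish connecting after 40 and 60 tasks have gone out, the split becomes 63, 24, 13. If they connect after the whole batch has gone, the first worker gets all 100 and nothing runs in parallel.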

+++ Programming with 0MQ

Having seen some examples, you're eager to start using 0MQ in some apps. Before you start that, take a deep breath, chillax, and reflect on some basic advice that will save you stress and confusion.

* Learn 0MQ step by step. It's just one simple API but it hides a world of possibilities. Take the possibilities slowly, master each one.

* Write nice code. Ugly code hides problems and makes it hard for others to help you. You might get used to meaningless variable names, but people reading your code won't. Use names that are real words, that say something other than "I'm too careless to tell you what this variable is really for". Use consistent indentation, clean layout. Write nice code and your world will be more comfortable.

* Test what you make as you make it. When your program doesn't work, you should know what five lines are to blame. This is especially true when you do 0MQ magic, which just //won't// work the first few times you try it.

* When you find that things don't work as expected, break your code into pieces, test each one, see which one is not working. 0MQ lets you make essentially modular code, use that to your advantage.

* Make abstractions (classes, methods, whatever) as you need them. If you copy/paste a lot of code you're going to copy/paste errors too.

To illustrate, here is a fragment of code someone asked me to help fix:

[[code]]
//  NOTE: do NOT reuse this example code!
static char *topic_str = "msg.x|";

void* pub_worker(void* arg){
    void *ctx = arg;
    assert(ctx);

    void *qskt = zmq_socket(ctx, ZMQ_REP);
    assert(qskt);

    int rc = zmq_connect(qskt, "inproc://querys");
    assert(rc == 0);

    void *pubskt = zmq_socket(ctx, ZMQ_PUB);
    assert(pubskt);

    rc = zmq_bind(pubskt, "inproc://publish");
    assert(rc == 0);

    uint8_t cmd;
    uint32_t nb;
    zmq_msg_t topic_msg, cmd_msg, nb_msg, resp_msg;

    zmq_msg_init_data(&topic_msg, topic_str, strlen(topic_str) , NULL, NULL);

    fprintf(stdout,"WORKER: ready to recieve messages\n");
    //  NOTE: do NOT reuse this example code, It's broken.
    //  e.g. topic_msg will be invalid the second time through
    while (1){
    zmq_send(pubskt, &topic_msg, ZMQ_SNDMORE);

    zmq_msg_init(&cmd_msg);
    zmq_recv(qskt, &cmd_msg, 0);
    memcpy(&cmd, zmq_msg_data(&cmd_msg), sizeof(uint8_t));
    zmq_send(pubskt, &cmd_msg, ZMQ_SNDMORE);
    zmq_msg_close(&cmd_msg);

    fprintf(stdout, "recieved cmd %u\n", cmd);

    zmq_msg_init(&nb_msg);
    zmq_recv(qskt, &nb_msg, 0);
    memcpy(&nb, zmq_msg_data(&nb_msg), sizeof(uint32_t));
    zmq_send(pubskt, &nb_msg, 0);
    zmq_msg_close(&nb_msg);

    fprintf(stdout, "recieved nb %u\n", nb);

    zmq_msg_init_size(&resp_msg, sizeof(uint8_t));
    memset(zmq_msg_data(&resp_msg), 0, sizeof(uint8_t));
    zmq_send(qskt, &resp_msg, 0);
    zmq_msg_close(&resp_msg);

    }
    return NULL;
}
[[/code]]

This is what I rewrote it to, as part of finding the bug:

[[code language="C"]]
static void *
worker_thread (void *arg) {
    void *context = arg;
    void *worker = zmq_socket (context, ZMQ_REP);
    assert (worker);
    int rc;
    rc = zmq_connect (worker, "ipc://worker");
    assert (rc == 0);

    void *broadcast = zmq_socket (context, ZMQ_PUB);
    assert (broadcast);
    rc = zmq_bind (broadcast, "ipc://publish");
    assert (rc == 0);

    while (1) {
        char *part1 = s_recv (worker);
        char *part2 = s_recv (worker);
        printf ("Worker got [%s][%s]\n", part1, part2);
        s_sendmore (broadcast, "msg");
        s_sendmore (broadcast, part1);
        s_send     (broadcast, part2);
        free (part1);
        free (part2);

        s_send (worker, "OK");
    }
    return NULL;
}
[[/code]]

In the end, the problem was that the application was passing sockets between threads, which crashed weirdly. It became legal behavior in 0MQ/2.1, but remains dangerous and something we advise against doing.

+++ 0MQ/2.1

History tells us that 0MQ/2.0 is when low-latency distributed messaging crawled out of the primeval mud, shook off a heavy coat of buzzwords and enterprise jargon, and reached its branches up to the sky, as if to cry, "no limits!". We've been using this stable branch since it spawned 0MQ/2.0.8 during the hot days of August, 2010.

But times change, and what was cool in 2010 is no longer //a la mode// in 2011. The 0MQ developers and community have been frantically busy redefining messaging chic, and anyone who's anyone knows that 2.1 is the new stable.

The Guide therefore assumes you're running 2.1.x. Let's look at the differences, as they affect your applications coming from the old 2.0:

* In 2.0, zmq_close[3] and zmq_term[3] discarded any in-flight messages, so it was unsafe to close a socket and terminate right after sending messages. In 2.1, these calls are safe: zmq_term will flush anything that's waiting to be sent. In 2.0 examples we often added a sleep(1) to get around the problem. In 2.1, this isn't needed.

* By contrast, in 2.0, it was safe to call zmq_term[3] even if there were open sockets. In 2.1, this is not safe, and it can cause zmq_term to block. So in 2.1 we //always close every socket//, before exiting. Furthermore, if you have any outgoing messages or connects waiting on a socket, 2.1 will by default wait forever trying to deliver these. You must //set the LINGER socket option// (e.g. to zero), on every socket which may still be busy, before calling zmq_term:

[[code language="C"]]
int zero = 0;
zmq_setsockopt (mysocket, ZMQ_LINGER, &zero, sizeof (zero));
[[/code]]

* In 2.0, zmq_poll[3] would return arbitrarily early, so you could not use it as a timer. We would work around this with a loop that checked how much time was left, and called zmq_poll again as needed. In 2.1, zmq_poll properly waits for the full timeout if there are no events.

* In 2.0, 0MQ would ignore interrupted system calls, which meant that no libzmq call would ever return EINTR if a signal was received during its operation. This caused problems with loss of signals such as SIGINT (Ctrl-C handling), especially for language runtimes. In 2.1, any blocking call such as zmq_recv[3] will return EINTR if it is interrupted by a signal.
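To make the last point concrete, here is a minimal sketch of how an application might take advantage of the EINTR behavior. It uses only POSIX calls; the names {{s_interrupted}}, {{s_catch_signals}}, and {{s_call_interrupted}} are made up for this illustration and are not part of libzmq.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

//  Set to 1 by the handler below. sig_atomic_t is safe to write
//  from inside a signal handler.
static volatile sig_atomic_t s_interrupted = 0;

static void s_signal_handler (int signal_value)
{
    (void) signal_value;
    s_interrupted = 1;
}

//  Install the handler for SIGINT and SIGTERM. We deliberately do not
//  set SA_RESTART, so that interrupted blocking calls fail with EINTR
//  instead of being restarted behind our back.
int s_catch_signals (void)
{
    struct sigaction action;
    memset (&action, 0, sizeof (action));
    action.sa_handler = s_signal_handler;
    action.sa_flags = 0;
    sigemptyset (&action.sa_mask);
    if (sigaction (SIGINT, &action, NULL) != 0)
        return -1;
    return sigaction (SIGTERM, &action, NULL);
}

//  Returns non-zero if a call that returned rc failed because a
//  signal interrupted it.
int s_call_interrupted (int rc)
{
    return rc == -1 && errno == EINTR;
}
```

The idea is to call {{s_catch_signals}} once at startup, then, whenever a blocking 0MQ call returns -1, test {{s_call_interrupted (rc)}} (or the {{s_interrupted}} flag) and break out of the main loop so the program can close its sockets and terminate the context cleanly.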

+++ Getting the Context Right

0MQ applications always start by creating a //context//, and then using that for creating sockets. In C, it's the zmq_init[3] call. You should create and use exactly one context in your process. Technically, the context is the container for all sockets in a single process, and acts as the transport for {{inproc}} sockets, which are the fastest way to connect threads in one process. If at runtime a process has two contexts, these are like separate 0MQ instances. If that's explicitly what you want, OK, but otherwise remember:

**Do one zmq_init[3] at the start of your main line code, and one zmq_term[3] at the end.**

If you're using the fork() system call, each process needs its own context. If you do zmq_init[3] in the main process before calling fork(), the child processes get their own contexts. In general you want to do the interesting stuff in the child processes, and just manage these from the parent process.

+++ Making a Clean Exit

Classy programmers share the same motto as classy hit men: always clean up when you finish the job. When you use 0MQ in a language like Python, stuff gets automatically freed for you. But when using C you have to carefully free objects when you're finished with them, or you get memory leaks, unstable applications, and generally bad karma.

Memory leaks are one thing, but 0MQ is quite finicky about how you exit an application. The reasons are technical and painful but the upshot is that if you leave any sockets open, the zmq_term[3] function will hang forever. And even if you close all sockets, zmq_term[3] will by default wait forever if there are pending connects or sends, unless you set LINGER to zero on those sockets before closing them.

The 0MQ objects we need to worry about are messages, sockets, and contexts. Luckily it's quite simple, at least in simple programs:

* Always close a message the moment you are done with it, using zmq_msg_close[3].

* If you are opening and closing a lot of sockets, that's probably a sign you need to redesign your application.

* When you exit the program, close your sockets and then call zmq_term[3]. This destroys the context.

If you're doing multithreaded work, it gets rather more complex than this. We'll get to multithreading in the next chapter, but because some of you will, despite warnings, try to run before you can safely walk, below is the quick and dirty guide to making a clean exit in a //multithreaded// 0MQ application.

First, do not try to use the same socket from multiple threads. No, don't explain why you think this would be excellent fun, just please don't do it. Next, relingerize (that is, set LINGER on) and close all sockets in the main thread, and terminate the context there. Lastly, this will cause any blocking receives, polls, or sends in attached threads (i.e. threads that share the same context) to return with an error. Catch that error, then relingerize and close the sockets in //that// thread, and exit. Do not terminate the same context twice. The zmq_term in the main thread will block until all sockets it knows about are safely closed.

Voila! It's complex and painful enough that any language binding author worth his or her salt will do this automatically and make the socket closing dance unnecessary.

+++ Why We Needed 0MQ

Now that you've seen 0MQ in action, let's go back to the "why".

Many applications these days consist of components that stretch across some kind of network, either a LAN or the Internet. So many application developers end up doing some kind of messaging. Some developers use message queuing products, but most of the time they do it themselves, using TCP or UDP. These protocols are not hard to use, but there is a great difference between sending a few bytes from A to B, and doing messaging in any kind of reliable way.

Let's look at the typical problems we face when we start to connect pieces using raw TCP. Any reusable messaging layer would need to solve all or most of these:

* How do we handle I/O? Does our application block, or do we handle I/O in the background? This is a key design decision. Blocking I/O creates architectures that do not scale well. But background I/O can be very hard to do right.

* How do we handle dynamic components, i.e. pieces that go away temporarily? Do we formally split components into "clients" and "servers" and mandate that servers cannot disappear? What then if we want to connect servers to servers? Do we try to reconnect every few seconds?

* How do we represent a message on the wire? How do we frame data so it's easy to write and read, safe from buffer overflows, efficient for small messages, yet adequate for the very largest videos of dancing cats wearing party hats?

* How do we handle messages that we can't deliver immediately? Particularly, if we're waiting for a component to come back on-line? Do we discard messages, put them into a database, or into a memory queue?

* Where do we store message queues? What happens if the component reading from a queue is very slow, and causes our queues to build up? What's our strategy then?

* How do we handle lost messages? Do we wait for fresh data, request a resend, or do we build some kind of reliability layer that ensures messages cannot be lost? What if that layer itself crashes?

* What if we need to use a different network transport, say multicast instead of TCP unicast? Or IPv6? Do we need to rewrite the applications, or is the transport abstracted in some layer?

* How do we route messages? Can we send the same message to multiple peers? Can we send replies back to an original requester?

* How do we write an API for another language? Do we re-implement a wire-level protocol or do we repackage a library? If the former, how can we guarantee efficient and stable stacks? If the latter, how can we guarantee interoperability?

* How do we represent data so that it can be read between different architectures? Do we enforce a particular encoding for data types? How far is this the job of the messaging system rather than a higher layer?

* How do we handle network errors? Do we wait and retry, ignore them silently, or abort?
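To make the framing question in the list above concrete, here is a minimal sketch of one common approach: prefix each message with a 4-byte big-endian length. This is an illustration only and is //not// the wire format 0MQ uses. The names {{frame_encode}} and {{frame_decode}} and the 1MB cap are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HEADER_SIZE 4
#define FRAME_MAX_BODY    (1024 * 1024)     //  Hypothetical size cap

//  Write a length-prefixed frame into out. Returns the number of bytes
//  written, or 0 if the body is too large or out is too small.
size_t frame_encode (const void *body, size_t body_size,
                     uint8_t *out, size_t out_size)
{
    if (body_size > FRAME_MAX_BODY
    ||  out_size < FRAME_HEADER_SIZE + body_size)
        return 0;
    out [0] = (uint8_t) (body_size >> 24);
    out [1] = (uint8_t) (body_size >> 16);
    out [2] = (uint8_t) (body_size >> 8);
    out [3] = (uint8_t) (body_size);
    if (body_size > 0)
        memcpy (out + FRAME_HEADER_SIZE, body, body_size);
    return FRAME_HEADER_SIZE + body_size;
}

//  Look for one complete frame at the start of in. Returns 1 and sets
//  body/body_size if a whole frame is present, 0 if more bytes are
//  needed, and -1 if the header announces an oversized body.
int frame_decode (const uint8_t *in, size_t in_size,
                  const uint8_t **body, size_t *body_size)
{
    if (in_size < FRAME_HEADER_SIZE)
        return 0;
    uint32_t size = ((uint32_t) in [0] << 24)
                  | ((uint32_t) in [1] << 16)
                  | ((uint32_t) in [2] << 8)
                  |  (uint32_t) in [3];
    if (size > FRAME_MAX_BODY)
        return -1;                  //  Refuse before touching any buffer
    if (in_size - FRAME_HEADER_SIZE < size)
        return 0;                   //  Frame is still arriving
    *body = in + FRAME_HEADER_SIZE;
    *body_size = size;
    return 1;
}
```

Even this toy version has to decide on byte order, a maximum size, and what to do with a partial read. A real messaging layer also has to make it efficient for tiny messages and workable for huge ones, which is part of why this list of problems is harder than it looks.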

Take a typical open source project like [http://hadoop.apache.org/zookeeper/ Hadoop Zookeeper] and read the C API code in [http://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c src/c/src/zookeeper.c]. It's 3,200 lines of mystery and in there is an undocumented, client-server network communication protocol. I see it's efficient because it uses poll() instead of select(). But really, Zookeeper should be using a generic messaging layer and an explicitly documented wire level protocol. It is incredibly wasteful for teams to be building this particular wheel over and over.

[[code type="textdiagram"]]
             +------------+
             |            |
             |  Piece A   |
             |            |
             +------------+
                   ^
                   |
                  TCP
                   |
                   v
             +------------+
             |            |
             |  Piece B   |
             |            |
             +------------+


  Figure # - Messaging as it starts
[[/code]]

But how to make a reusable messaging layer? Why, when so many projects need this technology, are people still doing it the hard way, by driving TCP sockets in their code, and solving the problems in that long list, over and over?

It turns out that building reusable messaging systems is really difficult, which is why few FOSS projects ever tried, and why commercial messaging products are complex, expensive, inflexible, and brittle. In 2006 iMatix designed [http://www.amqp.org AMQP] which started to give FOSS developers perhaps the first reusable recipe for a messaging system. AMQP works better than many other designs [http://www.imatix.com/articles:whats-wrong-with-amqp but remains relatively complex, expensive, and brittle]. It takes weeks to learn to use, and months to create stable architectures that don't crash when things get hairy.

Most messaging projects, like AMQP, that try to solve this long list of problems in a reusable way do so by inventing a new concept, the "broker", that does addressing, routing, and queuing. This results in a client-server protocol or a set of APIs on top of some undocumented protocol, that let applications speak to this broker. Brokers are an excellent thing in reducing the complexity of large networks. But adding broker-based messaging to a product like Zookeeper would make it worse, not better. It would mean adding an additional big box, and a new single point of failure. A broker rapidly becomes a bottleneck and a new risk to manage. If the software supports it, we can add a second, third, fourth broker and make some fail-over scheme. People do this. It creates more moving pieces, more complexity, more things to break.

And a broker-centric set-up needs its own operations team. You literally need to watch the brokers day and night, and beat them with a stick when they start misbehaving. You need boxes, and you need backup boxes, and you need people to manage those boxes. It is only worth doing for large applications with many moving pieces, built by several teams of people, over several years.

So small to medium application developers are trapped. Either they avoid network programming, and make monolithic applications that do not scale. Or they jump into network programming and make brittle, complex applications that are hard to maintain. Or they bet on a messaging product, and end up with scalable applications that depend on expensive, easily broken technology. There has been no really good choice, which is maybe why messaging is largely stuck in the last century and stirs strong emotions. Negative ones for users, gleeful joy for those selling support and licenses.

[[code type="textdiagram"]]
            +---+          |  +---+
    +---+   |   |   +---+  |  |   |
    |   +-->|   |   |   |  |  |   |
    |   |   +---+   |   |  |  +-+-+
    +-+-+           +-+-+  |    |
      |               |    |    |
      |       +-----------------+
      |       |       |    |
      +-----------------------+
              |       |    |  |
      +-------|-------|----+--|------+
      |       v       |       v      |
    +-+-+   +---+     |     +---+    |
    |   |   |   |   +-+-+   |   |    |
    |   |   |   |   |   |   |   |    |
    +---+   +---+   |   |   +---+    |
              ^     +---+     ^      |
              |       ^       |    +-+
      +-------+-------+-------+    |
      |       |       |            |
      v     +-+-+     v     +---+  |
    +---+   |   |   +---+   |   |  |
    |   |   |   |<--+   |   |   |<-+
    |   |   +---+   |   |   +-+-+
    +---+           +---+


   Figure # - Messaging as it becomes
[[/code]]

What we need is something that does the job of messaging but does it in such a simple and cheap way that it can work in any application, with close to zero cost. It should be a library that you just link with, without any other dependencies. No additional moving pieces, so no additional risk. It should run on any OS and work with any programming language.

And this is 0MQ: an efficient, embeddable library that solves most of the problems an application needs to become nicely elastic across a network, without much cost.

Specifically:

* It handles I/O asynchronously, in background threads. These communicate with application threads using lock-free data structures, so 0MQ applications need no locks, semaphores, or other wait states.

* Components can come and go dynamically and 0MQ will automatically reconnect. This means you can start components in any order. You can create "service-oriented architectures" (SOAs) where services can join and leave the network at any time.

* It queues messages automatically when needed. It does this intelligently, pushing messages to as close as possible to the receiver before queuing them.

* It has ways of dealing with over-full queues (called "high water mark"). When a queue is full, 0MQ automatically blocks senders, or throws away messages, depending on the kind of messaging you are doing (the so-called "pattern").

* It lets your applications talk to each other over arbitrary transports: TCP, multicast, in-process, inter-process. You don't need to change your code to use a different transport.

* It handles slow/blocked readers safely, using different strategies that depend on the messaging pattern.

* It lets you route messages using a variety of patterns such as request-reply and publish-subscribe. These patterns are how you create the topology, the structure of your network.

* It lets you place pattern-extending "devices" (small brokers) in the network when you need to reduce the complexity of interconnecting many pieces.

* It delivers whole messages exactly as they were sent, using a simple framing on the wire. If you write a 10k message, you will receive a 10k message.

* It does not impose any format on messages. They are blobs of any size from zero bytes to gigabytes. When you want to represent data you choose some other product on top, such as Google's protocol buffers or XDR.

* It handles network errors intelligently. Sometimes it retries, sometimes it tells you an operation failed.

* It reduces your carbon footprint. Doing more with less CPU means your boxes use less power, and you can keep your old boxes in use for longer. Al Gore would love 0MQ.

Actually 0MQ does rather more than this. It has a subversive effect on how you develop network-capable applications. Superficially it's just a socket API on which you do zmq_recv[3] and zmq_send[3]. But message processing rapidly becomes the central loop, and your application soon breaks down into a set of message processing tasks. It is elegant and natural. And it scales: each of these tasks maps to a node, and the nodes talk to each other across arbitrary transports. Two nodes in one process (node is a thread), two nodes on one box (node is a process), two boxes on one network (node is a box). With no application code changes.
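Going back to the "high water mark" point in the list above, the idea can be sketched as a bounded queue whose behavior when full depends on a policy. This is a toy model under stated assumptions, not how 0MQ implements its queues: the names {{queue_t}}, {{queue_send}}, {{queue_recv}}, and the limit of four messages are all invented for the illustration, and "blocking" is modeled as a return code rather than a real wait.

```c
#include <stddef.h>

#define QUEUE_HWM 4             //  Hypothetical high-water mark

typedef enum { POLICY_BLOCK, POLICY_DROP } hwm_policy_t;
typedef enum { SEND_QUEUED, SEND_WOULD_BLOCK, SEND_DROPPED } send_result_t;

typedef struct {
    int items [QUEUE_HWM];      //  Ring buffer of pending messages
    size_t head;                //  Index of the oldest message
    size_t count;               //  Number of messages queued
    hwm_policy_t policy;        //  What to do when the queue is full
} queue_t;

void queue_init (queue_t *self, hwm_policy_t policy)
{
    self->head = 0;
    self->count = 0;
    self->policy = policy;
}

//  Queue a message, or apply the policy if the queue is at its
//  high-water mark.
send_result_t queue_send (queue_t *self, int message)
{
    if (self->count == QUEUE_HWM)
        return self->policy == POLICY_BLOCK?
            SEND_WOULD_BLOCK: SEND_DROPPED;
    self->items [(self->head + self->count) % QUEUE_HWM] = message;
    self->count++;
    return SEND_QUEUED;
}

//  Take the oldest message. Returns 1 on success, 0 if the queue
//  is empty.
int queue_recv (queue_t *self, int *message)
{
    if (self->count == 0)
        return 0;
    *message = self->items [self->head];
    self->head = (self->head + 1) % QUEUE_HWM;
    self->count--;
    return 1;
}
```

The point of the sketch is that the sender's code is the same in both cases. Only the policy differs, and in 0MQ that choice comes from the messaging pattern rather than from your application logic.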

+++ Socket Scalability

Let's see 0MQ's scalability in action. Here is a shell script that starts the weather server and then a bunch of clients in parallel:

[[code]]
wuserver &
wuclient 12345 &
wuclient 23456 &
wuclient 34567 &
wuclient 45678 &
wuclient 56789 &
[[/code]]

As the clients run, we take a look at the active processes using 'top', and we see something like (on a 4-core box):

[[code]]
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7136 ph        20   0 1040m 959m 1156 R  157 12.0  16:25.47 wuserver
 7966 ph        20   0 98608 1804 1372 S   33  0.0   0:03.94 wuclient
 7963 ph        20   0 33116 1748 1372 S   14  0.0   0:00.76 wuclient
 7965 ph        20   0 33116 1784 1372 S    6  0.0   0:00.47 wuclient
 7964 ph        20   0 33116 1788 1372 S    5  0.0   0:00.25 wuclient
 7967 ph        20   0 33072 1740 1372 S    5  0.0   0:00.35 wuclient
[[/code]]

Let's think for a second about what is happening here. The weather server has a single socket, and yet here we have it sending data to five clients in parallel. We could have thousands of concurrent clients. The server application doesn't see them, doesn't talk to them directly.

+++ Missing Message Problem Solver

As you start to program with 0MQ you will come across one problem more than once: you lose messages that you expect to receive. Here is a basic problem solver that walks through the most common causes for this. Don't worry if some of the terminology is still unfamiliar; it'll become clearer in the next chapters.

[[code type="textdiagram"]]
        +-----------------+
        |                 |
        | I'm not getting |
        |     my data!    |
        |             {o} |
        +--------+--------+
                 |
                 |
                 v
        +-----------------+        +-----------------+        +------------------+
        |                 |        |                 |        | Use the          |
        | Are you losing  |        |  Do you set a   |        | zmq_setsockopt   |
        |  messages in a  +------->|  subscription   +------->| ZMQ_SUBSCRIBE    |
        |   SUB socket?   | Yes    |  for messages?  | No     | ("") option      |
        |             {o} |        |             {o} |        |                  |
        +--------+--------+        +--------+--------+        +------------------+
                 | No                       | Yes
                 |                          |
                 |                          v
                 |                 +-----------------+        +------------------+
                 |                 |                 |        | Start all SUB    |
                 |                 |  Do you start   |        | sockets first,   |
                 |                 |  the SUB socket +------->| then the PUB     |
                 |                 |  after the PUB? | Yes    | sockets to avoid |
                 |                 |             {o} |        | loss             |
                 |                 +--------+--------+        +------------------+
                 |                          | No
                 |                          |
                 |                          v
                 |              +-------------------------+
                 |              |  See explanation of the |
                 |              | "slow joiner" syndrome  |
                 |              |      in this text.      |
                 |              +-------------------------+
                 |
                 |
                 v
        +-----------------+        +--------------------+
        |                 |        | With REQ, send and |
        |  Are you using  |        | recv in a loop and |
        |   REQ and REP   +------->| check the return   |
        |     sockets?    | Yes    | codes. With REP    |
        |             {o} |        | it's recv + send.  |
        +--------+--------+        +--------------------+
                 | No
                 |
                 v
        +-----------------+        +---------------------+        +-----------------+
        |                 |        | The 1st PULL socket |        | You may need to |
        |  Are you using  |        | to connect can grab |        | do extra work to|
        |  PUSH sockets?  +------->| 1000's of messages  +------->| synchronize your|
        |                 | Yes    | before the others   |        | sockets before  |
        |             {o} |        | get there.          |        | sending tasks.  |
        +--------+--------+        +---------------------+        +-----------------+
                 | No
                 |
                 v
        +-----------------+        +-----------------+
        |                 |        |                 |
        |  Do you check   |        | Check every 0MQ |
        | return codes on +------->| method call. In |
        |  all methods?   | No     | C, use asserts. |
        |             {o} |        |                 |
        +--------+--------+        +-----------------+
                 | Yes
                 |
                 v
        +-----------------+        +-----------------+        +------------------+
        |                 |        |                 |        |                  |
        | Are you using   |        |   Do you pass   |        | Create a socket  |
        | threads in your +------->| sockets between +------->| in the thread    |
        |  app already?   | Yes    |    threads?     | Yes    | where you use it |
        |             {o} |        |             {o} |        |                  |
        +--------+--------+        +--------+--------+        +------------------+
                 | No                       | No
                 +--------------------------+
                 |
                 v
        +-----------------+        +-----------------+        +------------------+
        |                 |        |                 |        |                  |
        |  Are you using  |        | Are you calling |        | Call zmq_init    |
        |   the inproc    +------->|  zmq_init more  +------->| exactly once in  |
        |   transport?    | Yes    |    than once?   | Yes    | every process.   |
        |             {o} |        |             {o} |        |                  |
        +--------+--------+        +--------+--------+        +------------------+
                 | No                       | No
                 |                          |
                 |                          v
                 |                 +-----------------+
                 |                 |                 |
                 |                 | Check that you  |
                 |                 | bind before you |
                 |                 | connect.        |
                 |                 |                 |
                 |                 +-----------------+
                 |
                 v
        +-----------------+        +-----------------+        +-----------------+
        |                 |        | Check that the  |        | If you're using |
        |  Are you using  |        | reply address   |        | identities make |
        | ROUTER sockets? +------->| is valid. 0MQ   +------->| sure to set them|
        |                 | Yes    | drops messages  |        | before not after|
        |             {o} |        | it can't route. |        | you connect.    |
        +--------+--------+        +-----------------+        +--------+--------+
                 | No
                 |
                 v
        +-----------------+        +--------------------+
        |                 |        | You probably have  |
        | Are you losing  |        | a client running   |
        |   one message   +------->| in the background. |
        |    in two?      | Yes    | Kill it and start  |
        |             {o} |        | again.             |
        +--------+--------+        +--------------------+
                 | No
                 |
                 v
        +-----------------+
        |                 |
        | Make a minimal  |
        | test case, ask  |
        | on zeromq IRC.  |
        |                 |
        +-----------------+


                 Figure # - Missing Message Problem Solver
[[/code]]

If you're using 0MQ in a context where failures are expensive, then you want to plan properly. First, build prototypes that let you learn and test the different aspects of your design. Stress them until they break, so that you know exactly how strong your designs are. Second, invest in testing. This means building test frameworks, ensuring you have access to realistic setups with sufficient computer power, and getting time or help to actually test seriously. Ideally, one team writes the code, a second team tries to break it. Lastly, do get your organization to [http://www.imatix.com/contact contact iMatix] to discuss how we can help to make sure things work properly, and can be fixed rapidly if they break.

In short: if you have not proven an architecture works in realistic conditions, it will most likely break at the worst possible moment.

+++ Warning - Unstable Paradigms!

Traditional network programming is built on the general assumption that one socket talks to one connection, one peer. There are multicast protocols but they are exotic. When we assume "one socket = one connection", we scale our architectures in certain ways. We create threads of logic where each thread works with one socket, one peer. We place intelligence and state in these threads.

In the 0MQ universe, sockets are clever multithreaded applications that manage a whole set of connections automagically for you. You can't see, work with, open, close, or attach state to these connections. Whether you use blocking send or receive, or poll, all you can talk to is the socket, not the connections it manages for you. The connections are private and invisible, and this is the key to 0MQ's scalability.

Because your code, talking to a socket, can then handle any number of connections across whatever network protocols are around, without change. A messaging pattern sitting in 0MQ can scale more cheaply than a messaging pattern sitting in your application code.

So the general assumption no longer applies. As you read the code examples, your brain will try to map them to what you know. You will read "socket" and think "ah, that represents a connection to another node". That is wrong. You will read "thread" and your brain will again think, "ah, a thread represents a connection to another node", and again your brain will be wrong.

If you're reading this Guide for the first time, realize that until you actually write 0MQ code for a day or two (and maybe three or four days), you may feel confused, especially by how simple 0MQ makes things for you, and you may try to impose that general assumption on 0MQ, and it won't work. And then you will experience your moment of enlightenment and trust, that //zap-pow-kaboom// satori paradigm-shift moment when it all becomes clear.