
Reliability, Message Ordering and Duplicate Messages

coderplay edited this page May 4, 2012 · 18 revisions


Reliability

Metamorphosis guarantees reliability along the entire path a message takes, from the client side to the server side.

Reliability Assurance of Producers

Every time the producer sends a message, it gets back a SendResult. If isSuccess is true, the server has confirmed that the message was received and persisted. The whole process is synchronous: Metamorphosis makes sure the message has been delivered to the server before it returns a result.
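The synchronous send-and-check contract can be sketched as a loop that inspects the result of each send. This is a minimal sketch, not the Metamorphosis client API: `Sender`, `ReliableSend`, and the `SendResult` record below are hypothetical stand-ins that only model the `isSuccess` flag described above, and the bounded retry is an assumption added for illustration.

```java
// Hypothetical stand-in for the producer's send result: only the success
// flag and an error message are modeled here.
record SendResult(boolean isSuccess, String errorMessage) {}

// Hypothetical synchronous sender: send() blocks until the server replies.
interface Sender {
    SendResult send(byte[] payload);
}

final class ReliableSend {
    // Sends `payload` synchronously, retrying up to `maxAttempts` times until
    // the server confirms that the message was received and persisted.
    static SendResult sendWithRetry(Sender sender, byte[] payload, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SendResult last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            last = sender.send(payload);
            if (last.isSuccess()) {
                return last; // the server has persisted the message
            }
        }
        return last; // still failing: the caller decides what to do next
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated server that fails the first call and succeeds on the second.
        Sender flaky = p -> ++calls[0] < 2
                ? new SendResult(false, "timeout")
                : new SendResult(true, null);
        SendResult r = sendWithRetry(flaky, "hello".getBytes(), 3);
        System.out.println(r.isSuccess() + " after " + calls[0] + " calls"); // true after 2 calls
    }
}
```

Retrying is a design choice with a cost: if the server stored the message but its reply was lost, a retry stores it a second time, as described under Duplicate Messages.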

Reliability Assurance of Servers

Once the server receives a message from a producer, it verifies the message (a checksum check), writes it to disk, and only then replies to the producer. Therefore, every message whose send result is successful has been written to the server's persistent storage.
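The server-side order of operations (verify, write, then reply) can be sketched as follows. This is a minimal sketch under stated assumptions: it assumes a CRC32 checksum of the message body travels with the message (the exact checksum scheme is an assumption here), `ServerLog` and `handle` are hypothetical names, and the in-memory list stands in for the on-disk log.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

final class ServerLog {
    // Stand-in for the on-disk message log.
    private final List<byte[]> stored = new ArrayList<>();

    static long crc32(byte[] body) {
        CRC32 crc = new CRC32();
        crc.update(body, 0, body.length);
        return crc.getValue();
    }

    // Returns true (a success reply) only if the checksum matches and the
    // message was appended; a corrupted message is rejected and never stored.
    boolean handle(byte[] body, long checksumFromProducer) {
        if (crc32(body) != checksumFromProducer) {
            return false; // reject: the body was corrupted in transit
        }
        stored.add(body.clone()); // a real server writes to the log file here
        return true;              // the reply is sent only after the write
    }

    int size() {
        return stored.size();
    }

    public static void main(String[] args) {
        ServerLog log = new ServerLog();
        byte[] body = "order-42".getBytes(StandardCharsets.UTF_8);
        System.out.println(log.handle(body, crc32(body)));     // true
        System.out.println(log.handle(body, crc32(body) + 1)); // false
        System.out.println(log.size());                        // 1
    }
}
```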

Because the operating system (Linux, for example) has a page cache layer, a successful write does not mean the data has reached the hard disk. Metamorphosis provides two configuration options that control when message data is forced to persistent storage:

  • Force a flush once a given number of messages (1000 by default) has been received.
  • Force a flush once a given number of seconds (10 by default) has elapsed.

Thus, even if a failure such as a power outage occurs, Metamorphosis loses at most 1000 messages, or at most 10 seconds' worth of messages. By adjusting this configuration you can even reduce the loss to zero (forcing a flush on every write), at the cost of throughput.
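The two flush triggers can be combined into one policy: force a flush as soon as either the number of unflushed messages or the time since the last flush reaches its limit. This is a minimal sketch, not Metamorphosis code; `FlushPolicy` and `onAppend` are hypothetical, and the caller is assumed to invoke `FileChannel.force(false)` on the log file whenever `onAppend` returns true.

```java
final class FlushPolicy {
    private final int maxUnflushedMessages; // e.g. 1000
    private final long maxIntervalMillis;   // e.g. 10_000
    private int unflushed = 0;
    private long lastFlushMillis;

    FlushPolicy(int maxUnflushedMessages, long maxIntervalMillis, long nowMillis) {
        this.maxUnflushedMessages = maxUnflushedMessages;
        this.maxIntervalMillis = maxIntervalMillis;
        this.lastFlushMillis = nowMillis;
    }

    // Called after each message is written to the page cache. Returns true when
    // the caller should force the data to disk (e.g. FileChannel.force(false)).
    boolean onAppend(long nowMillis) {
        unflushed++;
        if (unflushed >= maxUnflushedMessages
                || nowMillis - lastFlushMillis >= maxIntervalMillis) {
            unflushed = 0;
            lastFlushMillis = nowMillis;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        FlushPolicy p = new FlushPolicy(1000, 10_000, 0);
        int flushes = 0;
        for (int i = 1; i <= 2500; i++) {
            if (p.onAppend(i)) flushes++; // 2500 messages arriving within 2.5 s
        }
        System.out.println(flushes); // 2: the count limit is hit at 1000 and 2000
    }
}
```

With `maxUnflushedMessages` set to 1, every append triggers a flush, which is the zero-loss setting described above. Note that this sketch only checks the interval when a message arrives; a real server would also need a timer so that the time-based flush fires when traffic stops.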

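Servers are usually organized as a cluster, and a message from a producer may be stored on any machine in the cluster according to the routing rules. Meta already implements a high-availability (HA) scheme similar to MySQL's synchronous and asynchronous replication: the data on one Meta server is fully replicated to a slave server, and the slave can also serve consumers (except under synchronous replication).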
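The practical difference between the two replication modes is when the producer gets its reply. The sketch below is a toy model under stated assumptions, not Meta's HA implementation: `ReplicatedLog` and `Mode` are hypothetical, and the two in-memory lists stand in for the master's and the slave's stores. In synchronous mode the reply waits for the slave's copy; in asynchronous mode the reply is immediate and the slave catches up in the background.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class ReplicatedLog {
    enum Mode { SYNC, ASYNC }

    final List<String> master = new CopyOnWriteArrayList<>(); // stand-in for the master's log
    final List<String> slave = new CopyOnWriteArrayList<>();  // stand-in for the slave's copy
    private final Mode mode;
    // A single thread keeps the slave's copy in the same order as the master's.
    private final ExecutorService replicator = Executors.newSingleThreadExecutor();

    ReplicatedLog(Mode mode) {
        this.mode = mode;
    }

    // Returns true when the producer may be told that the write succeeded.
    boolean write(String msg) {
        master.add(msg);
        Future<Boolean> replicated = replicator.submit(() -> slave.add(msg));
        if (mode == Mode.ASYNC) {
            return true; // reply now; the slave catches up in the background
        }
        try {
            return replicated.get(); // SYNC: reply only after the slave has stored it
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }

    // Stops replication after pending copies are applied (waiting at most 5 s).
    void close() {
        replicator.shutdown();
        try {
            replicator.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ReplicatedLog sync = new ReplicatedLog(Mode.SYNC);
        sync.write("a");
        System.out.println(sync.slave); // [a]: already on the slave when write() returns
        sync.close();
    }
}
```

Synchronous mode trades latency for the guarantee that an acknowledged message exists on two machines; asynchronous mode replies faster but can lose the most recent messages if the master fails before the slave catches up.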

Reliability Assurance of Consumers

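Consumers consume messages one at a time and move on to the next message only after the current one has been consumed successfully. If consuming a message fails (for example, an exception is thrown), the consumer retries it (up to 5 times by default). If the message still cannot be consumed after the maximum number of retries, it is stored on the consumer's local disk and a background thread keeps retrying it, while the main thread moves on to the subsequent messages. Thus a Meta consumer moves on to another message only after the MessageListener confirms that the current one was consumed successfully (or after the message has been set aside for background retry). This is how reliable consumption is guaranteed.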
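The consumption loop can be sketched as follows. This is a minimal sketch, not Meta's consumer: `Handler`, `RetryingConsumer`, and `localRecovery` are hypothetical names, the in-memory queue stands in for the consumer's local disk, and the background retry thread is omitted. `maxAttempts` plays the role of the default limit of 5; counting it as total attempts is an assumption, since the text does not say whether the first attempt is included.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Hypothetical message handler; throwing an exception signals a failed consume.
interface Handler {
    void handle(String message) throws Exception;
}

final class RetryingConsumer {
    private final Handler handler;
    private final int maxAttempts;
    // Stand-in for the local on-disk store drained by a background retry thread.
    final Queue<String> localRecovery = new ArrayDeque<>();

    RetryingConsumer(Handler handler, int maxAttempts) {
        this.handler = handler;
        this.maxAttempts = maxAttempts;
    }

    // Consumes messages strictly one after another.
    void consumeAll(List<String> messages) {
        for (String m : messages) {
            if (!tryConsume(m)) {
                localRecovery.add(m); // set aside; the main loop moves on
            }
        }
    }

    private boolean tryConsume(String m) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.handle(m);
                return true; // success: only now move to the next message
            } catch (Exception e) {
                // failed attempt: retry the same message
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RetryingConsumer c = new RetryingConsumer(m -> {
            if (m.equals("bad")) throw new IllegalStateException("cannot process " + m);
            System.out.println("consumed " + m);
        }, 5);
        c.consumeAll(List.of("a", "bad", "b"));
        System.out.println("set aside: " + c.localRecovery); // set aside: [bad]
    }
}
```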

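Another key point of consumer reliability is how the offset, that is, the position from which data is pulled, is stored. We currently provide the following storage options:

  • zookeeper: offsets are stored in ZooKeeper by default, and ZooKeeper keeps the data safe by running as a cluster.
  • mysql: offsets are stored in your own MySQL database; you only need to create a specific table for them. Reliability is then entirely guaranteed by the database.
  • file: offsets are stored in a local file on the consumer side.

Offsets are saved periodically, and a save is always forced before each rebalance.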
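A file-based offset store with periodic saving and a forced save before rebalancing might look like the sketch below. It is not Meta's offset storage API: `FileOffsetStore`, `commit`, `load`, `flush`, and the `group.topic.partition` key format are hypothetical. Offsets are committed to memory; `flush` writes them to a temporary file and renames it into place, so a crash during the write never leaves a half-written offset file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class FileOffsetStore {
    private final Path file;
    private final Map<String, Long> offsets = new ConcurrentHashMap<>();

    // Loads offsets saved by a previous run, if the file exists.
    FileOffsetStore(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            Properties p = new Properties();
            try (Reader r = Files.newBufferedReader(file)) {
                p.load(r);
            }
            for (String k : p.stringPropertyNames()) {
                offsets.put(k, Long.parseLong(p.getProperty(k)));
            }
        }
    }

    private static String key(String group, String topic, int partition) {
        return group + "." + topic + "." + partition;
    }

    // Records the offset reached after consuming a message (in memory only).
    void commit(String group, String topic, int partition, long offset) {
        offsets.put(key(group, topic, partition), offset);
    }

    // Returns the saved offset, or 0 if this partition has never been consumed.
    long load(String group, String topic, int partition) {
        return offsets.getOrDefault(key(group, topic, partition), 0L);
    }

    // Persists all offsets; called periodically and forced before each rebalance.
    synchronized void flush() throws IOException {
        Properties p = new Properties();
        offsets.forEach((k, v) -> p.setProperty(k, Long.toString(v)));
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (Writer w = Files.newBufferedWriter(tmp)) {
            p.store(w, "consumer offsets");
        }
        // Atomic rename; replacing an existing target is platform-dependent
        // per the Files.move documentation (it does on Linux).
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
    }

    // Schedules periodic saves; the caller shuts the returned executor down.
    ScheduledExecutorService startPeriodicFlush(long periodSeconds) {
        ScheduledExecutorService s = Executors.newSingleThreadScheduledExecutor();
        s.scheduleAtFixedRate(() -> {
            try {
                flush();
            } catch (IOException e) {
                e.printStackTrace(); // keep the schedule alive; the next run retries
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return s;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("offsets", ".properties");
        FileOffsetStore store = new FileOffsetStore(f);
        store.commit("group1", "orders", 0, 42L);
        store.flush(); // e.g. the forced save before a rebalance
        System.out.println(new FileOffsetStore(f).load("group1", "orders", 0)); // 42
    }
}
```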

Message Ordering

Some users care about message ordering: they expect consumers to receive messages in the order in which they were sent. For example, if messages are sent in the order A, B, C, then A should arrive before B, and B before C. For some applications, out-of-order processing is unacceptable.

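Metamorphosis's guarantee of message ordering is limited. By default, messages are ordered by a first-come rule: whichever message reaches the server and is written to disk first comes first. Moreover, messages sent to the same partition are guaranteed to be consumed in the order in which they were written to disk, because for each partition the consumer pulls messages in increasing offset order.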
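Why per-partition order holds can be seen in a toy model: a partition is an append-only list, a message's position is its offset, and a consumer that always fetches forward from its current offset necessarily sees messages in write order. `PartitionLog` and `fetch` are hypothetical illustrations; using list indexes as offsets is a simplification, not Meta's storage format.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionLog {
    private final List<String> messages = new ArrayList<>();

    // Appends in arrival order and returns the new message's offset.
    synchronized long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Returns up to `max` messages starting at `offset`, in write order.
    synchronized List<String> fetch(long offset, int max) {
        int from = (int) Math.min(offset, messages.size());
        int to = Math.min(from + max, messages.size());
        return new ArrayList<>(messages.subList(from, to));
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        for (String m : List.of("A", "B", "C", "D")) log.append(m);
        long offset = 0; // the consumer's position in this partition
        List<String> seen = new ArrayList<>();
        while (true) {
            List<String> batch = log.fetch(offset, 3);
            if (batch.isEmpty()) break;
            seen.addAll(batch);
            offset += batch.size(); // offsets only move forward
        }
        System.out.println(seen); // [A, B, C, D]
    }
}
```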

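Meta guarantees that messages sent through a producer from a single thread reach the server and are stored in the order they were sent, and are consumed in that same order, provided that these messages go to the same partition on the same server. To achieve this, you also need to implement your own PartitionSelector that always selects the same partition, based on the following interface.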

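```java
import java.util.List;
// Partition, Message and MetaClientException come from the Metamorphosis client library.
public interface PartitionSelector {
    // Returns the partition of `topic` (chosen from `partitions`) that `message` is sent to.
    public Partition getPartition(String topic, List<Partition> partitions, Message message) throws MetaClientException;
}
```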

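The partition can be chosen according to business logic, for example by taking a business ID modulo the number of partitions. When transferring a file, you can always pick the n-th partition. That said, for file transfer we usually recommend configuring just one partition, in which case no selection is needed.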
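Selecting a partition by taking a business ID modulo the partition count can be sketched as follows. To keep the example self-contained, `Partition` and `OrderMessage` are simplified stand-in records and `ModuloSelector` is hypothetical; a real selector would implement the `PartitionSelector` interface above with Meta's own `Partition` and `Message` types. As long as the partition list does not change, the same business ID always maps to the same partition, so messages for that ID keep their relative order.

```java
import java.util.List;

// Simplified stand-ins for Meta's Partition and Message types.
record Partition(int brokerId, int partition) {}
record OrderMessage(long businessId, String body) {}

final class ModuloSelector {
    // Picks partitions.get(businessId mod n); floorMod keeps the index non-negative.
    static Partition select(List<Partition> partitions, OrderMessage message) {
        if (partitions.isEmpty()) {
            throw new IllegalStateException("no partitions available");
        }
        int index = (int) Math.floorMod(message.businessId(), (long) partitions.size());
        return partitions.get(index);
    }

    public static void main(String[] args) {
        List<Partition> ps = List.of(new Partition(0, 0), new Partition(0, 1), new Partition(1, 0));
        System.out.println(select(ps, new OrderMessage(7, "created")));
        System.out.println(select(ps, new OrderMessage(7, "paid"))); // same partition as above
    }
}
```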

For ordered sending, version 1.2 provides OrderedMessageProducer, which manages partition information itself and stores messages locally when failures occur.
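The idea behind keeping order across failures can be sketched like this: when a send to the fixed partition fails, the message goes into a local backlog, and the backlog is always resent before any newer message so the original order is preserved. This is a minimal sketch of the idea only, not the actual `OrderedMessageProducer` implementation; `OrderedSender` is hypothetical, the `Predicate` stands in for a send to the server, and the in-memory deque stands in for local disk storage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

final class OrderedSender {
    private final Predicate<String> transport; // returns true if the server accepted the message
    // Stand-in for the local on-disk store used while the partition is unavailable.
    private final Deque<String> backlog = new ArrayDeque<>();

    OrderedSender(Predicate<String> transport) {
        this.transport = transport;
    }

    // Queues the message behind any older ones, then tries to send them all.
    void send(String message) {
        backlog.addLast(message);
        drain();
    }

    // Resends backlog entries in FIFO order, stopping at the first failure.
    void drain() {
        while (!backlog.isEmpty()) {
            if (!transport.test(backlog.peekFirst())) {
                return; // still failing: keep everything, in order, for later
            }
            backlog.pollFirst();
        }
    }

    int pending() {
        return backlog.size();
    }

    public static void main(String[] args) {
        List<String> server = new ArrayList<>();
        boolean[] up = {false};
        OrderedSender s = new OrderedSender(m -> up[0] && server.add(m));
        s.send("A");
        s.send("B"); // partition down: A and B wait locally
        up[0] = true;
        s.send("C"); // partition back: A, B, then C
        System.out.println(server + " pending=" + s.pending()); // [A, B, C] pending=0
    }
}
```

The design choice is that a new message never overtakes a stored one, so a failing partition delays later messages rather than reordering them.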

Duplicate Messages

Message duplication has two aspects: producers sending duplicate messages, and consumers consuming duplicate messages.

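On the producer side, the following can happen: the producer sends a message and waits for the server's reply, and a network failure occurs at that moment. The server has in fact written the message successfully, but because of the network failure no reply comes back. The producer therefore considers the send failed and sends the same message again; if this succeeds, the server now stores two identical messages. Meta cannot avoid duplicates caused by such failures, because it does not check whether message data is identical: it does not understand the semantics of the data and simply transports it as a payload.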
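The failure sequence above can be reproduced in a few lines. In this hypothetical simulation (`LossyServer` is not part of Meta), the server stores the message but the reply is lost on the first attempt; the producer sees a failure, resends, and the server ends up holding two identical copies because it never compares message bodies.

```java
import java.util.ArrayList;
import java.util.List;

final class LossyServer {
    final List<String> stored = new ArrayList<>();
    private boolean dropNextReply = true;

    // Stores the message, then returns whether the reply reached the producer.
    boolean receive(String body) {
        stored.add(body); // the write succeeds either way
        if (dropNextReply) {
            dropNextReply = false;
            return false; // network failure: the reply is lost
        }
        return true;
    }

    public static void main(String[] args) {
        LossyServer server = new LossyServer();
        String msg = "transfer 100";
        boolean acked = server.receive(msg);
        while (!acked) {
            acked = server.receive(msg); // the producer assumes failure and resends
        }
        System.out.println(server.stored); // [transfer 100, transfer 100]
    }
}
```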

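Consumers face the same problem: a consumer successfully consumes a message, but power is lost before the advanced offset is saved. The next time the consumer starts, or when another consumer in the same group takes ownership of this partition, the message is consumed again. Meta cannot completely avoid this case either.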
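The consumer-side case follows the same pattern: the message is processed first and the offset is saved afterwards, so a crash between the two steps replays the message. In this hypothetical simulation, `ReplayingConsumer` is not part of Meta, `committedOffset` stands in for the saved offset, and `crashAt` marks where the power failure happens. Saving the offset after every message is a best-case assumption; with periodic saving, more messages can be replayed.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplayingConsumer {
    long committedOffset = 0; // offset persisted in offset storage
    final List<String> processed = new ArrayList<>();

    // Consumes from the saved offset; "crashes" right after processing the
    // message at offset `crashAt`, before its new offset is saved.
    void run(List<String> partition, int crashAt) {
        for (long off = committedOffset; off < partition.size(); off++) {
            processed.add(partition.get((int) off)); // side effect already done
            if (off == crashAt) {
                return; // power failure: offset off + 1 is never saved
            }
            committedOffset = off + 1;
        }
    }

    public static void main(String[] args) {
        ReplayingConsumer c = new ReplayingConsumer();
        List<String> partition = List.of("m0", "m1", "m2");
        c.run(partition, 1);  // processes m0 and m1, crashes before saving m1's offset
        c.run(partition, -1); // restart (or another group member) resumes at offset 1
        System.out.println(c.processed); // [m0, m1, m1, m2]
    }
}
```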

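In short, Meta only guarantees that messages are not duplicated under normal operation; when failures occur, it cannot. These limitations stem from the semantics of remote calls: eliminating duplicates entirely would be very costly, so Meta does not plan to do so for now.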