From 4cbaaed21e970913ae753338858cd59f76062e34 Mon Sep 17 00:00:00 2001
From: GitHub Action
This is a short but rather dense briefing to explain
the motivation for the use of AMQP by the MetPX-Sarracenia
data pump. Sarracenia is essentially an AMQP application,
@@ -178,14 +178,14 @@ AMQP is a universal message passing protocol with many different
options to support many different messaging patterns. MetPX-sarracenia specifies and uses a
small subset of AMQP patterns. An important element of Sarracenia development was to
select from the many possibilities a small subset of methods that are general and
easily understood, in order to maximize potential for interoperability. Specifying the use of a protocol alone may be insufficient to provide enough information for
data exchange and interoperability. For example, when exchanging data via FTP, a number of choices
need to be made above and beyond the protocol. AMQP 1.0 standardizes the on-the-wire protocol, but removed all broker standardization.
As the use of brokers is key to Sarracenia's use of AMQP, was a fundamental element of earlier standards,
and as the 1.0 standard is relatively controversial, this protocol assumes a pre 1.0 standard broker,
@@ -211,7 +211,7 @@ AMQP - Primer for Sarracenia
+AMQP - Primer for Sarracenia
AMQP Feature Selection
+AMQP Feature Selection
Analogy FTP
+Analogy FTP
AMQP: not 1.0, but 0.8 or 0.9
+AMQP: not 1.0, but 0.8 or 0.9
but 0.9 and post 0.9 brokers could inter-operate well.
In AMQP prior to 1.0, many different actors can define communication parameters, such as exchanges to publish to, queues where notification messages accumulate, and bindings between the two. Applications and users declare and use their exchanges, queues, and bindings. All of this was dropped
@@ -234,7 +234,7 @@
Topic-based exchanges are used exclusively. AMQP supports many other types of exchanges, but sr_post sends the topic with each message so that the broker can filter server-side by topic. At AMQP 1.0, topic-based exchanges (indeed all exchanges) are no
@@ -250,7 +250,7 @@
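To make server-side topic filtering concrete, here is a minimal sketch of the matching rule a pre-1.0 topic exchange applies: words are separated by dots, `*` stands for exactly one word and `#` for zero or more words. The function name `topic_matches` and the sample routing keys are hypothetical and chosen only for illustration; a real broker performs this matching itself.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a topic binding pattern matches a routing key.

    Words are separated by '.'; '*' matches exactly one word and
    '#' matches zero or more words.
    """
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(p):
            # Pattern exhausted: succeed only if the key is exhausted too.
            return j == len(k)
        if p[i] == "#":
            # '#' either absorbs nothing (advance the pattern) or
            # absorbs one more word of the key (advance the key).
            return match(i + 1, j) or (j < len(k) and match(i, j + 1))
        if j < len(k) and (p[i] == "*" or p[i] == k[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


# Hypothetical routing keys, for illustration only.
print(topic_matches("post.observations.#", "post.observations.swob.today"))
print(topic_matches("post.*.swob", "post.observations.swob"))
print(topic_matches("post.*", "post.observations.swob"))
```

A subscriber that binds a queue with a narrow pattern receives only the matching notification messages, so the filtering cost stays on the broker.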
The AMQP messages contain notification messages, no actual file data. AMQP is optimized for and assumes small messages. Keeping the messages small allows for maximum message throughput and permits clients to use priority mechanisms based on transfer of data, rather than the notification messages.
@@ -284,7 +284,7 @@
AMQP has many other settings, and reliability for a particular use case is assured by making the right choices.
An AMQP Server is called a Broker. Broker is sometimes used to refer to the software, other times to the server running the broker software (the same confusion as with "web server"). In the above diagram, AMQP vocabulary is in Orange, and Sarracenia terms are in blue.
@@ -358,7 +358,7 @@
In Version 2, MetPX-Sarracenia is only a light wrapper/coating around AMQP. In Version 3, this was reworked and an MQTT driver was added to make it less AMQP-specific.
@@ -402,7 +402,7 @@
If you understood the rest of the document, this should make sense to you:
An AMQP broker is a server process that houses exchanges and queues used to route notification messages with very low latency. A publisher sends notification messages to an exchange, while a consumer reads
@@ -412,9 +412,9 @@
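The relationship between publisher, exchange, binding, queue and consumer can be sketched with a toy in-memory model. This is not how a real broker is implemented, and it is not Sarracenia code: the class `ToyBroker`, the queue name `q_Bob` and the routing keys are hypothetical, and bindings are reduced to plain prefix tests rather than full topic matching.

```python
from collections import deque


class ToyBroker:
    """In-memory stand-in for a broker: exchanges route to bound queues."""

    def __init__(self):
        self.queues = {}    # queue name -> deque of (routing_key, body)
        self.bindings = {}  # exchange name -> list of (key_prefix, queue name)

    def declare_exchange(self, exchange):
        self.bindings.setdefault(exchange, [])

    def declare_queue(self, queue):
        self.queues.setdefault(queue, deque())

    def bind(self, exchange, queue, key_prefix):
        # Simplification: a binding is a routing-key prefix.
        self.bindings[exchange].append((key_prefix, queue))

    def publish(self, exchange, routing_key, body):
        """Copy the message to every bound queue whose prefix matches."""
        delivered = 0
        for key_prefix, queue in self.bindings[exchange]:
            if routing_key.startswith(key_prefix):
                self.queues[queue].append((routing_key, body))
                delivered += 1
        return delivered

    def consume(self, queue):
        """Return the oldest message in the queue, or None if it is empty."""
        q = self.queues[queue]
        return q.popleft() if q else None


broker = ToyBroker()
broker.declare_exchange("xs_Alice")   # the publisher's exchange
broker.declare_queue("q_Bob")         # hypothetical consumer queue
broker.bind("xs_Alice", "q_Bob", "post.observations")

broker.publish("xs_Alice", "post.observations.file1", "notification 1")
broker.publish("xs_Alice", "post.other.file2", "notification 2")  # not routed
print(broker.consume("q_Bob"))
```

The publisher never addresses a queue directly; it only names an exchange, and the bindings decide which consumers see the message.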
open standard, multiple free implementations.
low latency message passing.
Open international standard from the financial world.
Many similar proprietary systems exist; AMQP was built to get away from lock-in. The standard was built with long experience of vendor messaging systems, and so is quite mature.
AMQP is the messaging technology chosen by the OpenStack cloud.
Adopting AMQP is more like adopting XML than it is like adopting FTP. FTP interoperability is easy as choices are limited. With XML, however, you get more palette than painting: many different dialects, schema methods, etc. XML will be valid and parse, but without
diff --git a/Contribution/BasicIdea.html b/Contribution/BasicIdea.html
index 481ee5531..e10bd447d 100644
--- a/Contribution/BasicIdea.html
+++ b/Contribution/BasicIdea.html
@@ -8,9 +8,9 @@
-
+
-
+
@@ -114,7 +114,7 @@
Status: Approved-Draft1-20150608
MetPX-Sarracenia is a data duplication or distribution engine that leverages existing standard technologies (sftp and web servers and AMQP brokers) to achieve real-time message delivery and end to end transparency in file transfers. Whereas in Sundew, each
diff --git a/Contribution/Design.html b/Contribution/Design.html
index 581c69c45..23286b390 100644
--- a/Contribution/Design.html
+++ b/Contribution/Design.html
@@ -8,9 +8,9 @@
-
+
-
+
@@ -146,7 +146,7 @@
Status: Draft
This document reflects the current design resulting from discussions and thinking at a more detailed level than the outline document. See Outline for an overview of the design requirements. See use-cases for
@@ -195,7 +195,7 @@
- @@ -230,7 +230,7 @@
Are there cluster file systems available everywhere? No.
- 1.2 Number of Switches
+1.2 Number of Switches
The application is supposed to support any number of topologies, that is any number of pumps S=0,1,2,3 may exist between origin and final delivery, and do the right thing.
Why isn't everything point to point, or when do you insert a pump?
@@ -260,7 +260,7 @@
- 1.3 AMQP Feature Selection
+1.3 AMQP Feature Selection
AMQP is a universal message passing protocol with many different options to support many different messaging patterns. MetPX-sarracenia specifies and uses a small subset of AMQP patterns. Indeed an important element of sarracenia development was to select from the
@@ -326,7 +326,7 @@
- 1.4 Application
+1.4 Application
Description of application logic relevant to discussion. There is a 'control plane' where notification messages about new data available are made, and log messages reporting status of transfers of the same data are routed among control plane users and pumps. A pump is an AMQP broker, and users authenticate to the broker. Data
@@ -383,7 +383,7 @@
- 1.5 Routing
+1.5 Routing
There are two distinct flows to route: notification messages, and logs. The following headers, which are set in all messages, relate to routing.
@@ -402,7 +402,7 @@
- 1.5.1 Routing Posts
+1.5.1 Routing Posts
Post routing is the routing of the notification messages announced by data sources. The data corresponding to the source follows the same sequence of pumps as the notification messages themselves. When a notification message is processed on a pump, it is downloaded, and then the
@@ -437,7 +437,7 @@
- 1.5.2 Routing Logs
+1.5.2 Routing Logs
Log messages are defined in the sr_log(7) man page. They are emitted by consumers at the end, as well as by feeders as the messages traverse pumps. Log messages are posted to the xl_<user> exchange, and after log validation queued for the xlog exchange.
@@ -456,9 +456,9 @@
- 1.6 Security Model
+1.6 Security Model
- 1.6.1 Users, Queues & Exchanges
+1.6.1 Users, Queues & Exchanges
- Each user Alice, on a broker to which she has access:
- @@ -481,7 +481,7 @@
has an exchange xs_Alice, where she writes her notification messages, and reads her logs from.
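The per-user naming convention can be sketched as a pair of small helpers. The `xs_` and `xl_` prefixes come from the text of this document; the function names, the dictionary keys and the idea of recovering the owner from the exchange name are assumptions made for illustration and are not Sarracenia's API.

```python
def user_exchanges(user):
    """Return the per-user exchange names described in the text."""
    return {
        "notifications": f"xs_{user}",  # where the user writes notification messages
        "logs": f"xl_{user}",           # where the user's log messages are posted
    }


def exchange_owner(exchange):
    """Return the user a per-user exchange belongs to, or None otherwise."""
    prefix, sep, user = exchange.partition("_")
    if prefix in ("xs", "xl") and sep and user:
        return user
    return None


print(user_exchanges("Alice"))
print(exchange_owner("xl_Alice"))
print(exchange_owner("xlog"))  # a shared exchange has no single owner
```

Because the user name is embedded in the exchange name, broker permissions can be written per user, which is what makes the later validation steps possible.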
-
1.6.2 Pre-Validation
+1.6.2 Pre-Validation
Pre-Validation refers to security and correctness checks performed on the information provided by the notification message before the data itself is downloaded. Some tools may refer to this as message validation.
@@ -529,7 +529,7 @@-
1.6.3 Post-Validation
+1.6.3 Post-Validation
When a file is downloaded, before re-announcing it for later hops it goes through some analysis. The tools may call this file validation:
@@ -544,7 +544,7 @@-
1.6.4 Log Validation
+1.6.4 Log Validation
When a client like sarra or subscribe completes an operation, it creates a log message corresponding to the result of the operation. (This is much lower granularity than local log files.) It is important for one client not to be able to impersonate another
@@ -575,7 +575,7 @@
-
1.6.5 Private vs. Public Data Transfer
+1.6.5 Private vs. Public Data Transfer
Transfers in the past have been public, just a matter of sharing public information. A crucial requirement of the package is to support private data copies, where the ends of the transfer are not sharing with arbitrary others.
@@ -611,7 +611,7 @@-
1.6.6 HTTPS Private Access
+1.6.6 HTTPS Private Access
Note
FIXME: Not designed yet.
@@ -625,7 +625,7 @@
-
1.7 Topologies
+1.7 Topologies
Questions… There are many choices for cluster layout. One can do simple H/A on a pair of nodes, simple active/passive? One can go to scalable designs on an array of nodes, which requires a load balancer ahead of the processing nodes. The disks of a cluster can be shared or individual to
@@ -665,7 +665,7 @@
-
1.7.1 Standalone
+1.7.1 Standalone
In a standalone configuration, there is only one node in the configuration. It runs all components and shares none with any other nodes. That means the Broker and data services such as sftp and apache are on the one node.
@@ -673,7 +673,7 @@-
1.7.2 DDSR: Switching/Routing Configuration
+1.7.2 DDSR: Switching/Routing Configuration
This is a more scalable configuration involving several data mover nodes, and potentially several brokers. These clusters are not destinations of data transfers, but intermediaries. Data flows through them, but querying them is more complicated because no one node has all data available. The downstream clients
@@ -687,7 +687,7 @@