Internal communication protocol
This page specifies the internal communication protocol used to transfer data and flow-control commands between the supervisor and the worker of each ETL process stage.
- The protocol is used for inter-process communication.
- The purpose of the protocol is data transfer and data flow control between two processes: the stage supervisor (`S`) and the worker (`W`).
- The connection between `S` and `W` is established by `S` executing `W`'s run instruction; the two then communicate through the standard input (`W-STDIN`) and standard output (`W-STDOUT`) streams of the `W` process.
- Protocol elements are:
  - `<marker>` -- ASCII symbol, never encountered in the transferred data;
  - `<message>` -- raw data (message content), ending with the `EOM` (end-of-message) marker;
  - `<batch>` -- group of messages ending with the `EOB` (end-of-batch) marker.
- The `<message>` content format and the encode/decode rules are not defined by the protocol and are to be coordinated at the application level.
- All `<marker>`s have default values, which can be altered by `S` and passed to `W` as a part of its run instruction.
- List of `<marker>`s:
  - `EOM` (end-of-message):
    - can be sent by: `S`, `W`;
    - usage: placed in the stream after the raw data (message content) to indicate the end of the message content;
  - `EOP` (end-of-process):
    - can be sent by: `W`;
    - usage: placed in `W-STDOUT` after the last message of `W`'s operation result (or on its own, if no messages were produced) to indicate that the requested operation on the data is finished and `W` is ready for the next command;
  - `EOB` (end-of-batch):
    - can be sent by: `S`;
    - usage: placed in `W-STDIN` after the last message of a group to indicate the end of the group passed to `W` for batch processing;
  - `BNC` (batch-not-complete):
    - can be sent by: `W`;
    - usage: placed in `W-STDOUT` after `EOP` or a previous `BNC` to request one more message for batch processing from `S`.
- The communication scenario between `S` and `W` depends on the stage's type (E-, T- or L-).
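
The framing rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marker byte values below are hypothetical (the page does not list the real defaults), and `frame` and `read_events` are names invented for this example.

```python
import io

# Hypothetical marker values; the protocol's real defaults are not given on this page.
EOM = b"\x1e"  # end-of-message
EOP = b"\x04"  # end-of-process
EOB = b"\x17"  # end-of-batch
BNC = b"\x05"  # batch-not-complete
MARKERS = {EOM: "EOM", EOP: "EOP", EOB: "EOB", BNC: "BNC"}


def frame(payload: bytes) -> bytes:
    """Append EOM to a payload; reject payloads that contain any marker."""
    for marker in MARKERS:
        if marker in payload:
            raise ValueError("payload contains a protocol marker")
    return payload + EOM


def read_events(stream):
    """Yield ("MSG", payload) per message and (name, b"") per control marker."""
    buf = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:  # stream closed
            return
        if byte == EOM:
            yield ("MSG", bytes(buf))
            buf.clear()
        elif byte in MARKERS:
            yield (MARKERS[byte], b"")
        else:
            buf += byte


# Two messages followed by EOP, as a worker might write them to W-STDOUT.
wire = frame(b'{"id": 1}') + frame(b'{"id": 2}') + EOP
events = list(read_events(io.BytesIO(wire)))
```

Because a marker must never occur in the transferred data, `frame` rejects such payloads instead of escaping them; the application-level encoding is expected to keep marker bytes out of the content.
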

E-stage, worker started for a single extraction run:

- `S`:
  - execute `W`'s run instruction;
  - while `W-STDOUT` is open:
    - read `<message>` from `W-STDOUT`;
    - pass `<message>` downstream (to the next stage);
    - repeat;
  - repeat.
- `W`:
  - execute "extraction" operation;
  - generate `<messages>` for data flow;
  - write `<messages>` to `W-STDOUT` (one by one);
  - close `W-STDOUT` (exit).
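
The supervisor side of the scenario above could look roughly like the following sketch. The `EOM` value is a hypothetical placeholder, `extract` is an illustrative name, and the inline worker script merely stands in for a real extraction worker.

```python
import subprocess
import sys

EOM = b"\x1e"  # hypothetical end-of-message marker value

# Stand-in worker: writes three messages to its stdout, then exits.
WORKER_CODE = r'''
import sys
for record in (b"alpha", b"beta", b"gamma"):
    sys.stdout.buffer.write(record + b"\x1e")
sys.stdout.buffer.flush()
'''


def extract(cmd):
    """Run the worker and yield its messages until W-STDOUT is closed."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        buf = bytearray()
        while True:
            byte = proc.stdout.read(1)
            if not byte:  # worker closed W-STDOUT (exited)
                break
            if byte == EOM:
                yield bytes(buf)  # hand the message to the next stage
                buf.clear()
            else:
                buf += byte


received = list(extract([sys.executable, "-c", WORKER_CODE]))
```

No `EOP` is needed in this variant: the end of the stream itself tells `S` that the extraction is finished.
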

E-stage, long-running worker:

- `S`:
  - send empty `<message>` to (already running) `W` (write to `W-STDIN`) (\*)

    OR

    send `<message>` with last and current (start and end) offset values to (already running) `W` (write to `W-STDIN`); (\*\*)
  - until `EOP` is at `W-STDOUT`:
    - read `<message>` from `W-STDOUT`;
    - pass `<message>` downstream (to the next stage);
    - repeat;
  - repeat.
- `W`:
  - read empty `<message>` from `W-STDIN` (\*)

    OR

    read `<message>` with start and end offset values from `W-STDIN`; (\*\*)
  - execute "extraction" operation;
  - generate `<messages>` for data flow;
  - write `<messages>` to `W-STDOUT` (one by one);
  - write `EOP` to `W-STDOUT`;
  - repeat.

(\*) Offset managed on `W`'s side (e.g. local storage in the standalone version).

(\*\*) Offset managed on `S`'s side.
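
A minimal worker loop for the supervisor-managed offset variant (\*\*) might look like this. The marker values, the `start,end` request encoding and the in-memory `SOURCE` list are all assumptions made for the example; the protocol leaves the message format to the application.

```python
import io

EOM = b"\x1e"  # hypothetical end-of-message marker value
EOP = b"\x04"  # hypothetical end-of-process marker value

SOURCE = [b"r0", b"r1", b"r2", b"r3", b"r4"]  # stand-in for the data source


def read_message(stream):
    """Return the next payload, or None if the stream closes first."""
    buf = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            return None
        if byte == EOM:
            return bytes(buf)
        buf += byte


def worker_loop(stdin, stdout):
    """Serve extraction requests until W-STDIN is closed."""
    while True:
        request = read_message(stdin)
        if request is None:
            return
        # Assumed request format: b"<start>,<end>" offsets chosen by S.
        start, end = (int(part) for part in request.split(b","))
        for record in SOURCE[start:end]:
            stdout.write(record + EOM)
        stdout.write(EOP)  # operation finished, ready for the next command


# Two requests from S, simulated with in-memory streams.
w_stdin = io.BytesIO(b"0,2" + EOM + b"2,5" + EOM)
w_stdout = io.BytesIO()
worker_loop(w_stdin, w_stdout)
output = w_stdout.getvalue()
```

Each request is answered by zero or more messages and exactly one `EOP`, which is what lets `S` keep the worker running between extractions.
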

T-/L-stage, message-by-message processing:

NOTE: L-stage produces empty transformation result.

- `S`:
  - send `<message>` to (already running) `W` (write to `W-STDIN`);
  - until `EOP` is at `W-STDOUT`:
    - read `<message>` from `W-STDOUT`;
    - pass `<message>` downstream (to the next stage);
    - repeat;
  - repeat.
- `W`:
  - read `<message>` from `W-STDIN`;
  - process `<message>` (transform it to one, multiple or zero messages / load to final storage);
  - write transformation result message(s) to `W-STDOUT`, if any;
  - write `EOP` to `W-STDOUT`;
  - repeat.
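
The request/response cycle above can be sketched from the supervisor's point of view as follows. The marker values are hypothetical, `process` is an illustrative helper, and the inline worker (which upper-cases each whitespace-separated word) is only a stand-in for a real transformation.

```python
import subprocess
import sys

EOM = b"\x1e"  # hypothetical end-of-message marker value
EOP = b"\x04"  # hypothetical end-of-process marker value

# Stand-in T-stage worker: one input message -> zero or more output messages.
WORKER_CODE = r'''
import sys
stdin, stdout = sys.stdin.buffer, sys.stdout.buffer
buf = bytearray()
while True:
    byte = stdin.read(1)
    if not byte:
        break
    if byte != b"\x1e":
        buf += byte
        continue
    for word in bytes(buf).split():
        stdout.write(word.upper() + b"\x1e")
    stdout.write(b"\x04")
    stdout.flush()
    buf.clear()
'''


def process(proc, payload):
    """Send one message to W and collect its results up to EOP."""
    proc.stdin.write(payload + EOM)
    proc.stdin.flush()
    results, buf = [], bytearray()
    while True:
        byte = proc.stdout.read(1)
        if not byte:
            raise RuntimeError("worker closed W-STDOUT before EOP")
        if byte == EOP:
            return results
        if byte == EOM:
            results.append(bytes(buf))
            buf.clear()
        else:
            buf += byte


with subprocess.Popen([sys.executable, "-c", WORKER_CODE],
                      stdin=subprocess.PIPE, stdout=subprocess.PIPE) as proc:
    first = process(proc, b"hello world")  # two result messages
    second = process(proc, b"")            # EOP on its own, no results
```

An L-stage worker would follow the same cycle but reply with a bare `EOP`, as in the second call.
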

T-/L-stage, batch processing:

NOTE: L-stage produces empty transformation result.

- `S`:
  - send `<message>` to (already running) `W` (write to `W-STDIN`);
  - while `BNC` is at `W-STDOUT` (\*)

    OR

    while the number of sent messages is less than `BATCH_SIZE`: (\*\*)
    - IF there are `<messages>` to be processed:
      - write `<message>` to `W-STDIN`;
    - ELSE:
      - write `EOB` to `W-STDIN`;
    - repeat;
  - IF there are `<messages>` to be processed:
    - write `EOB` to `W-STDIN`; (\*\*)
  - until `EOP` is at `W-STDOUT`:
    - read `<message>` from `W-STDOUT`;
    - pass `<message>` downstream (to the next stage);
    - repeat;
  - repeat.
- `W`:
  - until the number of read messages is greater than (or equal to) `BATCH_SIZE` OR `EOB` is at `W-STDIN` (\*)

    OR

    until `EOB` is at `W-STDIN`: (\*\*)
    - read `<message>` from `W-STDIN`;
    - write `BNC` to `W-STDOUT`; (\*)
    - repeat;
  - process all `<messages>` (transform each to one, multiple or zero messages / load to final storage);
  - write transformation result message(s) to `W-STDOUT`, if any;
  - write `EOP` to `W-STDOUT`;
  - repeat.

(\*) Worker-driven batch processing.

(\*\*) Supervisor-driven batch processing.
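
The supervisor-driven variant (\*\*) is the simpler of the two to sketch, because `W` only has to wait for `EOB`. The example below is an in-memory approximation: marker values, `frame_batches`, `read_batch` and the summing transformation are all hypothetical, and a real `S` would wait for `EOP` before sending the next batch instead of writing everything up front.

```python
import io

EOM = b"\x1e"  # hypothetical end-of-message marker value
EOP = b"\x04"  # hypothetical end-of-process marker value
EOB = b"\x17"  # hypothetical end-of-batch marker value


def frame_batches(messages, batch_size):
    """S side: put EOB after every full batch and after a final partial one."""
    out = bytearray()
    for count, message in enumerate(messages, 1):
        out += message + EOM
        if count % batch_size == 0:
            out += EOB
    if len(messages) % batch_size:
        out += EOB
    return bytes(out)


def read_batch(stdin):
    """W side: collect messages until EOB; None if W-STDIN closes before EOB."""
    batch, buf = [], bytearray()
    while True:
        byte = stdin.read(1)
        if not byte:
            return None
        if byte == EOB:
            return batch
        if byte == EOM:
            batch.append(bytes(buf))
            buf.clear()
        else:
            buf += byte


def worker_loop(stdin, stdout, transform):
    """W side: process each batch as a whole, then signal EOP."""
    while (batch := read_batch(stdin)) is not None:
        for result in transform(batch):
            stdout.write(result + EOM)
        stdout.write(EOP)


def total(batch):
    # Hypothetical transformation: sum the integers of a batch into one message.
    return [str(sum(int(m) for m in batch)).encode()]


wire = frame_batches([b"1", b"2", b"3", b"10"], batch_size=3)
w_stdout = io.BytesIO()
worker_loop(io.BytesIO(wire), w_stdout, total)
output = w_stdout.getvalue()
```

In the worker-driven variant (\*) the batch size is known to `W` rather than `S`, so `W` would additionally write `BNC` to `W-STDOUT` after each read to ask for the next message.
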