Internal communication protocol

Marina Golosova edited this page Oct 21, 2020 · 15 revisions

This page specifies the internal communication protocol used to transfer data and flow-control commands between the supervisor and the worker of an ETL process stage.

General terms.

  1. The protocol is used for inter-process communication.
  2. The purpose of the protocol is data transfer and data flow control between two processes: the stage supervisor (S) and the worker (W).
  3. The connection between S and W is established when S executes W's run instruction; the two then communicate through W's standard input (W-STDIN) and standard output (W-STDOUT) streams.
  4. Protocol elements are:
  • <marker> -- an ASCII symbol that never occurs in the transferred data;
  • <message> -- raw data (message content), ending with the EOM (end-of-message) marker;
  • <batch> -- a group of messages ending with the EOB (end-of-batch) marker.
  1. The <message> content format and its encoding/decoding rules are not defined by the protocol and are to be agreed on at the application level.
  2. All <marker>s have default values, which S can override and pass to W as part of its run instruction.
  3. List of <marker>s:
  • EOM (end-of-message):
    • can be sent by: S, W;
    • usage: placed in the stream after the raw data (message content) to indicate the end of the message content;
  • EOP (end-of-process):
    • can be sent by: W;
    • usage: placed in W-STDOUT after the last message of the result of W's operation (or on its own, if no messages were produced) to indicate that the requested operation on the data is finished and W is ready for the next command;
  • EOB (end-of-batch):
    • can be sent by: S;
    • usage: placed in W-STDIN after the last message in a group to indicate the end of the group passed to W for batch processing;
  • BNC (batch-not-complete):
    • can be sent by: W;
    • usage: placed in W-STDOUT after EOP or a previous BNC to request one more message from S for batch processing;
  • GET (get-new-data):
    • can be sent by: S;
    • usage:
  1. The communication scenario between S and W depends on the stage type (E-, T- or L-).
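
The framing rules above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the page does not list the default marker values, so the bytes in `Markers` are hypothetical, and the names `Markers`, `encode_batch` and `decode_worker_output` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Markers:
    # Hypothetical values: the actual defaults are not given on this page.
    EOM: bytes = b"\x1e"  # end-of-message
    EOB: bytes = b"\x17"  # end-of-batch
    EOP: bytes = b"\x00"  # end-of-process
    BNC: bytes = b"\x06"  # batch-not-complete
    GET: bytes = b"\x05"  # get-new-data


def encode_batch(messages, m=Markers()):
    """S side: frame a batch for W-STDIN (EOM after each message, EOB at the end)."""
    for msg in messages:
        # A marker must never occur inside the transferred data.
        if any(mk in msg for mk in (m.EOM, m.EOB, m.EOP, m.BNC, m.GET)):
            raise ValueError("message content must not contain a marker")
    return b"".join(msg + m.EOM for msg in messages) + m.EOB


def decode_worker_output(stream, m=Markers()):
    """S side: split bytes read from W-STDOUT into messages and control markers."""
    events = []
    buf = bytearray()
    for byte in stream:
        ch = bytes([byte])
        if ch == m.EOM:
            events.append(("message", bytes(buf)))
            buf.clear()
        elif ch == m.EOP:
            events.append(("EOP", None))
        elif ch == m.BNC:
            events.append(("BNC", None))
        else:
            buf.append(byte)
    return events
```

For example, `encode_batch([b"a", b"bc"])` returns `b"a\x1ebc\x1e\x17"` with these assumed marker values. Because the message format itself is left to the application level, the sketch treats message content as opaque bytes.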

E-stage scenario (plain).

  1. S executes W's run instruction and starts waiting for messages on W's STDOUT.
  2. W executes the "extraction" operation and generates messages for the data flow.
  3. W sends the messages to STDOUT one by one.
  4. W stops operation (closes STDOUT).
  5. S reads all messages from W's STDOUT.
  6. S passes the messages it has read to the next stage.
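
The plain E-stage scenario above could look roughly like this from the supervisor's side. This is a hedged sketch, not the project's implementation: the EOM value, the function name `run_plain_e_stage` and the stand-in worker command `WORKER` are all assumptions made for the example.

```python
import subprocess
import sys

EOM = b"\x1e"  # hypothetical end-of-message marker value


def run_plain_e_stage(run_instruction, eom=EOM):
    """Run W, read its STDOUT until W closes it, and return the messages."""
    proc = subprocess.Popen(run_instruction, stdout=subprocess.PIPE)
    # communicate() reads until end-of-file on STDOUT and waits for W to exit.
    output, _ = proc.communicate()
    parts = output.split(eom)
    # Whatever follows the final EOM is not a complete message; drop it.
    return parts[:-1]


# Stand-in worker (hypothetical): writes three messages, each followed by EOM.
WORKER = [
    sys.executable, "-c",
    "import sys\n"
    "for i in range(3):\n"
    "    sys.stdout.buffer.write(b'record-%d' % i + b'\\x1e')\n",
]

if __name__ == "__main__":
    for message in run_plain_e_stage(WORKER):
        print(message)  # a real S would pass each message to the next stage
```

Reading everything at once keeps the sketch short; a supervisor that must forward messages while W is still running would read the pipe incrementally instead.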

E-stage scenario (cyclic).

  1. S sends the GET marker to the STDIN of the (already running) W and starts waiting for messages on W's STDOUT.
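
As a rough illustration of this first step, the sketch below sends GET and then collects messages until EOP, relying on the EOP definition from the marker list. The remaining steps of the cyclic scenario are not specified on this page, so the reading loop is an assumption; the marker values and the name `request_new_data` are hypothetical as well.

```python
import io

# Hypothetical marker values; the page does not state the defaults.
GET = b"\x05"
EOM = b"\x1e"
EOP = b"\x00"


def request_new_data(w_stdin, w_stdout):
    """S side: send GET to W-STDIN, then read W-STDOUT up to the EOP marker."""
    w_stdin.write(GET)
    w_stdin.flush()
    messages = []
    buf = bytearray()
    while True:
        ch = w_stdout.read(1)
        if not ch:
            raise EOFError("W closed STDOUT before sending EOP")
        if ch == EOP:
            return messages  # W reports it is ready for the next command
        if ch == EOM:
            messages.append(bytes(buf))
            buf.clear()
        else:
            buf += ch


if __name__ == "__main__":
    # In-memory streams stand in for the pipes of a running worker.
    fake_stdin = io.BytesIO()
    fake_stdout = io.BytesIO(b"a" + EOM + b"b" + EOM + EOP)
    print(request_new_data(fake_stdin, fake_stdout))
```

With real pipes, `w_stdin` and `w_stdout` would be the `stdin` and `stdout` attributes of a `subprocess.Popen` object opened in binary mode.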

To Be Continued...