Move ES consistency script into a separate stage.
Although the script is Stage 069's counterpart in data4es-consistency-check,
the two share no functionality.
Evildoor committed Apr 17, 2019
1 parent 90380a9 commit 7bac202
Showing 4 changed files with 31 additions and 21 deletions.
20 changes: 1 addition & 19 deletions Utils/Dataflow/069_upload2es/README
@@ -4,7 +4,7 @@

Description
-----------
-load_data.sh uploads prepared data to ElasticSearch.
+Uploads prepared data to ElasticSearch.

Input
-----
@@ -18,24 +18,6 @@ JSON documents, one per line:
...
}}}

-Consistency
------------
-consistency.py checks that the data is present in ElasticSearch instead of
-uploading it. Input comes from Stage 009(in consistency mode) and only needs 2
-fields for now:
-{{{
-{taskid, task_timestamp}
-...
-}}}
-
-Consistency check can be run as following:
-
-./consistency.py --conf elasticsearch_config
-
-For more information about running the check and its arguments, use:
-
-./consistency.py -h
-
TODO
----
Make the stage aware of EOProcess/EOMessage markers
28 changes: 28 additions & 0 deletions Utils/Dataflow/071_esConsistency/README
@@ -0,0 +1,28 @@
+=============
+* Stage 071 *
+=============
+
+1. Description
+--------------
+Checks that the given data is present in ElasticSearch.
+
+Input must contain at least 2 fields:
+{{{
+{"_type": ..., "_id": ..., ...}
+...
+}}}
+
+_type and _id are required to retrieve the document from ES. All the other
+fields are compared with the document's corresponding ones. Results of the
+comparison are written to stderr.
+
+2. Running the stage
+--------------------
+The stage can be run as follows:
+
+./consistency.py --conf elasticsearch_config
+
+For more information about running the stage and its arguments, use:
+
+./consistency.py -h
+
File renamed without changes.
4 changes: 2 additions & 2 deletions Utils/Dataflow/run/data4es-consistency-check
@@ -20,6 +20,6 @@ cmd_016="$base_dir/../016_task2es/task2es.py -m s"

# ES
cfg_es=`get_config "es"`
-cmd_069="$base_dir/../069_upload2es/consistency.py -m s --conf $cfg_es"
+cmd_071="$base_dir/../071_esConsistency/consistency.py -m s --conf $cfg_es"

-$cmd_009 | $cmd_016 | eop_filter | $cmd_069 >/dev/null
+$cmd_009 | $cmd_016 | eop_filter | $cmd_071 >/dev/null
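
The Stage 071 README describes the check only in prose: `_type` and `_id` locate the document, every other input field is compared with the stored one, and results go to stderr (which is why the pipeline can discard stdout). The following is a minimal sketch of that behaviour, not the actual consistency.py. The names `find_mismatches`, `check_stream` and the `fetch` callable are hypothetical, and the ElasticSearch lookup is replaced by an in-memory dictionary so the example runs on its own.

```python
import json
import sys


def find_mismatches(record, es_doc):
    """Return (field, expected, actual) for each compared field that differs.

    Note: a field absent from es_doc is reported with actual=None.
    """
    mismatches = []
    for field, expected in record.items():
        if field in ("_type", "_id"):
            continue  # used only to locate the document, never compared
        actual = es_doc.get(field)
        if actual != expected:
            mismatches.append((field, expected, actual))
    return mismatches


def check_stream(lines, fetch, report=sys.stderr):
    """Check JSON records given one per line; return the count of failed records.

    `fetch(doc_type, doc_id)` is a hypothetical lookup that returns the stored
    document as a dict, or None when the document does not exist.
    """
    failed = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        record = json.loads(line)
        doc_type, doc_id = record["_type"], record["_id"]
        es_doc = fetch(doc_type, doc_id)
        if es_doc is None:
            print("missing: %s/%s" % (doc_type, doc_id), file=report)
            failed += 1
            continue
        mismatches = find_mismatches(record, es_doc)
        for field, expected, actual in mismatches:
            print("mismatch: %s/%s field %r: expected %r, found %r"
                  % (doc_type, doc_id, field, expected, actual), file=report)
        if mismatches:
            failed += 1
    return failed


if __name__ == "__main__":
    # In-memory stand-in for ElasticSearch (an assumption made for this sketch).
    store = {("task", "1"): {"taskid": 1, "status": "done"}}
    sample = [
        '{"_type": "task", "_id": "1", "taskid": 1, "status": "done"}',
        '{"_type": "task", "_id": "2", "taskid": 2}',
    ]
    check_stream(sample, lambda doc_type, doc_id: store.get((doc_type, doc_id)))
```

Passing the lookup in as a callable keeps the comparison logic testable without a running cluster; a real stage would presumably back `fetch` with a client built from the `--conf` file.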
