Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oracle-ES consistency #240

Merged
merged 28 commits into from
May 28, 2019
Merged
Changes from 1 commit
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
9903938
Add consistency query to stage 009.
Evildoor Mar 27, 2019
42d55b3
Add script for consistency check to stage 069.
Evildoor Mar 29, 2019
f27f278
Get index name from config rather than code.
Evildoor Apr 2, 2019
af9f212
Check for index' existence before working.
Evildoor Apr 2, 2019
4c560a6
Update documentation.
Evildoor Apr 2, 2019
2296f6d
Add a very basic consistency check script.
Evildoor Apr 2, 2019
b8a2ab1
Generalize 069-consistency.
Evildoor Apr 4, 2019
acfe45a
Save and display the info about different tasks.
Evildoor Apr 5, 2019
e39efe2
Merge remote-tracking branch 'origin/master' into oracle-es-consistency
Evildoor Apr 5, 2019
8a71791
Move certain shell functions to library.
Evildoor Apr 5, 2019
90380a9
Remove DEBUG mode.
Evildoor Apr 17, 2019
7bac202
Move ES consistency script into a separate stage.
Evildoor Apr 17, 2019
12dd86e
Update a query description.
Evildoor Apr 18, 2019
944b5a2
Update and explain a magic number.
Evildoor Apr 18, 2019
26a1dfe
Reword es_connect() description.
Evildoor Apr 18, 2019
20875b1
Change log prefixes to standard ones.
Evildoor Apr 18, 2019
46cf0af
Fix pop() results handling.
Evildoor Apr 18, 2019
72d85a9
Update ES parameters handling.
Evildoor Apr 18, 2019
cacba11
Remove batching of inconsistent records.
Evildoor Apr 18, 2019
4d8eb83
Merge remote-tracking branch 'origin/master' into oracle-es-consistency
Evildoor Apr 19, 2019
181fb14
Add consistency data samples.
Evildoor Apr 19, 2019
7117242
Update the dataflow README.
Evildoor Apr 19, 2019
165c5d2
Ignore two additional fields.
Evildoor May 21, 2019
8f84d22
Change messages formatting.
Evildoor May 22, 2019
d195650
Add _parent field handling.
Evildoor May 22, 2019
612bf52
Remove service fields before checking.
Evildoor May 22, 2019
01ae258
Remove interpreter directives from lib files.
Evildoor May 22, 2019
8f86ddd
Simplify a field retrieval.
Evildoor May 28, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion Utils/Dataflow/069_upload2es/consistency.py
Original file line number Diff line number Diff line change
Expand Up @@ -153,7 +153,11 @@ def main(args):
cfg = load_config(stage.ARGS.conf)
stage.process = process
es_connect(cfg)
stage.run()
if not es.indices.exists(INDEX):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for information: this check would cause AuthorizationException in case of remote connection via Nginx proxy:

>>> es = elasticsearch.Elasticsearch('http://login:[email protected]:9200') 
>>> es.indices.exists('test_prodsys_rucio_ami')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/lib/python2.7/site-packages/elasticsearch/client/indices.py", line 213, in exists
    params=params)
  File "/usr/lib/python2.7/site-packages/elasticsearch/transport.py", line 318, in perform_request
    status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 185, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.AuthorizationException: TransportError(403, u'')

I updated proxy configuration to allow HEAD requests (to readable locations), as there`s no actual need to block them; so now there should be no problem with it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And (again, just for information), there was another way to hit the goal:

Check for index' existence before working.
es.get() raises NotFoundError in both cases - when index does not exist and
when document does not exist. Also, it's more reasonable to check index once
since it's the same for all messages.

Even with NotFoundError it is possible to say one situation from another:

>>> try: es.get(index='test_prodsys_rucio_ami', doc_type='task', id=1468216)
... except Exception, err: pass
... 
>>> err.info
{u'found': False, u'_type': u'task', u'_id': u'1468216', u'_index': u'test_prodsys_rucio_ami'}
>>>
>>> try: es.get(index='_no_such_index_', doc_type='task', id=14682166)                                                                                     
... except Exception, err: pass
>>> err.info
{u'status': 404, u'error': {u'index_uuid': u'_na_', u'index': u'tprodsys_rucio_ami', u'resource.type': u'index_expression', u'root_cause': [{u'index_uuid': u'_na_', u'index': u'tprodsys_rucio_ami', u'resource.type': u'index_expression', u'resource.id': u'tprodsys_rucio_ami', u'reason': u'no such index', u'type': u'index_not_found_exception'}], u'reason': u'no such index', u'type': u'index_not_found_exception', u'resource.id': u'tprodsys_rucio_ami'}}
>>> err.info['error']['reason']
u'no such index'

And in case of the error -- or, maybe, even in case of any error, when info['error'] is defined (but I am not sure if there can possibly be any other error) -- re-raise the exception to indicate that the process can not be continued. Maybe wrapping the exception into DataflowException.

The only situation when it makes any difference is when during the process execution the index was removed or access policy changed: the check was successfully passed on the start, so any NotFoundError after this will be taken as "record missed", no matter what. But I don`t think it is likely to happen, so there`s nothing wrong in the one-time check.

log('No such index: %s' % INDEX, 'ERROR')
exit_code = 4
else:
stage.run()
except (DataflowException, RuntimeError), err:
if str(err):
log(err, 'ERROR')
Expand Down