[obsolete] DF/data4es: 'update' mode for safe dataset metadata update. #247

Closed
wants to merge 2 commits into from

mgolosova
Collaborator

@mgolosova mgolosova commented Apr 28, 2019

Closed, because #253, #262 and #263 together do the trick: we can run data4es with `-i 91_in,91_out,95` and be sure that data already present in ES won't be spoiled.

#320 is the next step in this direction: it will allow getting data from Rucio and, in case of an error, falling back to the "update" scenario.

However, it does not provide functionality like "query Rucio only if we don't have these data in ES". That would be nice, but it should really be done by introducing another stage that extracts all the available information from ES at the beginning of the data4es process, instead of asking every stage to query ES for its own data.

Original description

Applies the functionality added in #245 and #246 to the data4es process, so that the process can be started either in the normal (basic integration) mode or in the 'safe' (archive update) mode, which is turned on with the `--update` option.

[WIP] is due to the pyDKB-related changes: they clearly do not belong here. By the way, even after this change I have still seen the ConnectionTimeout exception; but since we are talking about an 'archived metadata update', it seems quite OK to me to be interrupted when ES is overloaded and to restart after an hour or so.


Waits for #245, #246.

Sometimes we see ConnectionTimeout even with a simple `get` request; it
only means that for some reason ES just can't do anything about the
request at the moment, not that the request is too heavy.

Now the number of timeout retries can be set when the client is
created; by default the number is 3.

The ES client itself (`elasticsearch.Elasticsearch()`) has the 'retry on
timeout' option turned off by default, so we have to turn it on 'by
hand'; the retry number 3 is the same as the client's own default.
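
For illustration only, a minimal sketch of how retry-on-timeout could be switched on when the client is created. The helper names `es_client_kwargs` and `make_client` and the `timeout_retries` parameter are hypothetical and are not taken from pyDKB; `retry_on_timeout` and `max_retries` are the keyword arguments that the `elasticsearch` Python client passes on to its transport.

```python
def es_client_kwargs(timeout_retries=3):
    """Build keyword arguments that enable retry-on-timeout.

    `timeout_retries` is a hypothetical parameter name; 3 matches the
    client's own default for `max_retries`.
    """
    if timeout_retries < 0:
        raise ValueError("timeout_retries must be non-negative")
    return {
        # Off by default in the client, so it has to be set explicitly.
        'retry_on_timeout': True,
        # Number of retries before the timeout error is raised.
        'max_retries': timeout_retries,
    }


def make_client(hosts, timeout_retries=3):
    """Create an ES client that retries timed-out requests."""
    # Imported lazily so the helper above works without the library.
    from elasticsearch import Elasticsearch
    return Elasticsearch(hosts, **es_client_kwargs(timeout_retries))
```

Something like `make_client(['localhost:9200'], timeout_retries=5)` would then retry a timed-out request up to five times; the host is a placeholder.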

In this mode all the stages that can use ES as a "backup" storage are
configured to do so. It takes more time than a direct integration, yet
allows running the process for archived data without worrying that some
information will be missed.
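
A rough sketch of the "backup storage" idea, under the assumption that a stage takes fresh values from its primary source and fills the gaps from what is already stored in ES. The function name `with_backup` and the callables passed to it are hypothetical; the actual stages may combine the two sources differently.

```python
def with_backup(primary, backup):
    """Wrap two lookup callables into one that never loses stored data.

    `primary` and `backup` are hypothetical callables that take a key
    and return a dict of metadata fields (or None if nothing is found).
    """
    def fetch(key):
        fresh = primary(key) or {}
        # The backup is always queried, which is why this mode is slower.
        stored = backup(key) or {}
        merged = dict(stored)
        # Fresh values win, but a missing (None) value does not
        # overwrite what is already stored.
        merged.update({k: v for k, v in fresh.items() if v is not None})
        return merged
    return fetch


# Usage with stand-in sources instead of real Rucio and ES queries.
fetch = with_backup(
    lambda name: {'bytes': 10, 'deleted': None},
    lambda name: {'bytes': 5, 'deleted': False, 'events': 100},
)
result = fetch('some.dataset.name')
```

Here `result` keeps `deleted` and `events` from the stored record and takes `bytes` from the primary source.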
@mgolosova mgolosova self-assigned this Apr 28, 2019
@mgolosova mgolosova changed the title [WIP] DF/data4es: 'update' mode for safe dataset metadata update. [obsolete] DF/data4es: 'update' mode for safe dataset metadata update. Aug 9, 2019
@mgolosova mgolosova closed this Feb 20, 2020