[obsolete] DF/095: safe dataset metadata update #246

Open: wants to merge 4 commits into base 91-ds-safe
mgolosova commented Apr 28, 2019

This PR is obsolete but not yet closed: it still needs to be checked whether #253 does what this PR tried to achieve.

Update Stage 95 to allow safe metadata updates, with ES@DKB as a 'backup' storage used when some metadata are missing from AMI.

[WIP] is because of the pyDKB changes: I am not sure they belong in this PR.

Regarding the a9d10ac commit log: testing shows that metadata in AMI do change from time to time, and sometimes they are removed.

ToDo:

  • move pyDKB-related changes to another branch;
    • to think about: maybe read_es_config() should be renamed to read_sh_config() or similar?
  • remove 091-related changes from the history (rebase on [obsolete] pyDKB storages #244 or something?);
  • update a9d10ac log message;

Waits for #244

There can be different functions useful for some specific operations.
It is possible to keep each of them in an individual file, but that is
not very convenient to use: one has to type something like
`from pyDKB.common.my_function_name import my_function_name`, or keep
a list of imports like `from my_function_name import my_function_name`
in the `__init__` file.

So I think it is fine to put them all into a common file ('utils',
'misc' or whatever) and use them like
`from pyDKB.common.utils import my_function_name`. It looks a little
better (to me, at least).

A function like this seems to be needed in multiple places, so why not
have it in the common library?
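A minimal sketch of what such a common module could hold follows. It assumes that the helper from the ToDo list (`read_sh_config()`) reads a shell-style `KEY=value` file. The file location `pyDKB/common/utils.py`, the function body and the config format are all assumptions made for illustration. None of this is the actual pyDKB code.

```python
"""Hypothetical contents of pyDKB/common/utils.py (illustrative only)."""

import shlex


def read_sh_config(path):
    """Read a shell-style config file into a dict (hypothetical helper).

    Assumed format: one KEY=value pair per line; blank lines and lines
    starting with '#' are ignored; values may be shell-quoted.
    """
    config = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            key, sep, value = line.partition('=')
            if not sep:
                # Not a KEY=value line: skip it rather than fail
                continue
            # shlex.split removes shell-style quotes around the value
            # (an unbalanced quote raises ValueError)
            config[key.strip()] = ' '.join(shlex.split(value))
    return config
```

With a module like this, callers would write `from pyDKB.common.utils import read_sh_config` rather than importing from a per-function module.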

If the `--es-config FILE` parameter is specified, use the DKB ES storage
as a backup metadata source in case information was removed from the
primary source (AMI).

I am not sure under what circumstances information can be removed from
AMI, so for now we just check whether there are empty/missing fields in
the data taken from AMI and, if such fields are found, check ES for
(possibly) already known values.
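The fallback described above can be sketched as follows. The sketch makes several assumptions: records are plain dicts, `None`/empty strings/empty containers count as "missed", and the ES query is replaced by a caller-supplied `es_lookup` callable. That keeps the sketch free of any real Elasticsearch client calls. The names `is_missing`, `fill_from_backup` and `es_lookup`, as well as the example field names, are hypothetical.

```python
"""Sketch: use ES@DKB as a backup source for metadata missed in AMI."""


def is_missing(value):
    # None, '' and empty containers count as "missed"; 0 and False do not
    return value is None or (isinstance(value, (str, list, dict)) and not value)


def fill_from_backup(ami_record, fields, es_lookup):
    """Return a copy of ami_record with missed fields restored from ES.

    ami_record -- dict with metadata taken from AMI (primary source)
    fields     -- names of the fields expected in a complete record
    es_lookup  -- callable taking ami_record and returning the record
                  already stored in ES (dict), or None if ES has none
    """
    result = dict(ami_record)
    missed = [f for f in fields if is_missing(result.get(f))]
    if not missed:
        # Nothing is missed: skip the extra ES request entirely
        return result
    backup = es_lookup(ami_record) or {}
    for field in missed:
        value = backup.get(field)
        # Values present in AMI always win; ES only fills the gaps
        if not is_missing(value):
            result[field] = value
    return result
```

For example, with `ami = {'events': 1000, 'cross_section': None}` and an ES record `{'events': 900, 'cross_section': 0.5}`, the result keeps `events` from AMI and takes `cross_section` from ES. AMI is treated as authoritative wherever it has a value, and ES is queried only when something is actually missed.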

The problem is that there are quite a lot of "missed" values, so for
almost every record we have to check both AMI and ES. Maybe there is a
more precise trigger, like "no data at all", or "the dataset property
'deleted' is set to True", or...

Or maybe AMI just doesn't remove data at all?..
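To compare the candidate triggers, each one can be written as a predicate, and the number of records that would need an extra ES request can be counted. This is a sketch under assumptions. It assumes AMI records are dicts, and it assumes the 'deleted' property mentioned above appears as a boolean `deleted` key, which is an unverified guess at the AMI field name. The predicate names and the `count_es_lookups` helper are hypothetical.

```python
"""Compare candidate triggers for falling back to ES (illustrative sketch)."""


def is_missing(value):
    # None, '' and empty containers count as "missed"; 0 and False do not
    return value is None or (isinstance(value, (str, list, dict)) and not value)


def any_field_missed(record, fields):
    """Current behaviour: fall back if at least one field is missed."""
    return any(is_missing(record.get(f)) for f in fields)


def no_data_at_all(record, fields):
    """Stricter: fall back only if every expected field is missed."""
    return all(is_missing(record.get(f)) for f in fields)


def marked_deleted(record, fields):
    """Fall back only if the record has a 'deleted' flag set to True."""
    return record.get('deleted') is True


TRIGGERS = {
    'any_field_missed': any_field_missed,
    'no_data_at_all': no_data_at_all,
    'marked_deleted': marked_deleted,
}


def count_es_lookups(records, fields):
    """Return, for each trigger, how many records would need an ES request."""
    return {name: sum(1 for r in records if trigger(r, fields))
            for name, trigger in TRIGGERS.items()}
```

Running `count_es_lookups()` over a sample of real AMI records would show how much the stricter triggers reduce the number of ES requests compared to the current "any field missed" rule.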

In theory, it should make the request slightly faster.
mgolosova self-assigned this Apr 28, 2019
mgolosova changed the title [WIP] Stage 95: safe dataset update [WIP] DF/095: safe dataset metadata update Apr 28, 2019
mgolosova changed the title [WIP] DF/095: safe dataset metadata update [obsolete] DF/095: safe dataset metadata update Jun 18, 2019