Support data object is member of a collection #394

perolavsvendsen · 2023-11-15T14:17:30Z

Discussions 13 November 2023
@jcrivenaes @daniel-sol @HansKallekleiv

In some cases, exported data objects from FMU are related to each other. The specific example discussed revolves around volume calculations, where the same data are represented as a table, a 3D grid parameter and a surface.

Webviz needs to link these together programmatically. Conceptually, they are all "siblings". This has some parallels to the existing aggregation_id used for aggregated data, where multiple results from the same operation are assigned a common aggregation_id so that they can be linked together.

Elements discussed:

- Should we simply add collection_id as an argument to ExportData.__init__() and forward it to the outgoing metadata? Each object exported from this operation would then be tagged with the same ID, so a client using one of the siblings will know (1) that there are other siblings and (2) how to identify them from the collection_id.
- The name collection_id is perhaps not great.
- From the data perspective, this may be more like member_of.
- A data object can be a member of more than one collection/family.
- We want it to be unique per case, but we want elements exported from multiple realizations to belong to the same collection. The proposed solution is to give e.g. collection_tagname (placeholder name) as an argument to ExportData() and generate a uuid4 as a hash of the current case.uuid and collection_tagname. This ensures that the collection_id (placeholder name) is identical across all realizations, but unique per case.
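A minimal sketch of the ID scheme in the last point, using Python's standard `uuid` module. One caveat: a uuid4 is random by definition, so each realization would get a different value if it simply called `uuid.uuid4()`. A name-based UUID such as `uuid.uuid5` is one way to get a value that is deterministic from case.uuid and the tag name. The helper name `collection_id` is hypothetical.

```python
import uuid


def collection_id(case_uuid: str, collection_tagname: str) -> str:
    """Derive a deterministic collection ID from case.uuid and a tag name.

    Hypothetical helper: uses name-based uuid5 (SHA-1) with the case UUID as
    namespace, so identical inputs always give the identical ID.
    """
    namespace = uuid.UUID(case_uuid)
    return str(uuid.uuid5(namespace, collection_tagname))


# Two example cases; in FMU these would be the real case.uuid values.
case_a = str(uuid.uuid4())
case_b = str(uuid.uuid4())

# Realizations of the same case compute the ID independently but agree on it.
id_real0 = collection_id(case_a, "inplace_volumes")
id_real1 = collection_id(case_a, "inplace_volumes")

# The same tag name in another case gives a different ID.
id_other_case = collection_id(case_b, "inplace_volumes")
```

Because the case UUID is the namespace, no coordination between realizations is needed; each one recomputes the same value locally.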

(There was also discussion on gathering all volume-related exports into one single class, essentially making it a "one-liner". This proposal would be a first step towards that: a very small iteration in the right direction that adds a tag to the outgoing metadata that can be used. @daniel-sol, can you describe this in a separate issue?)

Proposal

- Add the input argument collection_tagname to ExportData().
- Create a uuid4 as a hash of the current case.uuid and collection_tagname.
- Append it to data.member_of (placeholder name) in the outgoing metadata.

In metadata definitions, add the necessary tag. Placeholder name: member_of.

Example:

```yaml
data:
  name: MyName
  member_of:
    - b77e5c35-524b-43d7-9356-aa2ef2e382c3
    - d390eb27-f07f-4f9b-bf97-2b61a773753c
```

Example script:

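```python
from fmu.dataio import ExportData

# collection_tagname is the argument proposed in this issue
exp = ExportData(collection_tagname="inplace_volumes")

# AllTheElements is a placeholder for the related objects (table, grid property, surface, ...)
for elem in AllTheElements:
    exp.export(elem)
```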

In this proposal, the collection tagname is not directly represented in the outgoing metadata. Its only purpose is to be the basis for the hash. We could include it, but the risk is that clients will use it directly.
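To illustrate the intended behaviour of the example script without depending on fmu-dataio, here is a hypothetical stand-in class (not the real ExportData) that takes a collection tagname once and stamps the same member_of entry on every export. As in the proposal, the tagname is only hashed and never written to the metadata. The uuid5 hashing is an assumption about how "a hash of case.uuid and collection_tagname" could be realized.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchExporter:
    """Hypothetical stand-in for ExportData, not the fmu-dataio class.

    Every export() call on one instance gets the same member_of entry.
    """

    case_uuid: str
    collection_tagname: str
    exported: list[dict[str, Any]] = field(default_factory=list)

    def _collection_id(self) -> str:
        # Deterministic per (case, tag name) via name-based uuid5.
        return str(uuid.uuid5(uuid.UUID(self.case_uuid), self.collection_tagname))

    def export(self, name: str) -> dict[str, Any]:
        # Only the hashed ID is written out; the tag name itself is not.
        metadata = {"data": {"name": name, "member_of": [self._collection_id()]}}
        self.exported.append(metadata)
        return metadata


exp = SketchExporter(case_uuid=str(uuid.uuid4()), collection_tagname="inplace_volumes")
for elem in ["volumes_table", "volumes_gridprop", "volumes_surface"]:
    exp.export(elem)
```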

Please add discussion details I have forgotten.


perolavsvendsen · 2023-11-18T14:48:22Z

I suggest adding a relations block in the metadata, next to data, rather than adding relational information inside the data block.

Example:

```yaml
$schema: <schema_url>
data:
  x: y
  z: 1
relations:
  member_of:
    - <uuid4>
    - <uuid4>
```

The upside of using a list is that it captures objects that are members of more than one collection. The downside is searchability, but that can probably be worked around.

perolavsvendsen · 2023-11-18T15:17:01Z

Since a data object can be a member of more than one collection, the input argument must support more than one collection_name.

I will use collection_name, not collection_tagname, to avoid confusion with the existing "tagname" argument.
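Supporting more than one collection_name usually means accepting either a single string or a list. A minimal sketch of that input handling, with a hypothetical helper name, that normalizes the argument to a de-duplicated list and rejects empty names:

```python
from typing import Optional, Union


def normalize_collection_names(
    collection_name: Optional[Union[str, list[str]]],
) -> list[str]:
    """Accept one name, several names, or None; return a de-duplicated list.

    Hypothetical helper for a collection_name argument that may be a single
    string or a list of strings.
    """
    if collection_name is None:
        return []
    if isinstance(collection_name, str):
        names = [collection_name]
    else:
        names = list(collection_name)
    for name in names:
        if not isinstance(name, str) or not name.strip():
            raise ValueError(f"Invalid collection name: {name!r}")
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(names))
```

Normalizing early keeps the rest of the export code working on a plain list regardless of how the user passed the argument.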

perolavsvendsen · 2023-11-18T19:24:48Z

Draft PR for discussions: #396

There are still some unanswered questions.

perolavsvendsen · 2023-11-21T09:28:53Z

It may be confusing that the collection_name disappears. We should keep it (in addition to the uuid).

perolavsvendsen · 2023-11-21T09:47:53Z

Instead of specifying collection_name, we could consider simply using the instance of ExportData() as the identifier of the collection. The upside is invisibility: no API change. The downside is also invisibility: the user cannot easily find their collection, since it has no name.

Also, ExportData instances have no name/identifier.

jcrivenaes · 2023-11-21T10:42:40Z

> Instead of specifying collection_name, we could consider simply using the instance of ExportData() as the identifier of the collection. The upside is invisibility: no API change. The downside is also invisibility: the user cannot easily find their collection, since it has no name.

That is an interesting idea, but it requires some user awareness: (1) you need to export everything needed for your "collection" in the same script using the same instance, and (2) there is a risk of the opposite: people export things not intended as a collection using the same instance, and end up with a number of "unaware" collections...

For (1), the instance settings (properties) also need to be updated in the export() calls, since in this case the exports will be a mix of e.g. surfaces and tables.

perolavsvendsen · 2023-11-21T13:14:03Z

Yes, there are some snags. Combined with a "best practice" description of how we intend fmu-dataio to be used (#395), it could be part of the contract that everything exported with the same instance of ExportData is implicitly a collection. But I also see that this will quickly break down, and some will probably want to export parts of a "collection" from different scripts (hence different instances).

So I don't think we should pursue this...

perolavsvendsen · 2023-11-21T14:02:47Z

> It may be confusing that the collection_name disappears. We should keep it (in addition to the uuid).

I updated draft PR #396 to produce relations.collections as a list of objects, where each "collection" is a dict containing uuid and name:

```yaml
relations:
  collections:
    - name: "mycollection" # as written by user in input arguments
      uuid: <uuid4> # hash of name + case.uuid, so identical only within context of case
```
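A minimal sketch of producing this relations.collections block from a list of collection names. The helper name is hypothetical, and using `uuid.uuid5` with case.uuid as namespace is one assumed way to get "a hash of name + case.uuid"; uuid5 hashes the namespace bytes and the name together with SHA-1.

```python
import uuid
from typing import Any


def collections_block(case_uuid: str, collection_names: list[str]) -> dict[str, Any]:
    """Build a relations.collections block with name and uuid per collection.

    Hypothetical helper; the uuid is name-based (uuid5) with case.uuid as
    namespace, so it is identical only within the context of one case.
    """
    namespace = uuid.UUID(case_uuid)
    return {
        "relations": {
            "collections": [
                {"name": name, "uuid": str(uuid.uuid5(namespace, name))}
                for name in collection_names
            ]
        }
    }


block = collections_block(str(uuid.uuid4()), ["mycollection", "inplace_volumes"])
```

Keeping the user-facing name next to the uuid addresses the concern that the collection name would otherwise disappear from the metadata.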

jcrivenaes · 2023-11-23T10:48:58Z

I wonder if the desired collection names should be defined at the "case" level (i.e. they must exist in the global config or similar). Reasons:

- The client (e.g. Webviz) can then easily parse all possible collections early and build further logic on them.
- It avoids typos in collection names across "various scripts around" causing confusion (e.g. one script uses "volumetrics" while another uses "Volumentrics"... and the user is unaware).
- It avoids too many collections being introduced as "nice to have".

perolavsvendsen · 2023-11-23T11:09:49Z

> I wonder if the desired collection names should be defined at the "case" level (i.e. they must exist in the global config or similar)...

Technically on the model level, which makes sense. It has been a stated requirement that a "collection" must be unique to the case (hence it is hashed together with the case.uuid). But it could be that it sometimes must also be unique on the iteration (ensemble) level, and possibly on the realization level.

So, given these "levels" in the "hierarchy":

- fmu (all FMU models)
- model (this FMU model) <-- Should be defined here?
- case (this specific case of this model)
- ensemble (this specific ensemble, in this case)
- realization (this specific realization of this iteration)

...we have assumed that it must be unique on the case level, but that would still require clients to specify which iteration. So perhaps it should rather be unique on the model level. Clients would then have to specify the case, but that is a pretty common situation to be in?
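One way to frame the uniqueness question is to hash the collection name together with the identifiers of every level down to a chosen scope. The sketch below is hypothetical: the level keys, the `scoped_collection_id` helper and the context values are illustrative, the top "fmu" level is omitted since it is shared by everything, and `uuid.NAMESPACE_URL` is just an arbitrary fixed namespace.

```python
import uuid

# Hypothetical ordering of the hierarchy levels, outermost first.
LEVELS = ("model", "case", "ensemble", "realization")


def scoped_collection_id(collection_name: str, scope: str, context: dict[str, str]) -> str:
    """Hash the collection name together with every level down to `scope`.

    `context` maps level names to identifiers (e.g. model name, case uuid).
    scope="model" gives an ID shared across cases, scope="case" one that is
    unique per case, and so on.
    """
    if scope not in LEVELS:
        raise ValueError(f"Unknown scope {scope!r}; expected one of {LEVELS}")
    included = LEVELS[: LEVELS.index(scope) + 1]
    key = "/".join(f"{level}={context[level]}" for level in included)
    # Any fixed namespace works; NAMESPACE_URL is an arbitrary choice here.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{key}/collection={collection_name}"))


case_uuid = str(uuid.uuid4())
real0 = {"model": "mymodel", "case": case_uuid, "ensemble": "iter-0", "realization": "0"}
real1 = {**real0, "realization": "1"}
other_ens = {**real0, "ensemble": "iter-1"}
```

With scope="case", all ensembles and realizations of a case share one ID; with scope="ensemble", each ensemble gets its own, which is exactly the trade-off between what clients must specify and how fine-grained the grouping is.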

The key question here is perhaps "what level of uniqueness is needed for collections?" Do we need the unique IDs at all, or is a string identifier/name enough?

Regarding defining globally: when exporting a data object tied to a collection, must it then be verified that this collection is defined (in the global config)? I'm not 100% sure about the user experience, but that can be worked on.
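If collections are predefined, the verification at export time could look like this minimal sketch. It assumes a hypothetical `collections` key in the global config listing the allowed names; the helper name is also hypothetical. It would catch the "volumetrics" vs "Volumentrics" typo mentioned above.

```python
from typing import Any


def check_collections_defined(
    collection_names: list[str], global_config: dict[str, Any]
) -> list[str]:
    """Raise if any requested collection is not predefined.

    Assumes a hypothetical "collections" key in the global config that
    lists the allowed collection names.
    """
    allowed = set(global_config.get("collections", []))
    unknown = [name for name in collection_names if name not in allowed]
    if unknown:
        raise ValueError(
            f"Collections not defined in global config: {unknown}; "
            f"defined collections: {sorted(allowed)}"
        )
    return collection_names


global_config = {"collections": ["volumetrics", "hc_thickness"]}
check_collections_defined(["volumetrics"], global_config)
```

Listing the defined collections in the error message is one small step toward a tolerable user experience when a name is rejected.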

perolavsvendsen · 2023-11-23T13:02:56Z

Discussion 23.11.2023

The current user story relates to inplace volumes. What other collections are there?
- HC thickness maps from simulator

It is important that tagged collections can be grouped through Elasticsearch.
Is tagname essentially a "collection"?
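In Elasticsearch, grouping would presumably be a terms query or aggregation on the collection uuid field. The sketch below only mimics the result in memory, assuming the relations.collections layout with name/uuid entries; the helper name, object names and "uuid-a"/"uuid-b" values are placeholders. An object listed in several collections appears in each group.

```python
from collections import defaultdict
from typing import Any


def group_by_collection(objects: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Group object names by collection uuid (in-memory stand-in for a search).

    Objects without a relations block are skipped.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for obj in objects:
        for collection in obj.get("relations", {}).get("collections", []):
            groups[collection["uuid"]].append(obj["data"]["name"])
    return dict(groups)


objects = [
    {
        "data": {"name": "volumes_table"},
        "relations": {"collections": [{"name": "inplace_volumes", "uuid": "uuid-a"}]},
    },
    {
        "data": {"name": "hc_thickness_map"},
        "relations": {
            "collections": [
                {"name": "inplace_volumes", "uuid": "uuid-a"},
                {"name": "hc_thickness", "uuid": "uuid-b"},
            ]
        },
    },
    {"data": {"name": "unrelated_surface"}},
]
groups = group_by_collection(objects)
```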

perolavsvendsen · 2023-11-29T09:10:17Z

From: #396 (comment) @daniel-sol

> I have problems understanding how we are actually going to solve the multi-collection part. In theory I think the idea is good, but in practice I cannot really see how this will be done. Imagine that you have a specific grid property. This can be part of several collections:
>
> - As part of the export of a 3D grid and related properties
> - As part of properties used as input to inplace calculations
> - As part of input to seismic forward modelling
>
> Without thinking this through properly, one would set up these exports as three different scripts using ExportData. Should we then allow checking whether the object has already been exported, and then just write to the connected metadata file? Or do we imagine people thinking this through up front and adding all these x collection names in one script? I foresee that this will not scale all that well...
>
> The alternative is that one exports this grid property three times with different collection names, or something similar...

Yes, I agree. Conceptually, it is hard to see a smooth solution to this, given the current constraints etc.

Multiple pointers to the same data object probably create a HUGE overhead in bookkeeping (gut feeling). Multiple uploads of the same object are an option, but that doesn't feel good.
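The option raised in the quoted comment, writing extra memberships into an already-exported object's metadata instead of uploading the object again, can be sketched at the dictionary level. This is a hypothetical helper assuming the relations.collections layout; it ignores file I/O, concurrent writes from several scripts, and the bookkeeping overhead discussed here.

```python
import copy
from typing import Any


def add_collections(
    metadata: dict[str, Any], new_collections: list[dict[str, str]]
) -> dict[str, Any]:
    """Return a copy of metadata with extra collections merged in.

    Entries are de-duplicated on uuid, so re-running an export script does
    not add the same membership twice.
    """
    updated = copy.deepcopy(metadata)
    collections = updated.setdefault("relations", {}).setdefault("collections", [])
    known = {c["uuid"] for c in collections}
    for collection in new_collections:
        if collection["uuid"] not in known:
            collections.append(collection)
            known.add(collection["uuid"])
    return updated


# Placeholder names and uuids for a grid property exported by one script.
gridprop = {
    "data": {"name": "poro"},
    "relations": {"collections": [{"name": "grid_export", "uuid": "uuid-grid"}]},
}
# A second script adds its membership; the duplicate entry is ignored.
updated = add_collections(
    gridprop,
    [
        {"name": "inplace_input", "uuid": "uuid-inplace"},
        {"name": "grid_export", "uuid": "uuid-grid"},
    ],
)
```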

A possible option is that collections are predefined, and that export scripts point to one or more specific collections when exporting. In the metadata, they are then explicitly listed. But this does not solve how collections are to be used by a client; the client would also need to know the context of each collection, etc.

The fundamental need here was to be able to link objects together, e.g. "as a consumer of FMU results, I would like to be able to find connected data, so that I can co-visualize connected data in multiple contexts", or something like that. The collections concept may not be the best way to handle it. The current solution uses name, which is not a good solution either.

anders-kiaer mentioned this issue Nov 16, 2023: Metadata on fip.yml and .lyr #392 (Open)

perolavsvendsen mentioned this issue Nov 17, 2023: Agree and describe a generalized pattern for export from FMU with fmu-dataio #395 (Open)

perolavsvendsen self-assigned this Nov 18, 2023

perolavsvendsen mentioned this issue Nov 18, 2023: Support data objects belong to a collection #396 (Closed)

perolavsvendsen added the labels enhancement (New feature or request) and Data definitions (Issues related to data definitions) Nov 18, 2023

perolavsvendsen linked a pull request Nov 18, 2023 that will close this issue: Support data objects belong to a collection #396 (Closed)
