Agree and describe a generalized pattern for export from FMU with fmu-dataio #395

Open
perolavsvendsen opened this issue Nov 17, 2023 · 5 comments

Comments

@perolavsvendsen
Member

perolavsvendsen commented Nov 17, 2023

After some time of development and usage, and given that there are different approaches to developing new things, it may be smart and useful to put together a generalized "guideline", or set of principles, for which patterns we want to have.

The intention is to provide clear guidelines in the documentation, and that we strive to follow these patterns both in development and in implementation (in various FMU workflows). The aim is to avoid situations where fmu-dataio is used differently across implementations.

The general pattern today looks something like this:

  1. Need for export of MyData with context.

  2. Represent MyData as a Python object OBJ of type TYPE.

    • This can be objects prepared by dependencies, e.g. pandas.DataFrame or xtgeo.RegularSurface, or
    • it can be objects prepared by fmu-dataio.
  3. fmu-dataio adds context and exports OBJ.

  4. ExportData is initialized, and some context is provided as arguments.

  5. Context is extracted from surroundings (file paths, system information, etc)

  6. OBJ is passed to ExportData.export(). More context can be provided as arguments. TYPE is known to fmu-dataio, and more context is extracted from the object itself.

Example pseudo-script, Python object from other packages

from fmu.dataio import ExportData
import pandas as pd

# Represent data object(s) as a Python class
df = make_my_df()

# Do the export
cfg = get_config()
exp = ExportData(cfg, mycontext)
exp.export(df)

>>> file(s) with metadata

Example pseudo-script, Python object defined in fmu-dataio

from fmu.dataio import ExportData, SpecialObject # data object class

# Represent data object(s) as a Python class
myobject = SpecialObject(args)

# Do the export
cfg = get_config()
exp = ExportData(cfg, mycontext)
exp.export(myobject)

>>> file(s) with metadata

Main point: Pattern is the same regardless of where the data-representation class lives.
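To illustrate that main point, here is a minimal sketch of how one export entry point could treat both kinds of objects the same way by dispatching on type. Everything in it is an assumption for illustration: `SketchExporter`, `ThirdPartyTable`, `describe` and the metadata keys are hypothetical stand-ins, not the fmu-dataio API (only the name `SpecialObject` is borrowed from the pseudo-script above).

```python
from dataclasses import dataclass, field
from functools import singledispatch


@dataclass
class ThirdPartyTable:
    """Hypothetical stand-in for an object prepared by a dependency."""
    columns: list


@dataclass
class SpecialObject:
    """Hypothetical stand-in for a data class living in the export package."""
    name: str


@singledispatch
def describe(obj) -> dict:
    """Derive type-specific metadata; unknown types are rejected."""
    raise TypeError(f"Unsupported type: {type(obj).__name__}")


@describe.register(ThirdPartyTable)
def _(obj) -> dict:
    return {"class": "table", "columns": list(obj.columns)}


@describe.register(SpecialObject)
def _(obj) -> dict:
    return {"class": "special", "name": obj.name}


@dataclass
class SketchExporter:
    """Hypothetical stand-in for ExportData; returns metadata, writes nothing."""
    config: dict
    context: dict = field(default_factory=dict)

    def export(self, obj) -> dict:
        # Merge caller-supplied context with what is derived from the object.
        return {**self.context, "model": self.config["model"], "data": describe(obj)}


exp = SketchExporter({"model": "demo"}, {"content": "depth"})
table_meta = exp.export(ThirdPartyTable(["x", "y"]))
special_meta = exp.export(SpecialObject("grid_bundle"))
```

The calling code is identical for both objects; only the registered `describe` handler differs.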

This pattern can possibly also help with creating more product-oriented data export, and possibly help with creating convenience wrappers for "well known" data in FMU.

@jcrivenaes
Collaborator

What is RmsGrid?

@perolavsvendsen
Member Author

perolavsvendsen commented Nov 17, 2023

What is RmsGrid?

Doesn't exist. On second thought, I removed it. The example above shows the same thing, so it was redundant.

@daniel-sol
Contributor

I do agree that we should not introduce very different patterns, but I feel that this needs a little more context.
@perolavsvendsen's comments come from a discussion around a user story where we have a set of objects that all share some commonality. We want that commonality expressed in metadata, so that all these objects are easy to retrieve together, or so that if you have one object you immediately know that there are other objects sharing it:

E.g:

  • One 3d grid and all its related grid properties
  • A set of surfaces all used in the building of a 3d grid
  • A set of objects all being expressions of the volumes contained in a 3d grid

Here we have two options:

  1. Make classes that do all the exporting under the hood, using dataio internally.
    These don't even need to be owned by dataio.
  2. Retain the pattern described above, and send what could be a mix of objects into dataio for export.
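The two options can be sketched side by side. This is only a sketch under stated assumptions: `SketchExporter` is a hypothetical stand-in for `ExportData` that records metadata instead of writing files, `GridBundle` is an invented wrapper class, and the `related_to` key is an illustrative way of expressing the commonality, not an existing metadata field.

```python
from dataclasses import dataclass, field


@dataclass
class SketchExporter:
    """Hypothetical stand-in for ExportData; records metadata only."""
    config: dict
    exported: list = field(default_factory=list)

    def export(self, obj, **context) -> dict:
        meta = {"name": obj["name"], **context}
        self.exported.append(meta)
        return meta


@dataclass
class GridBundle:
    """Option 1: a hypothetical class that owns a group of related objects."""
    grid: dict
    properties: list

    def export_all(self, exporter: SketchExporter) -> list:
        # Every member gets the same relation tag so the group can be found together.
        relation = {"related_to": self.grid["name"]}
        members = [self.grid, *self.properties]
        return [exporter.export(member, **relation) for member in members]


bundle = GridBundle({"name": "geogrid"}, [{"name": "poro"}, {"name": "perm"}])

# Option 1: the wrapper drives the exporter under the hood.
option1 = bundle.export_all(SketchExporter({"model": "demo"}))

# Option 2: the caller keeps the existing pattern and passes each object in.
exporter = SketchExporter({"model": "demo"})
option2 = [
    exporter.export(obj, related_to="geogrid")
    for obj in [bundle.grid, *bundle.properties]
]
```

In this toy version both routes produce the same metadata; the difference is who is responsible for supplying the relation.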

Where I do agree with @perolavsvendsen is that we should not end up in a situation where every single new class developed for this has to duplicate all the arguments of the export method of ExportData. But I do see some challenges:

  • Circularity will be introduced in the code.
    E.g. from the outside:
from fmu.dataio import ExportData, SpecialObject # data object class

# Represent data object(s) as a Python class
myobject = SpecialObject(args)

# Do the export
cfg = get_config()
exp = ExportData(cfg, mycontext)
exp.export(myobject)

But on the inside:

class ExportData:
    ...
    def export(self, obj, ..):
        # A composite object triggers a nested export of each referenced object
        if isinstance(obj, SpecialObject):
            for obj_ref in obj.object_references:
                if method_for_guessing_object(obj_ref) == "surface":
                    surface = method_for_reading_surface(obj_ref)
                    # ExportData is instantiated from within ExportData itself
                    spexd = ExportData(config, ..)
                    spexd.export(surface)


or similar. Hope people get my drift.
Ideally, I would hope that all the metadata options are automatically determined by the class, so that this would not be such a challenge. That would also give more control and less chance of user error. But there will be exceptions, like access, which one must be able to set on each group of objects, at the very least.

@perolavsvendsen
Member Author

Yes, there are some assumptions in my proposal that could be a bit off. The recursive use of ExportData should hopefully not be necessary, but I see your point. And it could be that this is what would happen...
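One way the recursive use might be avoided, sketched under assumptions: expand a composite object into its leaf objects outside the exporter, so that a single exporter instance only ever sees plain objects. `Leaf`, `Composite`, `iter_leaves` and `SketchExporter` are hypothetical names for illustration and do not exist in fmu-dataio.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Leaf:
    """Hypothetical single exportable object (e.g. one surface)."""
    name: str


@dataclass
class Composite:
    """Hypothetical container referencing leaves or other composites."""
    name: str
    members: list = field(default_factory=list)


def iter_leaves(obj) -> Iterator[Leaf]:
    """Yield every Leaf below obj, depth first."""
    if isinstance(obj, Leaf):
        yield obj
    else:
        for member in obj.members:
            yield from iter_leaves(member)


class SketchExporter:
    """Hypothetical stand-in for ExportData; handles leaf objects only."""

    def __init__(self) -> None:
        self.written = []

    def export(self, leaf: Leaf) -> str:
        # Record the name and return a made-up file name for the leaf.
        self.written.append(leaf.name)
        return f"{leaf.name}.bin"


bundle = Composite(
    "grid_inputs",
    [Leaf("top"), Composite("faults", [Leaf("f1")]), Leaf("base")],
)
exporter = SketchExporter()
paths = [exporter.export(leaf) for leaf in iter_leaves(bundle)]
```

The traversal is still recursive, but it lives in `iter_leaves`, so the exporter never has to instantiate itself.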

What I worry about is the duplication of code and input arguments that would be necessary when exposing more than one route all the way out to the end user, in addition to the intuitiveness of the package, which could easily (?) be broken if we do that. But these are assumptions...

Btw, ref

where we have a set of objects that all have some commonality, and we want that expressed in metadata for easy retrieval of all these objects

...this will hopefully be a good first step (not related to this discussion, directly)

@perolavsvendsen
Member Author

Discussion 10/4-2024 @HansKallekleiv @tnatt @jcrivenaes

Current (initialize class, outside loop):

from fmu.dataio import ExportData
exp = ExportData()
for x in xx:
    exp.export(mydata)

Alternative B - initialize class, but inside loop

from fmu.dataio import ExportData
for x in xx:
    exp = ExportData()
    exp.export(mydata)

Alternative C - export (wrapper) functions:

from fmu.dataio import export_data
export_data(mydata)
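For comparison, a sketch of how Alternative C could be a thin wrapper over the class route, so that arguments are forwarded rather than duplicated. This assumes a hypothetical `SketchExporter` stand-in; the `export_data` signature shown here is invented for illustration and is not an existing fmu-dataio function.

```python
from dataclasses import dataclass, field


@dataclass
class SketchExporter:
    """Hypothetical stand-in for ExportData; returns metadata, writes nothing."""
    config: dict = field(default_factory=dict)

    def export(self, obj, **context) -> dict:
        return {"config": self.config, "type": type(obj).__name__, **context}


def export_data(obj, config=None, **context) -> dict:
    """Alternative C as a wrapper: one call builds the exporter and exports."""
    return SketchExporter(config or {}).export(obj, **context)


mydata = [1.0, 2.0]
via_class = SketchExporter({"model": "demo"}).export(mydata, name="demo_data")
via_function = export_data(mydata, {"model": "demo"}, name="demo_data")
```

Because the function only forwards `**context`, new export arguments would need to be added in one place only in this sketch.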


3 participants