how to customize per-server filename and directory #7

Open
suvarchal opened this issue Dec 14, 2022 · 5 comments

Comments

@suvarchal

suvarchal commented Dec 14, 2022

I am currently using an action with the following sinks:

          sinks :
            - type : file
              append : true
              per-server : true
              path : ocean-output-field.grib

This writes a file with a prefix, like multio-hostname-pid-ocean-output-field.grib. Is there a way to customize it, for example to specify a sub-directory to write to, or to use a server id (by count) in the prefix, e.g. server-id/ocean-output-field.grib?

@suvarchal changed the title from "how to customize per-server filename" to "how to customize per-server filename and directory" on Dec 14, 2022
@dsarmany
Collaborator

This is currently not customisable. In general, this simple file sink is only meant for development and small-scale testing. Otherwise, something like the FDB (https://github.com/ecmwf/fdb) should be used as a sink.

The main difficulty with the specific request for server-id/ocean-output-field.grib is that an individual action does not have the knowledge of the number of servers. This is by design. Pipelines can be constructed independently of any client-server interaction.

However, the user may have the knowledge of the overall run topology and can create a templated configuration like this:

sinks :
  - type : file
    append : true
    per-server : false
    path : server-${id}/ocean-output-field.grib

With the per-server option turned off, there is no automatic prefixing, so any prefix can instead be set directly in the path.
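The thread does not say how `${id}` is meant to be substituted. One possibility is for the launcher (or whatever code knows the run topology) to render the path once per server before the configuration is handed over. A minimal C sketch of that substitution follows; `expand_server_id` is a hypothetical helper written for illustration, not part of multio.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT part of multio: copies tmpl into out, replacing
 * the first "${id}" with the decimal server id. Returns 0 on success and
 * -1 if the result does not fit into out_size bytes. */
int expand_server_id(const char* tmpl, int id, char* out, size_t out_size) {
    static const char token[] = "${id}";
    assert(tmpl != NULL && out != NULL);

    const char* hit = strstr(tmpl, token);
    int written;
    if (hit == NULL) {
        /* No placeholder: copy the template unchanged. */
        written = snprintf(out, out_size, "%s", tmpl);
    } else {
        /* Text before the token, then the id, then the text after it. */
        written = snprintf(out, out_size, "%.*s%d%s",
                           (int)(hit - tmpl), tmpl, id,
                           hit + strlen(token));
    }
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

Called with the template above and id 3, this yields server-3/ocean-output-field.grib. Note that it only builds the string; it does not create the server-3 directory.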

It would probably be a sensible improvement to make the prefixing more configurable. So instead of turning it on and off, we could choose from {none, per-host, per-server} as the prefix to the pathname. Would that be useful?

@suvarchal
Author

> This is currently not customisable. In general, this simple file sink is only meant for development and small-scale testing. Otherwise, something like the FDB (https://github.com/ecmwf/fdb) should be used as a sink.

I see good potential for the file sink, not just for development and debugging but also as a template for adding a new I/O backend. For instance, writing with a simple encode:raw brings the output close to pure Zarr format (it only needs some extra metadata and a directory structure), and a GRIB file could likewise be wrapped with some metadata and called Zarr. Another example could be encode:netcdf. It all comes together as an attractive option for a potential model or user: just use multio and get all the I/O backends for free. Hence I thought it was worth an issue/feature request, which also subtly touches on the ability to create a directory when using / in the path.

> The main difficulty with the specific request for server-id/ocean-output-field.grib is that an individual action does not have the knowledge of the number of servers. This is by design. Pipelines can be constructed independently of any client-server interaction.

I understand this design choice for an action in the clients, but if the action is in the server, I assume each server knows how many other servers exist, since they can gather together to write a single file? (I am a rookie, so it is very possible that I don't understand the pipelines very well.)

> However, the user may have the knowledge of the overall run topology and can create a templated configuration like this:
>
> sinks :
>   - type : file
>     append : true
>     per-server : false
>     path : server-${id}/ocean-output-field.grib
>
> With the per-server option turned off, there is no automatic prefixing and it can now be done in setting the path.

So correct me if I misunderstand: the way to use such a template would be to read the action YAML in the client model or the server, wherever the relevant action is, and replace the variable on the fly? (Is there an example in the tests that does that?) Another question: by using / I meant to place the file in a directory. Is that already supported (I can't use a path like path: temp/outfile.grib in my version), or should it be part of initializing the configuration on the client/server end?
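Whether the file sink creates missing directories is not settled in this thread. Until it is, one workaround is to create the parent directories before the run starts. The sketch below assumes a POSIX system; `make_parent_dirs` is a hypothetical helper, not a multio function.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper, NOT part of multio: creates every directory
 * component of file_path (everything before the last '/'), similar to
 * `mkdir -p "$(dirname file_path)"`. Returns 0 on success, -1 on error. */
int make_parent_dirs(const char* file_path) {
    char buf[1024];
    assert(file_path != NULL);

    size_t len = strlen(file_path);
    if (len == 0) return 0;            /* nothing to create */
    if (len >= sizeof buf) return -1;  /* path too long for this sketch */
    memcpy(buf, file_path, len + 1);

    /* Start at buf + 1 so a leading '/' (absolute path) is skipped. */
    for (char* p = buf + 1; *p != '\0'; ++p) {
        if (*p != '/') continue;
        *p = '\0';                     /* temporarily cut the path here */
        if (mkdir(buf, 0755) != 0 && errno != EEXIST) return -1;
        *p = '/';                      /* restore and continue */
    }
    return 0;
}
```

For temp/outfile.grib this creates temp (if it does not exist yet) and leaves the file itself untouched, so the sink can open it afterwards.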

> It would probably be a sensible improvement to make the prefixing more configurable. So instead of turning it on and off, we could choose from {none, per-host, per-server} as prefix to the pathname. Would that be useful?

While this makes sense (and I understand the need for a prefix when there is one file per server, to avoid overwriting and races), what about using the prefix as a directory name instead of a file-name prefix? The example in the OP would then be path: hostname/pid/ocean-output-field.grib.
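To make the proposal concrete, here is a sketch of how such a path could be assembled from the same two ingredients as the current prefix (hostname and pid). It assumes a POSIX system, and `per_server_dir_path` is a hypothetical name; this is not how multio builds its prefix today.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, NOT part of multio: writes
 * "<hostname>/<pid>/<filename>" into out. Returns 0 on success, -1 if the
 * hostname cannot be read, filename is empty, or out is too small. */
int per_server_dir_path(const char* filename, char* out, size_t out_size) {
    char host[256];
    assert(filename != NULL && out != NULL);

    if (strlen(filename) == 0) return -1;
    if (gethostname(host, sizeof host) != 0) return -1;
    host[sizeof host - 1] = '\0';  /* may be unterminated if truncated */

    int written = snprintf(out, out_size, "%s/%ld/%s",
                           host, (long)getpid(), filename);
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

The directories would still have to exist before the file is opened, so this only covers the naming half of the idea.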

By the way, you mentioned per-host: is it planned to also allow one file per host in the future?

@tlmquintino
Member

tlmquintino commented Dec 15, 2022

@suvarchal if you would like to create a Zarr writer, then I would suggest making a separate sink, which would then have the factory name "zarr".
We should keep the "file" sink simple. I can imagine Zarr will have many more config options that make little sense for the "just a file dump" approach of the "file" sink.

Of course my comment is independent of the suggestion to customise the path for the "file" sink. That is still a valid and useful contribution.

@suvarchal
Author

> @suvarchal if you would like to create a Zarr writer, then I would suggest making a separate sink, which would then have the factory name "zarr". We should keep the "file" sink simple. I can imagine Zarr will have many more config options that make little sense for the "just a file dump" approach of the "file" sink.

I guess I will eventually get to the point of writing a Zarr sink, or a sink for our binary restarts :) Right now it is more like testing the waters. But for that I imagine I need more than config options: I need a write-field function for strings and integers, as it seems only double data is currently supported:

int multio_write_field(multio_handle_t* mio, multio_metadata_t* md, const double* data, int size);

So I could probably add another function for char, or maybe just void*, and see.
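A rough sketch of what the void* route might need: once the pointer is untyped, the element type has to travel with it so the receiver can work out how many bytes the payload holds. Everything below (`field_type_t`, `field_view_t`, `field_elem_size`, `field_num_bytes`) is hypothetical and not part of the multio API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tag; nothing like this is confirmed to exist in multio. */
typedef enum { FIELD_DOUBLE, FIELD_INT32, FIELD_CHAR } field_type_t;

/* Hypothetical untyped view of a field: pointer, element count, type tag. */
typedef struct {
    const void*  data;
    int          size;  /* number of elements, as in multio_write_field */
    field_type_t type;
} field_view_t;

/* Size in bytes of one element of the given type (0 for an unknown tag). */
size_t field_elem_size(field_type_t type) {
    switch (type) {
        case FIELD_DOUBLE: return sizeof(double);
        case FIELD_INT32:  return sizeof(int32_t);
        case FIELD_CHAR:   return sizeof(char);
    }
    return 0;
}

/* Total payload size in bytes, which a raw writer would need. */
size_t field_num_bytes(const field_view_t* field) {
    assert(field != NULL && field->size >= 0);
    return (size_t)field->size * field_elem_size(field->type);
}
```

The alternative of one function per type keeps type checking in the compiler, at the cost of a wider API surface.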

@suvarchal
Author

One other use case for customizing the file prefix is by field name or category, say path: $name-ocean-output.grib. But I can't imagine how one could write a YAML configuration that automatically writes a separate file for each field.
