Workflow configuration

Emmanuel Blondel edited this page Aug 23, 2019 · 19 revisions

geoflow relies on a workflow definition held in a single JSON file, which the data manager customizes according to their needs.

The configuration file contains several parts, some of them optional, which are described below.

| Name | Definition | Optional/Required |
|------|------------|-------------------|
| id | A string identifier/name for the workflow. | Required |
| mode | A string, either 'entity' or 'raw', that defines the workflow mode. The raw mode is a simple mode that triggers simple scripts (known in geoflow as actions) sequentially. The entity mode is a mode where all actions are performed on a set of entities, usually describing datasets for which we want to perform actions such as publishing, metadata handling, etc. | Required |
| metadata | Part that defines the reference entities used for executing actions in entity mode. | Required if mode is entity |
| software | Part where the software to interact with is defined. It can be software from which the user wants to get data, or software to which data is published using geoflow, e.g. a GeoNetwork metadata catalogue, a GeoServer, etc. | Optional |
| actions | Part where the actions to use are defined. These can be source R scripts in raw mode, or entity-based actions in entity mode. An action put in the list can be enabled/disabled and parameterized with a set of options specific to each action. | Required |
| profile | Global metadata for the workflow: information that is common to all entities in entity mode and that can be exploited by some of the actions, e.g. to add a project logo to all dataset descriptions. | Optional |
| options | Global workflow options. | Optional |
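
As a rough orientation, the required parts listed above could sit together in one file as sketched below. This is only an illustration under stated assumptions: the id value and the CSV file names are hypothetical, and the empty actions list is a placeholder (the table only states that actions form a list). The optional software, profile and options parts are left out, and the metadata part is detailed further down this page.

```json
{
  "id": "my-workflow",
  "mode": "entity",
  "metadata": {
    "entities": {
      "handler": "csv",
      "source": "my_entities.csv"
    },
    "contacts": {
      "handler": "csv",
      "source": "my_contacts.csv"
    }
  },
  "actions": []
}
```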

id

This is a simple string that identifies the user workflow. It is referenced in the logs of each workflow execution, and is useful when the user handles multiple workflows with different configurations (e.g. one workflow per project).

mode

At its earliest stage, geoflow was designed to chain a set of processing steps handled by different scripts. This is known as raw mode, where the user just wants to use geoflow to chain tasks implemented as a set of R scripts.

To facilitate the management of datasets within Spatial Data Infrastructures (SDI), including their processing, publication and description with proper metadata, a new mode called entity was introduced. An entity is the description of a dataset, or of a subset of it, for which the user wants to perform actions. In this mode, each action defined in geoflow is executed for each entity in the list of entities defined in the metadata configuration part.

metadata

The metadata part is the section of the workflow configuration that defines the sources of metadata content. Such content is split into two categories:

  • entities: source for the list of entities, where each entity holds the metadata describing a dataset (or a subset of it)
  • contacts: source for the directory of contacts, referenced with roles in the dataset metadata

Whether for entities or contacts, the configuration consists of declaring the source (a file or URL) and the handler, i.e. the source format.

The sources of entities and contacts can be read by different handlers (for the time being: gsheet for Google spreadsheets, csv, or excel files). The list of entity and contact handlers can be retrieved in R with list_entity_handlers() and list_contact_handlers(). For the time being, geoflow provides basic format handlers, but the list of handlers is expected to be extended, e.g. with an LDAP handler for contacts.

JSON snippet for entities:

    "entities": {
      "handler": "gsheet",
      "source": "https://docs.google.com/spreadsheets/d/1iG7i3CE0W9zVM3QxWfCjoYbqj1dQvKsMnER6kqwDiqM/edit?usp=sharing"
    }

JSON snippet for contacts:

    "contacts" : {
      "handler": "gsheet",
      "source": "https://docs.google.com/spreadsheets/d/144NmGsikdIRE578IN0McK9uZEUHZdBuZcGy1pJS6nAg/edit?usp=sharing"
    }

JSON snippet for the metadata part (including entities and contacts):

    "metadata": {
      "entities": {
        "handler": "gsheet",
        "source": "https://docs.google.com/spreadsheets/d/1iG7i3CE0W9zVM3QxWfCjoYbqj1dQvKsMnER6kqwDiqM/edit?usp=sharing"
      },
      "contacts" : {
        "handler": "gsheet",
        "source": "https://docs.google.com/spreadsheets/d/144NmGsikdIRE578IN0McK9uZEUHZdBuZcGy1pJS6nAg/edit?usp=sharing"
      }
    }

It is possible to use a custom handler function provided by the user. In that case, the handler should be the name of an R function defined in an R script, and that script must be declared in an extra property named "script". In this configuration, the source property becomes optional (it could be hardcoded in the user's handler).

JSON snippet for custom contact LDAP handler:

    "contacts" : {
      "handler": "my_ldap_function_to_load_contacts",
      "source": "my_ldap_endpoint",
      "script": "my_ldap_script.R"
    }
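
Since the source property is optional with a custom handler, the same declaration could presumably be reduced to the handler and script properties, with the endpoint hardcoded inside the R function. A sketch reusing the placeholder names from the snippet above (both names remain hypothetical):

```json
"contacts": {
  "handler": "my_ldap_function_to_load_contacts",
  "script": "my_ldap_script.R"
}
```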

software

TODO

actions

TODO

profile

TODO

options

The options part is, by definition, optional. The table below lists the possible geoflow global options:

| Name | Definition | Default value |
|------|------------|---------------|
| line_separator | Sequence of characters used to split metadata components within a single tabular cell of an entity (e.g. the Description field) | ";n" (likely to be modified for the 1st geoflow release) |
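
A sketch of how this option might be set in the configuration file, assuming that global options are declared as key/value pairs under the options part. The "|" separator is only an illustrative value chosen for this example, not a recommended or default one:

```json
"options": {
  "line_separator": "|"
}
```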