geoflow – R engine to orchestrate and run geospatial (meta)data workflows

If you wish to sponsor geoflow, do not hesitate to contact me

Many thanks to the following organizations that have provided funding to strengthen the geoflow package:

Table of contents

1. Overview and vision
2. Development status
3. Credits
4. User guide
   4.1 How to install geoflow
   4.2 How to use geoflow in R
   4.3 Description of a geoflow configuration
      4.3.1 Overall structure
      4.3.2 Configuration components
   4.4 How to create a geoflow configuration file
      4.4.1 Create manually a configuration file
      4.4.2 Use the geoflow configuration Shiny User Interface
5. Issue reporting

1. Overview and vision

The principle of geoflow is to offer a simple framework in R to execute and orchestrate geospatial (meta)data management and publication tasks in an automated way.

2. Development status

The package is hosted on GitHub and is currently under consolidation.

A first version on CRAN is expected by the end of 2019.

3. Credits

The package is distributed under the MIT license.

If you use geoflow, we would be very grateful if you could cite it in your published work. By citing geoflow, beyond acknowledging the work, you help make it more visible and support its growth and sustainability. For citation, please use the DOI:

4. User guide

4.1 How to install geoflow

For now, the package can be installed from GitHub. First install the remotes package:

install.packages("remotes")

Once the remotes package is loaded, you can use install_github to install geoflow. By default, the package is installed from master, which is the current development version (likely to be unstable).

require("remotes")
install_github("eblondel/geoflow")

4.2 How to use geoflow in R

In R, using geoflow essentially consists of running the function executeWorkflow, which takes a single parameter: the name of a configuration file in JSON format:

executeWorkflow("config.json")

The workflow to be executed is entirely described in a configuration file. The main preparatory work of the data manager is therefore to prepare this configuration file, depending on the tasks to perform.
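Since everything depends on the configuration file, it can help to check it before launching the workflow. The following is a minimal sketch, assuming the jsonlite package is installed. The helper check_config_file is hypothetical and not part of geoflow; it only verifies that the file exists and parses as JSON, not that its content is a valid geoflow configuration.

```r
library(jsonlite)

# Hypothetical helper (not part of geoflow): stop early if the
# configuration file is missing or is not well-formed JSON.
check_config_file <- function(path) {
  if (!file.exists(path)) {
    stop("Configuration file not found: ", path)
  }
  txt <- paste(readLines(path, warn = FALSE), collapse = "\n")
  if (!jsonlite::validate(txt)) {
    stop("Configuration file is not valid JSON: ", path)
  }
  invisible(path)
}

# Run the workflow only when geoflow is installed and the file exists.
config_file <- "config.json"
if (requireNamespace("geoflow", quietly = TRUE) && file.exists(config_file)) {
  geoflow::executeWorkflow(check_config_file(config_file))
}
```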

Note: A Shiny app interface is planned, within geoflow, that will allow configuring the workflow in a user-friendly manner. The Shiny app will then take care of creating the appropriate JSON configuration file transparently.

4.3 Description of a geoflow configuration

Before creating a configuration file, let's first describe how a geoflow configuration is structured and what its key concepts are.

4.3.1 Overall structure

A geoflow configuration contains several parts, some of which are optional. They are defined in the table below.

Name	Definition	Optional/Required
id	A string identifier/name for the workflow	Required
mode	A string, either 'raw' or 'entity', that defines the workflow mode. * raw mode: simple mode that triggers basic R tasks (known in geoflow as actions) sequentially. This mode suits users who just want to chain R scripts. * entity mode: mode where all actions are performed on a set of entities. In geoflow, an `entity` includes both metadata and data elements. In most cases, an entity describes a dataset for which we want to perform actions such as metadata handling/publishing in a web metadata catalogue, spatial data upload to GeoServer, etc. In this mode, `geoflow` takes each entity in turn and executes the set of actions on it.	Required
metadata	Part where the entity set is defined, to be used for executing actions in mode entity.	Required with entity mode
software	Part where the software to interact with will be defined. It can be a software from where the user wants to get data, or a software where to publish data using geoflow e.g. a GeoNetwork metadata catalogue, a GeoServer, etc.	Optional
actions	Part where the actions to use are defined. These can be source R scripts in case of the raw mode, or entity-based actions in case of mode entity. An action put in the list can be enabled/disabled and parameterized with a set of options that is specific to each action.	Required
profile	Global workflow metadata. Information that is common to all entities in entity mode, and that can be exploited in some of the actions, e.g. to add a project logo to all dataset descriptions.	Optional
options	Global workflow options	Optional
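The Optional/Required column above can be turned into a simple check. The sketch below is illustrative only: the helper missing_required_parts is hypothetical (not part of geoflow) and works on a configuration held as a named R list.

```r
# Hypothetical helper (not part of geoflow): list the required parts
# that are absent from a configuration held as a named R list.
missing_required_parts <- function(config) {
  required <- c("id", "mode", "actions")
  # 'metadata' is only required when the workflow runs in entity mode
  if (identical(config[["mode"]], "entity")) {
    required <- c(required, "metadata")
  }
  setdiff(required, names(config))
}

raw_config <- list(id = "my-workflow", mode = "raw", actions = list())
entity_config <- list(id = "my-workflow", mode = "entity", actions = list())

missing_required_parts(raw_config)     # character(0)
missing_required_parts(entity_config)  # "metadata"
```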

4.3.2 Configuration components

id

This is just a simple string that identifies the user workflow. This string will be referenced in the logs of each workflow execution, and can be useful when the user handles multiple workflows with different configurations (e.g. one workflow per project).

mode

At its earliest stage, geoflow was designed to chain a set of processing steps handled by different scripts. This is known as raw mode, where the user just wants to use geoflow to chain some tasks with a set of R scripts.

In order to facilitate the management of datasets within Spatial Data Infrastructures (SDI), including their processing, publication and description with proper metadata, a new mode called entity was introduced. The concept of entity refers to the description of a dataset or subset of it for which the user wants to perform actions. In this mode, each action defined in geoflow will be executed for each entity of the list of entities that will be defined in the metadata configuration part.

metadata

The metadata part is the section of the workflow configuration that defines the sources of metadata content. Such content is split into two categories:

entities: source for the list of entities, where each entity holds the metadata describing a dataset
contacts: source for the directory of contacts, referenced with roles in the dataset metadata

Whether for entities or contacts, the configuration consists of declaring the source (a file or URL) and the handler, i.e. the source format.

The sources of entities and contacts can be read by different handlers (for the time being: gsheet for Google spreadsheets, and handlers for csv or excel files). The list of entity and contact handlers can be retrieved in R with list_entity_handlers() and list_contact_handlers(). For now geoflow provides basic format handlers, but the list is expected to grow, e.g. with an LDAP handler for contacts.

JSON snippet for entities:

    "entities": {
      "handler": "gsheet",
      "source": "https://docs.google.com/spreadsheets/d/1iG7i3CE0W9zVM3QxWfCjoYbqj1dQvKsMnER6kqwDiqM/edit?usp=sharing"
    }

JSON snippet for contacts:

    "contacts" : {
      "handler": "gsheet",
      "source": "https://docs.google.com/spreadsheets/d/144NmGsikdIRE578IN0McK9uZEUHZdBuZcGy1pJS6nAg/edit?usp=sharing"
    }

JSON snippet for the metadata part (including entities and contacts)

  "metadata": {
    "entities": {
      "handler": "gsheet",
      "source": "https://docs.google.com/spreadsheets/d/1iG7i3CE0W9zVM3QxWfCjoYbqj1dQvKsMnER6kqwDiqM/edit?usp=sharing"
    },
    "contacts" : {
      "handler": "gsheet",
      "source": "https://docs.google.com/spreadsheets/d/144NmGsikdIRE578IN0McK9uZEUHZdBuZcGy1pJS6nAg/edit?usp=sharing"
    }
  }

It is possible to use a custom handler function provided by the user. In that case, the handler should be the name of an R function defined in an R script, and the R script must be declared in an extra property named "script". In this configuration, the source property becomes optional (it could be hardcoded in the user's handler).

JSON snippet for custom contact LDAP handler:

    "contacts" : {
      "handler": "my_ldap_function_to_load_contacts",
      "source": "my_ldap_endpoint",
      "script": "my_ldap_script.R"
    }
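The rules above (a built-in handler needs a source; a custom handler needs a script and may omit the source) can be sketched as a small check. This is only an illustration: check_metadata_source is a hypothetical helper, not part of geoflow, and the built-in handler names "csv" and "excel" are assumed from the text; list_entity_handlers() and list_contact_handlers() give the actual list.

```r
# Built-in handler names are an assumption based on the text above.
builtin_handlers <- c("gsheet", "csv", "excel")

# Hypothetical helper (not part of geoflow): return a vector of problems
# found in an 'entities' or 'contacts' declaration given as a named list.
check_metadata_source <- function(x) {
  problems <- character(0)
  if (is.null(x[["handler"]])) {
    return("missing 'handler'")
  }
  if (x[["handler"]] %in% builtin_handlers) {
    # built-in handlers need a file or URL to read from
    if (is.null(x[["source"]])) problems <- c(problems, "missing 'source'")
  } else {
    # a custom handler is an R function name that must come with a script
    if (is.null(x[["script"]])) problems <- c(problems, "missing 'script'")
  }
  problems
}

check_metadata_source(list(handler = "gsheet"))
check_metadata_source(list(
  handler = "my_ldap_function_to_load_contacts",
  script = "my_ldap_script.R"
))
```

The first call reports the missing source, while the second returns an empty vector because a custom handler may omit it.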

software

The software part of the configuration consists in listing the pieces of software needed for the workflow. Since it is a list of software, the base JSON definition will be an array (using square brackets):

"software": [
     <here will be listed the pieces of software>
]

List of software managed by geoflow

By default, geoflow manages a specific set of software to interact with. These are essentially R interfaces to web applications / software APIs. The list of software managed by geoflow can be retrieved in R with list_software():

software_type	definition
csw	OGC Catalogue Service for the Web (CSW) client powered by 'ows4R' package
wfs	OGC Web Feature Service (WFS) client powered by 'ows4R' package
geonetwork	GeoNetwork API Client, powered by 'geonapi' package
geoserver	GeoServer REST API Client, powered by 'geosapi' package
zenodo	Zenodo client powered by 'zen4R' package

How to configure a software

To configure a piece of software, the latter should be provided with various elements:

an id: a user-defined string that identifies the software in question.
a type: a string, either "input" (software to use as source, to fetch data) or "output" (software to use as target, to publish/manage data).
a software_type: a string identifying the software type as managed by geoflow (see the table above). For example, to declare a GeoServer, the software_type "geoserver" is used.
a set of parameters: these depend on the type of software configured. For the software managed by geoflow, you can ask geoflow which parameters are needed for a given software_type by running in R list_software_parameters(<software_type>) (e.g. for GeoServer, run list_software_parameters("geoserver")).
a set of properties: these depend on the type of software configured. They are extra configuration elements to use with the software considered. For the software managed by geoflow, you can ask geoflow which properties can be used for a given software_type by running in R list_software_properties(<software_type>) (e.g. for GeoServer, run list_software_properties("geoserver")).

Let's look at a GeoServer software declaration:

list_software_parameters("geoserver")

These are the parameters that need to be declared in order to interact with the GeoServer software:

name	definition
url	GeoServer application URL
user	Username for GeoServer authentication
pwd	Password for GeoServer authentication
logger	Level for 'geosapi' logger messages (NULL, 'INFO' or 'DEBUG')

list_software_properties("geoserver")

These are the properties that need to be declared in order to configure the publication with GeoServer:

name	definition
workspace	GeoServer workspace name
datastore	GeoServer datastore name

JSON snippet for declaring a GeoServer for publishing purposes:

  {
    "id": "my-geoserver",
    "type": "output",
    "software_type": "geoserver",
    "parameters": {
      "url": "http://localhost:800/geoserver",
      "user": "admin",
      "pwd": "geoserver",
      "logger": "DEBUG"
    },
    "properties": {
      "workspace": "my_geoserver_workspace",
      "datastore": "my_geoserver_datastore"
    }
  }
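A declaration like the one above can be sanity-checked in R before it is added to a configuration. The sketch below assumes the jsonlite package is installed; check_geoserver_software is a hypothetical helper (not part of geoflow) that only compares the declaration against the elements and the GeoServer parameter and property names listed in this section.

```r
library(jsonlite)

# Parameter and property names taken from the GeoServer tables above
geoserver_parameters <- c("url", "user", "pwd", "logger")
geoserver_properties <- c("workspace", "datastore")

# Hypothetical helper (not part of geoflow): report problems in a
# GeoServer software declaration parsed into a named R list.
check_geoserver_software <- function(sw) {
  problems <- character(0)
  for (field in c("id", "type", "software_type")) {
    if (is.null(sw[[field]])) problems <- c(problems, paste("missing", field))
  }
  if (!is.null(sw[["type"]]) && !sw[["type"]] %in% c("input", "output")) {
    problems <- c(problems, "type must be 'input' or 'output'")
  }
  unknown <- c(
    setdiff(names(sw[["parameters"]]), geoserver_parameters),
    setdiff(names(sw[["properties"]]), geoserver_properties)
  )
  if (length(unknown) > 0) {
    problems <- c(problems, paste("unknown name:", unknown))
  }
  problems
}

software_json <- '{
  "id": "my-geoserver",
  "type": "output",
  "software_type": "geoserver",
  "parameters": { "url": "http://localhost:800/geoserver", "user": "admin",
                  "pwd": "geoserver", "logger": "DEBUG" },
  "properties": { "workspace": "my_geoserver_workspace",
                  "datastore": "my_geoserver_datastore" }
}'

# An empty result means no problem was found
check_geoserver_software(fromJSON(software_json))
```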

How to use a user's custom software

TODO

actions

TODO

profile

The profile is the part where global workflow metadata can be defined. For the time being, this is essentially a placeholder. In the future this section may be enriched with metadata elements that are shared globally by all managed entities and the actions applied to them.

It is already possible to define one or more logo URLs to be shared in actions such as geometa_create_iso_19115.

JSON snippet of profile:

  "profile": {
	"project": "Test geoflow project",
	"organization": "My organization",
	"logos": [
		"https://via.placeholder.com/300x150.png/09f/fff?text=geometa",
		"https://via.placeholder.com/300x150.png/09f/fff?text=ows4R"
	]
  }

options

The options part is optional. The table below defines the possible geoflow global options:

Name	Definition	Default value
`line_separator`	Defines the sequence of characters used to split metadata components within a single tabular cell of an entity (e.g. the Description field)	;\n (likely to be modified for the 1st geoflow release)
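To illustrate what such a separator does, here is a minimal sketch in base R. It only mimics the splitting described in the table and is not geoflow's own code; the cell content is made up for the example.

```r
# Default separator as given in the options table
line_separator <- ";\n"

# A hypothetical cell holding two components
cell <- "First component;\nSecond component"

# fixed = TRUE treats the separator literally rather than as a regex
components <- strsplit(cell, line_separator, fixed = TRUE)[[1]]
components
```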

4.4 How to create a geoflow configuration file

4.4.1 Create manually a configuration file

If we take the different blocks that define the structure of a geoflow configuration (as introduced in section 4.3), the skeleton of the JSON configuration file is as follows:

{
  "id": "my-workflow",
  "mode": "entity",
  "metadata": { <metadata sources defined here> },
  "software": [ <pieces of software defined here> ],
  "actions": [ <actions defined here> ],
  "profile": { <global profile (metadata) defined here> },
  "options": { <global options defined here> }
}
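Instead of typing the JSON by hand, the same skeleton can be assembled as an R list and written out. This is a sketch, assuming the jsonlite package is installed; the csv file names are hypothetical, the "csv" handler name is assumed from section 4.3.2, and the software and actions lists are left empty since their content is not described yet.

```r
library(jsonlite)

# Each top-level part is a named list; 'software' and 'actions' are
# unnamed lists so that they serialise as JSON arrays.
config <- list(
  id = "my-workflow",
  mode = "entity",
  metadata = list(
    entities = list(handler = "csv", source = "entities.csv"),  # hypothetical file
    contacts = list(handler = "csv", source = "contacts.csv")   # hypothetical file
  ),
  software = list(),
  actions = list(),
  profile = list(project = "Test geoflow project"),
  options = list(line_separator = ";\n")
)

config_file <- file.path(tempdir(), "config.json")
write_json(config, config_file, auto_unbox = TRUE, pretty = TRUE)

# Read the file back to confirm that it is well-formed JSON
str(fromJSON(config_file))
```

Using auto_unbox = TRUE keeps single values such as the id as plain JSON strings rather than one-element arrays.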

TODO

4.4.2 Use the geoflow configuration Shiny User Interface

NOT YET AVAILABLE

5. Issue reporting

Issues can be reported at https://github.com/eblondel/geoflow/issues
