Data Storage

Layman stores several types of data in several stores.

Types of data:

  • Users
  • Layers
  • Maps
  • Tasks
  • Data version

Data stores:

  • Redis
  • Filesystem
  • PostgreSQL
  • GeoServer

Types of Data

Users

Information about users includes their names, contacts, and authentication credentials.

When a user reserves a username, their name, contacts, and other relevant metadata are obtained from the authorization provider and saved to Redis, PostgreSQL, and GeoServer. The user's personal workspace is created too.

Layers

Information about layers includes vector or raster data and visualization.

When user publishes new layer

Subsequently, asynchronous tasks ensure the following steps:

  • data file chunks and completed data files are saved to filesystem (if sent asynchronously)
  • vector data files are imported to PostgreSQL
    • files with invalid byte sequence are first converted to GeoJSON, then cleaned with iconv, and finally imported to database.
    • PostgreSQL table with vector data is registered to GeoServer
  • raster files are normalized and compressed to BigTIFF GeoTIFF with overviews (pyramids)
    • normalized GeoTIFF is registered to GeoServer
  • SLD file is saved to GeoServer and registered to WMS layer
  • QGS file is created on filesystem and through QGIS server registered to GeoServer
  • access rights are synchronized to GeoServer
  • thumbnail file is saved to filesystem
  • metadata record is saved to PostgreSQL using Micka's CSW
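The cleaning of invalid byte sequences mentioned in the list above is done with iconv in Layman. As a rough illustration only, the following Python sketch drops bytes that cannot be decoded. It assumes the target encoding is UTF-8, which this document does not state, and the function name `drop_invalid_utf8` is hypothetical.

```python
def drop_invalid_utf8(raw: bytes) -> bytes:
    """Return `raw` with byte sequences that are not valid UTF-8 removed.

    Assumption: the data is meant to be UTF-8. This only approximates
    the effect of cleaning a file with iconv; it is not Layman's code.
    """
    # errors="ignore" silently skips undecodable bytes instead of raising.
    return raw.decode("utf-8", errors="ignore").encode("utf-8")


if __name__ == "__main__":
    dirty = "Brno".encode("utf-8") + b"\xff" + " okres".encode("utf-8")
    print(drop_invalid_utf8(dirty))
```

Valid multi-byte characters are preserved; only the undecodable bytes are removed.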

When a user patches an existing layer, data is saved in the same way.

Maps

Information about maps includes JSON definition.

When user publishes new map

Subsequently, when asynchronous tasks run,

When a user patches an existing map, data is saved in the same way.

Tasks

Information about asynchronous tasks consists of a few parameters needed by the Celery task runner. When a layer or map is published or patched, it includes e.g. the task name, owner name, layer/map name, and additional parameters derived from HTTP POST/PATCH parameters.

Task information is saved to Redis only.

Data version

Information about the data version, including the migration ID, is stored in PostgreSQL.

Stores

Redis

Data is saved in the database specified by LAYMAN_REDIS_URL. Keys are prefixed with

  • Layman python module name that saved the data, followed by :, e.g. layman.layer.geoserver: or layman:
  • other strings, e.g. celery, _kombu, or unacked in case of Celery task data.

Redis is used as a temporary data store. When Layman stops, data persists in Redis. However, on each startup Layman flushes the Redis database and imports user-related and publication-related data from the filesystem. This means that any task-related data is lost on startup. This behavior can be controlled by LAYMAN_SKIP_REDIS_LOADING.
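The key prefix convention described above can be illustrated with a small sketch. The helper names `layman_key` and `key_origin`, and the key suffixes used in the example, are hypothetical; only the prefixes themselves come from this document.

```python
# Prefixes named in this document for Celery task data.
CELERY_KEY_PREFIXES = ("celery", "_kombu", "unacked")


def layman_key(module_name: str, suffix: str) -> str:
    """Build a key as '<python module name>:<suffix>' (suffix is illustrative)."""
    return f"{module_name}:{suffix}"


def key_origin(key: str) -> str:
    """Guess whether a Redis key was written by Layman or by Celery."""
    if ":" in key:
        module = key.split(":", 1)[0]
        # Layman keys start with a module name such as 'layman'
        # or 'layman.layer.geoserver', followed by ':'.
        if module == "layman" or module.startswith("layman."):
            return "layman"
    if key.startswith(CELERY_KEY_PREFIXES):
        return "celery"
    return "unknown"


if __name__ == "__main__":
    print(layman_key("layman.layer.geoserver", "example"))
    print(key_origin("layman:example"))
    print(key_origin("_kombu.binding.celery"))
```

A classifier like this could help when inspecting which keys would be lost by the flush on startup, since Celery task data is not reloaded.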

Filesystem

Data is saved to LAYMAN_DATA_DIR directory, LAYMAN_QGIS_DATA_DIR directory, and GeoServer data directory.

Workspace directory is created in LAYMAN_DATA_DIR directory for every created workspace. Name of the workspace directory is the same as workspace name.

Publication directory is created inside workspace directory for each publication (e.g. map or layer) the user published. Name of the publication directory is the same as name of the publication (e.g. layername or mapname). Publication-related information is saved in publication directory.

QGIS workspace directory is created in LAYMAN_QGIS_DATA_DIR directory for every created workspace. Name of the workspace directory is the same as workspace name.

QGIS layer directory is created inside the QGIS workspace directory for each layer with QGIS style that the user published. Name of the QGIS layer directory is the same as the name of the layer. The QGS project with the style definition is stored in this directory for WMS purposes.

Normalized raster directory named normalized_raster_data is created in GeoServer data directory.

Normalized raster workspace directory is created in Normalized raster directory for every workspace with at least one raster layer. Name of the workspace directory is the same as workspace name.

Normalized raster layer directory is created inside the Normalized raster workspace directory for every raster layer. Name of the Normalized raster layer directory is the same as the name of the layer. The normalized raster is stored in this directory for WMS purposes. For a timeseries layer, additional files holding e.g. time_regex are created too.

Filesystem is used as a persistent data store, so data survives a Layman restart.
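The three directory trees described in this section can be summarized in a short sketch. It follows only the naming rules given above; the function names and the example directory values are hypothetical, and the real layout may contain intermediate directories not mentioned here.

```python
from pathlib import PurePosixPath

# Directory name given in this document.
NORMALIZED_RASTER_DIRNAME = "normalized_raster_data"


def publication_dir(layman_data_dir: str, workspace: str, publication: str) -> PurePosixPath:
    """<LAYMAN_DATA_DIR>/<workspace>/<publication name>"""
    return PurePosixPath(layman_data_dir) / workspace / publication


def qgis_layer_dir(qgis_data_dir: str, workspace: str, layername: str) -> PurePosixPath:
    """<LAYMAN_QGIS_DATA_DIR>/<workspace>/<layer name>"""
    return PurePosixPath(qgis_data_dir) / workspace / layername


def normalized_raster_layer_dir(geoserver_data_dir: str, workspace: str, layername: str) -> PurePosixPath:
    """<GeoServer data dir>/normalized_raster_data/<workspace>/<layer name>"""
    return (PurePosixPath(geoserver_data_dir) / NORMALIZED_RASTER_DIRNAME
            / workspace / layername)


if __name__ == "__main__":
    # Example values only; real paths depend on the deployment.
    print(publication_dir("/layman_data", "alice", "roads"))
    print(qgis_layer_dir("/qgis_data", "alice", "roads"))
    print(normalized_raster_layer_dir("/geoserver_data", "alice", "ortho"))
```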

PostgreSQL

Layman directly uses one database, specified by LAYMAN_PG_DBNAME, to store data. This database contains two kinds of schemas:

  • LAYMAN_PRIME_SCHEMA that holds information about
    • users, workspaces, and publications including access rights
    • data version including migration ID
  • Schemas holding vector layer data.
    • One workspace schema is created for every created workspace. Name of workspace schema is always the same as workspace name.
    • One table is created in workspace schema for each layer published with input vector files. Name of the table is in form layer_<UUID> with - replaced with _, e.g. layer_96b918c6_d88c_42d8_b999_f3992b826958. The table contains data from vector data files.
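The table naming rule above can be expressed as a minimal sketch. The function name `layer_table_name` is hypothetical; the rule itself (`layer_<UUID>` with `-` replaced by `_`) is taken from this document.

```python
import uuid


def layer_table_name(layer_uuid: str) -> str:
    """Return 'layer_<UUID>' with '-' replaced by '_'."""
    # uuid.UUID validates the input and normalizes it to the
    # lowercase hyphenated form before the replacement.
    normalized = str(uuid.UUID(layer_uuid))
    return "layer_" + normalized.replace("-", "_")


if __name__ == "__main__":
    print(layer_table_name("96b918c6-d88c-42d8-b999-f3992b826958"))
    # layer_96b918c6_d88c_42d8_b999_f3992b826958
```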

A second database is used by Micka to store metadata records. This database, including its structure, is completely managed by Micka. By default, it is named hsrs_micka6.

Other external databases can be used to publish vector data from PostGIS tables (see external_table_uri in POST Workspace Layers). Layman is able to change data in the table using WFS-T (including adding new columns) if the provided DB user has sufficient privileges. All other management is left entirely to the administrator of that database.

Data changes made directly in vector data DB tables (both internal and external) are automatically propagated to WMS and WFS. However, the layer thumbnail and bounding box in Layman are not automatically updated after such changes.

PostgreSQL is used as a persistent data store, so data survives a Layman restart.

GeoServer

A user and a role are created for every user who reserved a username. The user name on GeoServer is the same as the username on Layman. The role name is composed as USER_<upper-cased username>.

Two workspaces are created, each with one PostgreSQL datastore, for every workspace (both personal and public). The first workspace is meant for WFS and has the same name as the workspace on Layman. The second workspace is meant for WMS and is suffixed with _wms. Name of the datastore is postgresql for both workspaces. All workspace-related information (including the PostgreSQL datastore) is saved inside the workspace.
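As a small illustration of the workspace naming above, the following sketch derives both GeoServer workspace names from a Layman workspace name. The function name `geoserver_workspaces` and the dictionary keys are hypothetical.

```python
# Datastore name given in this document; it is the same in both workspaces.
POSTGRESQL_DATASTORE_NAME = "postgresql"


def geoserver_workspaces(layman_workspace: str) -> dict:
    """Map service type to the GeoServer workspace name."""
    return {
        "wfs": layman_workspace,           # same name as on Layman
        "wms": f"{layman_workspace}_wms",  # suffixed with _wms
    }


if __name__ == "__main__":
    print(geoserver_workspaces("alice"))
```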

For each vector layer from external PostGIS table, PostgreSQL datastore is created. Name of the data store is external_db_<layername>.

For each vector layer with SLD style, Feature Type and Layer are registered in both workspaces (WMS and WFS), and Style is created in WMS workspace. Names of these three models are the same as layername. Feature type points to appropriate PostgreSQL table through PostgreSQL datastore. Style contains visualization file.

For each vector layer with QML style, Feature Type is registered in WFS workspace, and Cascading WMS Store and Cascading WMS Layer are created in WMS workspace. Names of Feature Type and Cascading WMS Layer are the same as layername, name of Cascading WMS Store is prefixed with qgis_. Feature type points to appropriate PostgreSQL table through PostgreSQL datastore. Cascading WMS Store and Layer cascade to the layer's WMS instance at QGIS server (pointing to QGS file of the layer).

For each raster layer, Coverage Store, Coverage, and Style are created in WMS workspace. If the layer is a timeseries, Coverage Store is ImageMosaic, otherwise it is GeoTIFF. Names of Coverage and Style are the same as layername, name of Coverage Store is prefixed with geotiff_ or image_mosaic_ depending on its type. Coverage Store and Coverage point to appropriate normalized raster GeoTIFF file(s). Style contains visualization file.
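The store naming rules from the preceding paragraphs can be collected in one sketch. The function names are hypothetical. This document only says that the Cascading WMS Store and Coverage Store names are "prefixed"; the sketch assumes the prefix is followed by the layername, as is stated explicitly for external_db_<layername>.

```python
def external_datastore_name(layername: str) -> str:
    """Datastore for a vector layer from an external PostGIS table."""
    return f"external_db_{layername}"


def cascading_wms_store_name(layername: str) -> str:
    """Cascading WMS Store for a vector layer with QML style.

    Assumption: the qgis_ prefix is followed by the layername.
    """
    return f"qgis_{layername}"


def coverage_store_name(layername: str, is_timeseries: bool) -> str:
    """Coverage Store for a raster layer.

    ImageMosaic is used for timeseries layers, GeoTIFF otherwise.
    Assumption: the prefix is followed by the layername.
    """
    prefix = "image_mosaic_" if is_timeseries else "geotiff_"
    return prefix + layername


if __name__ == "__main__":
    print(external_datastore_name("roads"))
    print(cascading_wms_store_name("roads"))
    print(coverage_store_name("ortho", is_timeseries=False))
    print(coverage_store_name("ortho", is_timeseries=True))
```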

Two access rules are created for each layer in each GeoServer workspace (WFS and WMS): one for the read access right and one for the write access right. Every username from Layman's access rights is represented by the user's role name (i.e. USER_<upper-cased username>). Role EVERYONE is represented as ROLE_ANONYMOUS on GeoServer.

GeoServer is used as a persistent data store, so data survives a Layman restart.
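The role mapping and the number of access rules per layer can be sketched as follows. The function names and the tuple representation of a rule are hypothetical and do not reflect how GeoServer stores its rules; only the naming and counting rules come from this document.

```python
def role_name(username: str) -> str:
    """GeoServer role name for a Layman username."""
    return f"USER_{username.upper()}"


def geoserver_roles(layman_names: list) -> list:
    """Translate names from a Layman access right to GeoServer roles."""
    return [
        "ROLE_ANONYMOUS" if name == "EVERYONE" else role_name(name)
        for name in layman_names
    ]


def access_rules(workspace: str, layername: str) -> list:
    """List (workspace, layer, access) for every rule created for one layer.

    One read and one write rule in each of the WFS and WMS workspaces.
    """
    return [
        (ws, layername, access)
        for ws in (workspace, f"{workspace}_wms")
        for access in ("read", "write")
    ]


if __name__ == "__main__":
    print(geoserver_roles(["alice", "EVERYONE"]))
    print(access_rules("alice", "roads"))
```

For one layer this yields four rules in total, two in each GeoServer workspace.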