
Geocoordinate Validation Service (GVS)

Author: Brad Boyle ([email protected])

Overview

The Geocoordinate Validation Service (GVS) performs quality checks on decimal geocoordinates (georeferenced points represented as pairs of decimal latitude and longitude values). In addition to detecting common georeferencing errors, the GVS calculates the probability that a given point represents a political division centroid, as opposed to a directly measured point on the Earth's surface. The GVS also returns the country-, state- and county-level political divisions in which the point is located. Political divisions are determined with reference to the GADM database of world administrative divisions (Global Administrative Divisions; https://gadm.org). For an online interface to the GVS, see https://gvs.biendata.org/.

Information returned by the GVS includes:

  • Estimate of inherent precision implied by the number of decimal places in the original coordinates
  • Brief descriptions of the types of errors detected (e.g., "Coordinate values out of bounds")
  • Flagging of points in the ocean
  • Names and GADM identifiers of the admin_0, admin_1 and admin_2 political divisions (e.g., country, state, county) in which a point is located
  • Absolute and relative distance to the centroid of each political division (see full list of output fields below)
  • Probability that the point is a centroid, and, if applicable, the type of centroid and political division (country, state or county) of the likeliest centroid.
  • Flagging of points with a high probability of being a centroid, above a certain threshold (threshold can be adjusted by user)
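
As a rough illustration of the first item, the sketch below counts decimal places in a verbatim coordinate pair and reports the smaller of the two counts, which is how the data dictionary below defines coordinate_decimal_places. The function names are hypothetical and this is not the GVS implementation, only a minimal sketch of the idea.

```shell
# Hypothetical helpers, not part of the GVS code base.
# Note: plain sh has no local variables, so these helpers set a few globals.

# Count the digits after the decimal point in one verbatim coordinate value
count_decimals() {
  case "$1" in
    *.*) frac=${1#*.}; echo "${#frac}" ;;   # text after the first "."
    *)   echo 0 ;;                          # no decimal point at all
  esac
}

# Smallest number of decimal places across latitude ($1) and longitude ($2)
coordinate_decimal_places() {
  lat_places=$(count_decimals "$1")
  lon_places=$(count_decimals "$2")
  if [ "$lat_places" -lt "$lon_places" ]; then
    echo "$lat_places"
  else
    echo "$lon_places"
  fi
}

coordinate_decimal_places 36.580435 -96.53331
```

For the pair above, the latitude has six decimal places and the longitude five, so the sketch reports five.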

This service may be used in combination with the BIEN Geographic Name Resolution Service (GNRS; https://github.com/ojalaquellueva/gnrs.git) to perform "political geovalidation" of georeferenced biodiversity observations. Political geovalidation checks if all detected political divisions (i.e., the country, state and county polygons in which the coordinates are located) match the declared political divisions (country, state and county names) of the original observation record. Operationally, this validation can be performed by checking that the GADM administrative division identifiers returned by the GVS match the GADM identifiers returned by the GNRS.
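
The identifier comparison described above can be sketched in a few lines of shell. This is an illustration only: the two input files, their header-less id,gid_0,gid_1,gid_2 layout, and the function name compare_gids are all assumptions made for the example, not the actual output formats of the GVS or the GNRS.

```shell
# Sketch only. Both files are ASSUMED to be header-less CSV extracts with the
# layout id,gid_0,gid_1,gid_2 (hypothetical; not the actual GVS or GNRS output).
compare_gids() {
  # $1 = identifiers detected by the GVS, $2 = identifiers resolved by the GNRS
  awk -F',' '
    # First file: remember the detected identifiers for each record id
    NR == FNR { detected[$1] = $2 FS $3 FS $4; next }
    # Second file: compare the declared identifiers with the detected ones
    {
      declared = $2 FS $3 FS $4
      if (!($1 in detected))             status = "missing"
      else if (detected[$1] == declared) status = "match"
      else                               status = "mismatch"
      print $1 FS status
    }
  ' "$1" "$2"
}

# Usage with made-up identifiers (not real GADM codes)
workdir=$(mktemp -d)
printf '1,AAA,AAA.1_1,AAA.1.2_1\n2,BBB,BBB.3_1,BBB.3.4_1\n' > "$workdir/gvs_gids.csv"
printf '1,AAA,AAA.1_1,AAA.1.2_1\n2,BBB,BBB.5_1,BBB.5.6_1\n' > "$workdir/gnrs_gids.csv"
compare_gids "$workdir/gvs_gids.csv" "$workdir/gnrs_gids.csv"
rm -r "$workdir"
```

Record 1 is reported as a match and record 2 as a mismatch. A real workflow would also need to decide how to treat records that declare only a country or only a country and state; the sketch compares all three levels as given.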

GVS, CDS...what's the difference?

The GVS was previously developed under the name CDS (Centroid Detection Service) as an application for the detection of political division centroids. It has been renamed to reflect the wider range of features currently available.

Installation and configuration

Software

Ubuntu 16.04 or higher
PostgreSQL/psql 10 or higher (PostGIS extension installed by this script)

Dependencies

Requires access to the GNRS (https://github.com/ojalaquellueva/gnrs.git), either as an API or as a local batch service. The version in this repository uses the BIEN GNRS API (http://vegbiendev.nceas.ucsb.edu:8875/gnrs_ws.php).

Permissions

This script must be run by a user authorized to connect to PostgreSQL (as specified in the pg_hba.conf file). The admin-level and read-only PostgreSQL users for the gadm database (specified in params.sh) must already exist and must also be authorized to connect (as specified in the pg_hba.conf file).

Recommended setup

# Create application base directory
mkdir -p gvs
cd gvs

# Create application code directory
mkdir src

# Install application code from repository
# (the trailing "." clones into src itself, so data/ and config/ sit directly under src)
cd src
git clone https://github.com/ojalaquellueva/gvs .

# Move data and sensitive parameters directories outside of code directory
mv data ../
mv config ../

Installation notes

  • After moving the config directory:
    • Rename all config files by removing "example" from file names
    • Set all passwords, paths and other parameter values in the config files
  • The temporary application data directory /tmp/gvs is now created on the fly by the application. You no longer need to create it manually.

Input data

Raw data

Raw data for the GVS consist of one or more pairs of coordinates in decimal format, separated by a single comma, with latitude first. E.g.,

latitude,longitude
36.580435,-96.53331
39.8081822436996,-91.6228915663878
46.0,25.0
52.92755,4.7864
-23.62,-65.43
-29.178651024973867,149.269218
-29.231478025060987,152.13519
51.81171,-3.8879

Format

Data are submitted to the GVS via the shell command line as a CSV (comma-delimited) text file, formatted as shown above under Raw data. Data submitted via the API or the GVS R package must be converted to JSON and attached to the body of a POST request (see the API documentation in this repository, and the separate RCDS (=GVS) repository https://github.com/EnquistLab/RCDS).
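
Before submitting a file, it can help to confirm that every line has the expected two-field shape. The following is a minimal sketch of such a pre-check; the function name check_gvs_input is hypothetical, and treating the latitude,longitude header line as optional is an assumption rather than documented GVS behaviour.

```shell
# Hypothetical pre-submission check, not part of the GVS code base.
# Reports every line that does not contain exactly two comma-separated fields
# and returns a non-zero status if any such line is found.
check_gvs_input() {
  awk -F',' '
    # Assumption: a header line is allowed and is skipped when present
    NR == 1 && $0 == "latitude,longitude" { next }
    NF != 2 {
      printf "line %d: expected 2 comma-separated fields, found %d\n", NR, NF
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Usage: write a small sample file in the documented format and check it
sample=$(mktemp)
cat > "$sample" <<'EOF'
latitude,longitude
36.580435,-96.53331
46.0,25.0
EOF
check_gvs_input "$sample" && echo "input file looks well-formed"
rm -f "$sample"
```

The check only looks at the shape of each line. Whether the values themselves are numeric and in range is reported by the GVS in the latlong_err output field.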

Output

Data dictionary

| Field name | Meaning | Data type | Constrained values | Can be NULL? | Notes |
| --- | --- | --- | --- | --- | --- |
| id | Unique identifier | Integer | | No | Assigned by GVS |
| latlong_verbatim | Coordinates submitted | Text | | | Coordinate pair exactly as submitted |
| latitude_verbatim | Latitude submitted | Text | | | Latitude portion only |
| longitude_verbatim | Longitude submitted | Text | | | Longitude portion only |
| latitude | Latitude extracted from input | Decimal | | | Decimal latitude to original number of decimal places |
| longitude | Longitude extracted from input | Decimal | | | Decimal longitude to original number of decimal places |
| country | Country in which point located | Text | | | |
| state | State/province in which point located | Text | | | |
| county | County/parish in which point located | Text | | | |
| gid_0 | GADM identifier of country | Text | | | |
| gid_1 | GADM identifier of state/province | Text | | | |
| gid_2 | GADM identifier of county/parish | Text | | | |
| country_cent_dist | Distance in km to country centroid | Decimal | | | |
| country_cent_dist_relative | Relative distance to country centroid | Decimal | | | country_cent_dist / country_cent_dist_max |
| country_cent_type | Type of centroid | Text | bb, bb_main, pos, pos_main, std, std_main* | Yes | |
| country_cent_dist_max | Maximum distance to centroid within country | Decimal | | | Distance from centroid to farthest point along political division perimeter. |
| is_country_centroid | Is point likely a country centroid? | Integer | 1 | Yes | Equals 1 (yes) if country_cent_dist_relative < max_dist_rel |
| state_cent_dist | Distance in km to state centroid | Decimal | | | |
| state_cent_dist_relative | Relative distance to state centroid | Decimal | | | state_cent_dist / state_cent_dist_max |
| state_cent_type | Type of centroid | Text | bb, bb_main, pos, pos_main, std, std_main* | Yes | |
| state_cent_dist_max | Maximum possible distance to centroid within state | Decimal | | | Distance from centroid to farthest point along political division perimeter. |
| is_state_centroid | Is point likely a state centroid? | Integer | 1 | Yes | Equals 1 (yes) if state_cent_dist_relative < max_dist_rel |
| county_cent_dist | Distance in km to county centroid | Decimal | | | |
| county_cent_dist_relative | Relative distance to county centroid | Decimal | | | county_cent_dist / county_cent_dist_max |
| county_cent_type | Type of centroid | Text | bb, bb_main, pos, pos_main, std, std_main* | Yes | |
| county_cent_dist_max | Maximum possible distance to centroid within county | Decimal | | | Distance from centroid to farthest point along political division perimeter. |
| is_county_centroid | Is point likely a county centroid? | Integer | 1 | Yes | Equals 1 (yes) if county_cent_dist_relative < max_dist_rel |
| subpoly_cent_dist | Distance in km to country subpolygon centroid | Decimal | | | Smallest distance to a spatially separate country subpolygon, such as an offshore island. |
| subpoly_cent_dist_relative | Relative distance to country subpolygon centroid | Decimal | | | subpoly_cent_dist / subpoly_cent_dist_max |
| subpoly_cent_type | Type of centroid | Text | bb, pos, std* | Yes | |
| subpoly_cent_dist_max | Maximum distance to centroid within country subpolygon | Decimal | | | Distance from centroid to farthest point along subpolygon perimeter. |
| is_subpoly_centroid | Is point likely a country subpolygon centroid? | Integer | 1 | Yes | Equals 1 (yes) if subpoly_cent_dist_relative < max_dist_rel |
| centroid_dist_km | Distance in km to consensus centroid, if any | Decimal | | | This and the other "centroid_" fields are only populated if country, state, county or subpolygon is flagged as a likely centroid. |
| centroid_dist_relative | Relative distance to consensus centroid | Decimal | | | |
| centroid_type | Type of centroid | Text | bb, bb_main, pos, pos_main, std, std_main* | Yes | |
| centroid_dist_max_km | Maximum distance to centroid within political division of consensus centroid | Decimal | | | |
| centroid_poldiv | Most likely (consensus) centroid, if any | Text | country, county, state, other | Yes | other=separate subpolygon other than a political division (e.g., island) |
| max_dist_rel | Maximum relative distance threshold | Decimal | | No | Parameter, can be set by user. Default=0.002 |
| latlong_err | Points out of range or in ocean | Text | "Coordinates non-numeric", "Coordinate values out of bounds", "One or more missing coordinates", "In ocean" | Yes | NULL if no errors detected |
| coordinate_decimal_places | Smallest number of decimal places detected in verbatim latitude and longitude | Integer | | | Up to 14 decimal places detected |
| coordinate_inherent_uncertainty_m | Inherent uncertainty in km due to decimal places used | Decimal | | | Difference in radius between the smallest and largest circles centered on point and consistent with decimal places used. |
| user_id | Optional user-supplied identifier | Text | | | May be any value or none. Not used by GVS. |

* See Constrained Values
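
The centroid flags in the data dictionary follow a simple rule: the relative distance is the distance to the centroid divided by the maximum distance within the political division, and the flag is 1 when that ratio is below max_dist_rel. The sketch below works through that arithmetic. The function name centroid_flag is hypothetical, the sample distances are made up, and printing an empty field for an unflagged point is an assumption standing in for NULL.

```shell
# Hypothetical helper, not part of the GVS code base.
# $1 = distance to centroid (km), $2 = maximum distance within the division (km),
# $3 = max_dist_rel threshold (optional, defaults to the documented 0.002)
# Prints "<relative distance>,<flag>", with an empty flag when not a likely centroid.
centroid_flag() {
  awk -v d="$1" -v dmax="$2" -v thr="${3:-0.002}" 'BEGIN {
    if (dmax + 0 <= 0) exit 1          # relative distance is undefined
    rel = d / dmax                     # e.g. country_cent_dist / country_cent_dist_max
    printf "%.6f,%s\n", rel, (rel < thr + 0 ? "1" : "")
  }'
}

centroid_flag 1.5 1000        # 1.5 km from the centroid, 1000 km maximum distance
centroid_flag 5 1000          # same division, 5 km from the centroid
```

With the default threshold, the first point (relative distance 0.0015) is flagged and the second (0.005) is not. Raising the threshold, for example `centroid_flag 5 1000 0.01`, flags the second point as well.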

Constrained values

| Category | Value | Meaning | Notes |
| --- | --- | --- | --- |
| Centroid type | bb | Bounding box | |
| Centroid type | bb_main | Bounding box, largest subpolygon | Same as bb if only one polygon |
| Centroid type | pos | Point on surface | Centroid guaranteed inside perimeter for irregularly-shaped polygons |
| Centroid type | pos_main | Point on surface, largest subpolygon | Same as pos if only one polygon |
| Centroid type | std | Standard centroid-of-mass | May fall outside perimeter of irregularly-shaped polygons |
| Centroid type | std_main | Standard centroid-of-mass, largest subpolygon | Same as std if only one polygon |
| latlong_error | Coordinates non-numeric | Latitude or longitude not a decimal number | |
| latlong_error | Coordinate values out of bounds | Latitude out of range [-90:90] or longitude out of range [-180:180] | |
| latlong_error | One or more missing coordinates | Latitude or longitude or both are missing | |
| latlong_error | In ocean | Point in ocean | |
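
The first three error values can be reproduced from the coordinate strings alone, which makes them easy to illustrate. The sketch below is a hypothetical classifier and not the GVS implementation: the function name, the order in which the checks run, and the exact definition of "decimal number" are assumptions. The "In ocean" value is left out because it requires a spatial lookup against the GADM polygons.

```shell
# Hypothetical classifier, not part of the GVS code base.
# $1 = verbatim latitude, $2 = verbatim longitude
# Prints one of the documented messages, or nothing if no error is found.
latlong_err() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "One or more missing coordinates"
    return 0
  fi
  awk -v lat="$1" -v lon="$2" 'BEGIN {
    # Assumed definition of a decimal number: optional sign, digits, optional fraction
    num = "^[+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)$"
    if (lat !~ num || lon !~ num)
      print "Coordinates non-numeric"
    else if (lat + 0 < -90 || lat + 0 > 90 || lon + 0 < -180 || lon + 0 > 180)
      print "Coordinate values out of bounds"
  }'
}

latlong_err 36.580435 -96.53331   # valid pair: prints nothing
latlong_err 95 10                 # latitude outside [-90:90]
latlong_err abc 10                # latitude is not a number
latlong_err "" 10                 # latitude missing
```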

Usage

Build the GVS Database

See README in gvs_db/.

GVS batch application

  • Processes file of geocoordinates in single batch

Syntax

./gvs.sh -f <input_filename_and_path> [other options]

Options

| Option | Meaning | Required? | Default value |
| --- | --- | --- | --- |
| -f | Input file and path | Yes | |
| -o | Output file and path | No | [input_file_name]_gvs_results.csv |
| -s | Silent mode | No | Verbose/interactive mode by default |
| -m | Send notification message at start and completion, or on fail | No (must be followed by valid email if included) | |

Example:

./gvs.sh -f myfile.csv -m [email protected]

GVS parallel processing application

  • Processes file of geocoordinates in parallel mode (multiple batches)
  • If you get a permission error, try running as sudo

Syntax

./gvspar.pl -in <input_filename_and_path> -nbatch <batches> [-out <output_filename_and_path>] [-opt <makeflow_options>]

Options

| Option | Meaning | Required? | Default value |
| --- | --- | --- | --- |
| -in | Input file and path | Yes | |
| -out | Output file and path | No | [input_file_name]_gvs_results.csv |
| -nbatch | Number of batches | Yes | |
| -opt | Makeflow options | No | |

Example:

./gvspar.pl -in "data/gvs_testfile.csv" -nbatch 3

  • On some operating system configurations you may need to run with sudo to enable access to the temp folder /tmp/gvs, especially if this directory doesn't exist (in which case the application will attempt to create it). Test first without sudo.

GVS interfaces

GVS API

General documentation

See https://github.com/ojalaquellueva/gvs/tree/master/api#readme.

Example API usage in R

https://github.com/ojalaquellueva/gvs/blob/master/api/example_scripts/gvs_api_example.R

GVS R package

https://github.com/EnquistLab/RCDS

  • Note: The GVS R package is currently called "RCDS"

GVS graphical user interface

Web: https://gvs.biendata.org
Repository: https://github.com/EnquistLab/GVSweb
