Author: Brad Boyle ([email protected])
The Geocoordinate Validation Service (GVS) performs quality checks on decimal geocoordinates (georeferenced points represented as pairs of decimal latitude and longitude values). In addition to detecting common georeferencing errors, the GVS calculates the probability that a given point represents a political division centroid, as opposed to a directly measured point on the Earth's surface. The GVS also returns the country-, state- and county-level political divisions in which the point is located. Political divisions are determined with reference to GADM, the Database of Global Administrative Areas (https://gadm.org). For an online interface to the GVS, see https://gvs.biendata.org/.
Information returned by the GVS includes:
- Estimate of inherent precision implied by the number of decimal places in the original coordinates
- Brief descriptions of the types of errors detected (e.g., "Coordinate values out of bounds")
- Flagging of points in the ocean
- Names and GADM identifiers of the admin_0, admin_1 and admin_2 political divisions (e.g., country, state, county) in which a point is located
- Absolute and relative distance to the centroid of each political division (see full list of output fields below)
- Probability that the point is a centroid, and, if applicable, the type of centroid and political division (country, state or county) of the likeliest centroid.
- Flagging of points with a high probability of being a centroid, above a certain threshold (threshold can be adjusted by user)
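As an illustration of the first item in the list above, the following minimal sketch counts decimal places the way the `coordinate_decimal_places` output field is described (the smallest number of decimal places found in the verbatim latitude and longitude). The function names are hypothetical, and `latitude_step_m` is only a rough rule of thumb (about 111.32 km per degree of latitude), not the formula the GVS uses for inherent uncertainty.

```python
def decimal_places(value: str) -> int:
    """Count the digits after the decimal point in a verbatim coordinate string."""
    text = value.strip()
    if "." not in text:
        return 0
    return len(text.split(".", 1)[1])


def coordinate_decimal_places(latitude: str, longitude: str) -> int:
    """Smallest number of decimal places across the two verbatim values."""
    return min(decimal_places(latitude), decimal_places(longitude))


def latitude_step_m(places: int) -> float:
    """Approximate ground distance of one unit in the last decimal place.

    Assumption: one degree of latitude is about 111,320 m. This is a
    rule of thumb for illustration, not the GVS calculation.
    """
    return 111_320 / 10 ** places


# "52.92755" has 5 decimal places and "4.7864" has 4, so the pair has 4
print(coordinate_decimal_places("52.92755", "4.7864"))  # 4
```

Coordinates should be kept as text when counting decimal places, because converting them to floating-point numbers can drop trailing zeros.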
This service may be used in combination with the BIEN Geographic Name Resolution Service (GNRS; https://github.com/ojalaquellueva/gnrs.git) to perform "political geovalidation" of georeferenced biodiversity observations. Political geovalidation checks whether all detected political divisions (i.e., the country, state and county polygons in which the coordinates are located) match the declared political divisions (country, state and county names) of the original observation record. Operationally, this validation can be performed by checking that the GADM administrative division identifiers returned by the GVS match the GADM identifiers returned by the GNRS.
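The identifier comparison described above could be sketched as follows. This is a minimal illustration, not the BIEN workflow itself: it assumes the GNRS results have already been mapped to the same `gid_0`, `gid_1` and `gid_2` keys that the GVS returns, and the rule that a missing identifier means "cannot compare" (rather than "mismatch") is an assumption. The function names and the example identifiers are hypothetical.

```python
GID_FIELDS = ("gid_0", "gid_1", "gid_2")  # country, state, county


def political_match(gvs_row: dict, gnrs_row: dict) -> dict:
    """Compare GADM identifiers level by level.

    Per level: True (match), False (mismatch) or None (identifier
    missing on either side, so the level cannot be compared).
    """
    result = {}
    for field in GID_FIELDS:
        detected = gvs_row.get(field)   # where the coordinates fall
        declared = gnrs_row.get(field)  # what the record claims
        if not detected or not declared:
            result[field] = None
        else:
            result[field] = detected == declared
    return result


def passes_geovalidation(gvs_row: dict, gnrs_row: dict) -> bool:
    """Assumed rule: the country must match and no compared level may mismatch."""
    levels = political_match(gvs_row, gnrs_row)
    return levels["gid_0"] is True and False not in levels.values()


# Illustrative identifiers only, not taken from real GVS or GNRS output
gvs = {"gid_0": "USA", "gid_1": "USA.37_1", "gid_2": "USA.37.57_1"}
gnrs = {"gid_0": "USA", "gid_1": "USA.37_1", "gid_2": None}
print(political_match(gvs, gnrs))
print(passes_geovalidation(gvs, gnrs))
```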
GVS, CDS...what's the difference?
The GVS was previously developed under the name CDS (Centroid Detection Service) as an application for the detection of political division centroids. It has been renamed to reflect the wider range of features currently available.
Ubuntu 16.04 or higher
PostgreSQL/psql 10 or higher (PostGIS extension installed by this script)
Requires access to the GNRS (https://github.com/ojalaquellueva/gnrs.git), either as an API or as a local batch service. The version in this repository uses the BIEN GNRS API (http://vegbiendev.nceas.ucsb.edu:8875/gnrs_ws.php).
This script must be run by a user with authorization to connect to postgres (as specified in the pg_hba.conf file). The admin-level and read-only Postgres users for the gadm database (specified in params.sh) should already exist and must also be authorized to connect to postgres (as specified in the pg_hba.conf file).
# Create application base directory
mkdir -p gvs
cd gvs
# Create application code directory
mkdir src
# Install application code from repository
# (the trailing "." clones into src/ itself, so that data/ and config/ can be moved below)
cd src
git clone https://github.com/ojalaquellueva/gvs .
# Move data and sensitive parameters directories outside of code directory
mv data ../
mv config ../
- After moving the config directory:
  - Rename all config files by removing "example" from file names
  - Set all passwords, paths and other parameter values in the config files
- The temporary application data directory /tmp/gvs is now created on the fly by the application. You no longer need to create it manually.
Raw data for the GVS consists of one or more pairs of coordinates in decimal format, separated by a single comma, with latitude first. E.g.,
latitude,longitude
36.580435,-96.53331
39.8081822436996,-91.6228915663878
46.0,25.0
52.92755,4.7864
-23.62,-65.43
-29.178651024973867,149.269218
-29.231478025060987,152.13519
51.81171,-3.8879
Data are submitted to the GVS via the shell command line as a CSV (comma-delimited) text file, formatted as shown in the example above. Data submitted via the API or the GVS R package must be converted to JSON and attached to the body of a POST request (see the API documentation in this repository, and the separate RCDS (=GVS) repository, https://github.com/EnquistLab/RCDS).
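For the command-line route, a file in the input format shown above could be produced with a short script such as the following sketch. The function name `write_gvs_input` is hypothetical, and including the `latitude,longitude` header line simply mirrors the example above (whether the header is required is not stated here). Coordinates are passed as strings so that the original number of decimal places, which the GVS reports on, is preserved.

```python
import csv


def write_gvs_input(points, path):
    """Write (latitude, longitude) pairs as comma-delimited text, latitude first.

    `points` is an iterable of 2-tuples of strings. Strings are used
    instead of floats so that trailing zeros are not lost.
    """
    with open(path, "w", newline="") as handle:
        # Plain "\n" line endings, as is usual for files read by shell tools
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["latitude", "longitude"])  # header as in the example above
        for latitude, longitude in points:
            writer.writerow([latitude, longitude])
```

For example, calling `write_gvs_input([("36.580435", "-96.53331"), ("46.0", "25.0")], "myfile.csv")` produces a file that can then be passed to the GVS with the `-f` option.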
Field name | Meaning | Data type | Constrained values | Can be NULL? | Notes |
---|---|---|---|---|---|
id | Unique identifier | Integer | | No | Assigned by GVS |
latlong_verbatim | Coordinates submitted | Text | | | Coordinate pair exactly as submitted |
latitude_verbatim | Latitude submitted | Text | | | Latitude portion only |
longitude_verbatim | Longitude submitted | Text | | | Longitude portion only |
latitude | Latitude extracted from input | Decimal | | | Decimal latitude to original number of decimal places |
longitude | Longitude extracted from input | Decimal | | | Decimal longitude to original number of decimal places |
country | Country in which point is located | Text | | | |
state | State/province in which point is located | Text | | | |
county | County/parish in which point is located | Text | | | |
gid_0 | GADM identifier of country | Text | | | |
gid_1 | GADM identifier of state/province | Text | | | |
gid_2 | GADM identifier of county/parish | Text | | | |
country_cent_dist | Distance in km to country centroid | Decimal | | | |
country_cent_dist_relative | Relative distance to country centroid | Decimal | | | country_cent_dist / country_cent_dist_max |
country_cent_type | Type of centroid | Text | bb, bb_main, pos, pos_main, std, std_main* | Yes | |
country_cent_dist_max | Maximum possible distance to centroid within country | Decimal | | | Distance from centroid to farthest point along political division perimeter. |
is_country_centroid | Is point likely a country centroid? | Integer | 1 | Yes | Equals 1 (yes) if country_cent_dist_relative < max_dist_rel |
state_cent_dist | Distance in km to state centroid | Decimal | | | |
state_cent_dist_relative | Relative distance to state centroid | Decimal | | | state_cent_dist / state_cent_dist_max |
state_cent_type | Type of centroid | Text | bb, bb_main, pos, pos_main, std, std_main* | Yes | |
state_cent_dist_max | Maximum possible distance to centroid within state | Decimal | | | Distance from centroid to farthest point along political division perimeter. |
is_state_centroid | Is point likely a state centroid? | Integer | 1 | Yes | Equals 1 (yes) if state_cent_dist_relative < max_dist_rel |
county_cent_dist | Distance in km to county centroid | Decimal | | | |
county_cent_dist_relative | Relative distance to county centroid | Decimal | | | county_cent_dist / county_cent_dist_max |
county_cent_type | Type of centroid | Text | bb, bb_main, pos, pos_main, std, std_main* | Yes | |
county_cent_dist_max | Maximum possible distance to centroid within county | Decimal | | | Distance from centroid to farthest point along political division perimeter. |
is_county_centroid | Is point likely a county centroid? | Integer | 1 | Yes | Equals 1 (yes) if county_cent_dist_relative < max_dist_rel |
subpoly_cent_dist | Distance in km to country subpolygon centroid | Decimal | | | Smallest distance to a spatially separate country subpolygon, such as an offshore island. |
subpoly_cent_dist_relative | Relative distance to country subpolygon centroid | Decimal | | | subpoly_cent_dist / subpoly_cent_dist_max |
subpoly_cent_type | Type of centroid | Text | bb, pos, std* | Yes | |
subpoly_cent_dist_max | Maximum possible distance to centroid within country subpolygon | Decimal | | | Distance from centroid to farthest point along subpolygon perimeter. |
is_subpoly_centroid | Is point likely a country subpolygon centroid? | Integer | 1 | Yes | Equals 1 (yes) if subpoly_cent_dist_relative < max_dist_rel |
centroid_dist_km | Distance in km to consensus centroid, if any | Decimal | | | This and the other "centroid_" fields are only populated if country, state, county or subpolygon is flagged as a likely centroid. |
centroid_dist_relative | Relative distance to consensus centroid | Decimal | | | |
centroid_type | Type of centroid | Text | bb, bb_main, pos, pos_main, std, std_main* | Yes | |
centroid_dist_max_km | Maximum possible distance to centroid within political division of consensus centroid | Decimal | | | |
centroid_poldiv | Most likely (consensus) centroid, if any | Text | country, county, state, other | Yes | other=separate subpolygon other than a political division (e.g., island) |
max_dist_rel | Maximum relative distance threshold | Decimal | | No | Parameter, can be set by user. Default=0.002 |
latlong_err | Points out of range or in ocean | Text | "Coordinates non-numeric", "Coordinate values out of bounds", "One or more missing coordinates", "In ocean" | Yes | NULL if no errors detected |
coordinate_decimal_places | Smallest number of decimal places detected in verbatim latitude and longitude | Integer | | | Up to 14 decimal places detected |
coordinate_inherent_uncertainty_m | Inherent uncertainty in m due to decimal places used | Decimal | | | Difference in radius between the smallest and largest circles centered on point and consistent with decimal places used. |
user_id | Optional user-supplied identifier | Text | | | May be any value or none. Not used by GVS. |
* See Constrained Values
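The relationship between the `*_cent_dist`, `*_cent_dist_max`, `*_cent_dist_relative` and `is_*_centroid` fields in the table above can be summarized in a few lines. The sketch below follows the rule given in the Notes column (flag equals 1 if the relative distance is below `max_dist_rel`, otherwise NULL). The function names are hypothetical and the distances in the example are made up for illustration.

```python
from typing import Optional

DEFAULT_MAX_DIST_REL = 0.002  # default threshold listed for max_dist_rel


def relative_distance(cent_dist_km: float, cent_dist_max_km: float) -> float:
    """Distance to the centroid divided by the maximum possible distance."""
    if cent_dist_max_km <= 0:
        raise ValueError("cent_dist_max_km must be positive")
    return cent_dist_km / cent_dist_max_km


def centroid_flag(cent_dist_km: float, cent_dist_max_km: float,
                  max_dist_rel: float = DEFAULT_MAX_DIST_REL) -> Optional[int]:
    """Return 1 if the point is a likely centroid, else None (NULL)."""
    if relative_distance(cent_dist_km, cent_dist_max_km) < max_dist_rel:
        return 1
    return None


# A point 0.5 km from a centroid whose farthest perimeter point is 800 km away:
# 0.5 / 800 = 0.000625, which is below the default threshold of 0.002
print(centroid_flag(0.5, 800.0))    # 1
# A point 120 km away in the same division: 120 / 800 = 0.15, not flagged
print(centroid_flag(120.0, 800.0))  # None
```

Because the threshold is relative, the same absolute distance to a centroid is more likely to be flagged in a large political division than in a small one.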
Category | Value | Meaning | Notes |
---|---|---|---|
Centroid type | bb | Bounding box | |
Centroid type | bb_main | Bounding box, largest subpolygon | Same as bb if only one polygon |
Centroid type | pos | Point on surface | Centroid guaranteed inside perimeter for irregularly-shaped polygons |
Centroid type | pos_main | Point on surface, largest subpolygon | Same as pos if only one polygon |
Centroid type | std | Standard centroid-of-mass | May fall outside perimeter of irregularly-shaped polygons |
Centroid type | std_main | Standard centroid-of-mass, largest subpolygon | Same as std if only one polygon |
latlong_err | Coordinates non-numeric | Latitude or longitude not a decimal number | |
latlong_err | Coordinate values out of bounds | Latitude out of range [-90:90] or longitude out of range [-180:180] | |
latlong_err | One or more missing coordinates | Latitude or longitude or both are missing | |
latlong_err | In ocean | Point in ocean | |
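The first three error values in the table above depend only on the submitted text, so they can be reproduced without a database. The sketch below is one possible reading of those definitions; the function name and the order in which the checks are applied are assumptions, not taken from the GVS code. The "In ocean" check is omitted because it requires a spatial lookup against the GADM polygons.

```python
import math
from typing import Optional


def classify_latlong(latitude: Optional[str], longitude: Optional[str]) -> Optional[str]:
    """Return an error string from the table above, or None if no error is found."""
    # Missing: either value absent or blank
    if any(v is None or v.strip() == "" for v in (latitude, longitude)):
        return "One or more missing coordinates"
    # Non-numeric: either value cannot be read as a finite decimal number
    try:
        lat, lon = float(latitude), float(longitude)
    except ValueError:
        return "Coordinates non-numeric"
    if not (math.isfinite(lat) and math.isfinite(lon)):
        return "Coordinates non-numeric"
    # Out of bounds: latitude outside [-90, 90] or longitude outside [-180, 180]
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        return "Coordinate values out of bounds"
    return None


print(classify_latlong("36.580435", "-96.53331"))  # None
print(classify_latlong("91", "0"))                 # Coordinate values out of bounds
```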
See README in gvs_db/.
- Processes file of geocoordinates in single batch
./gvs.sh -f <input_filename_and_path> [other options]
Option | Meaning | Required? | Default value |
---|---|---|---|
-f | Input file and path | Yes | |
-o | Output file and path | No | [input_file_name]_gvs_results.csv |
-s | Silent mode | No | Verbose/interactive mode by default |
-m | Send notification message at start and completion, or on fail | No (must be followed by valid email if included) | |
./gvs.sh -f myfile.csv -m <email_address>
- Processes file of geocoordinates in parallel mode (multiple batches)
- If you get a permission error, try running as sudo
./gvspar.pl -in <input_filename_and_path> -out <output_filename_and_path> -nbatch <batches> -opt <makeflow_options>
Option | Meaning | Required? | Default value |
---|---|---|---|
-in | Input file and path | Yes | |
-out | Output file and path | No | [input_file_name]_gvs_results.csv |
-nbatch | Number of batches | Yes | |
-opt | Makeflow options | No | |
- On some operating system configurations you may need to run using sudo to enable access to the temp folder /tmp/gvs, especially if this directory doesn't exist (in which case the application will attempt to create it). Test first without sudo.
./gvspar.pl -in "data/gvs_testfile.csv" -nbatch 3
See https://github.com/ojalaquellueva/gvs/tree/master/api#readme.
https://github.com/ojalaquellueva/gvs/blob/master/api/example_scripts/gvs_api_example.R
https://github.com/EnquistLab/RCDS
- Note: The GVS R package is currently called "RCDS"
Web: https://gvs.biendata.org
Repository: https://github.com/EnquistLab/GVSweb