Detect if a species occurrence record is within its expected spatial distribution #255

M-Nicholls · 2021-10-20T21:17:56Z

Where should this occur - part of the pipelines or a separate process?

check that layers are available for outlier detection

run expert distribution outlier detection: is there an expert distribution for the species? If so, detect whether the occurrence record's point is inside or outside the expert distribution

add a field to the record giving the distance of the point inside/outside the expected distribution
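A minimal sketch of the in/out test and the distance field, assuming each expert distribution is a single simple polygon (no holes) whose coordinates are already in a projected CRS measured in metres. Real layers are likely multipolygons in geographic coordinates and would need reprojection or geodesic distances; the function names and the `inside`/`distance` field names are hypothetical, not the pipelines schema.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in a projected CRS, metres (assumption)
Ring = List[Point]           # polygon exterior; last vertex need not repeat the first


def point_in_ring(pt: Point, ring: Ring) -> bool:
    """Ray casting: count edge crossings of a ray from pt towards +x."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(pt: Point, a: Point, b: Point) -> float:
    """Euclidean distance from pt to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = pt, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project pt onto the segment's line, clamped to the segment's end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_boundary(pt: Point, ring: Ring) -> float:
    """Shortest distance from pt to any edge of the ring."""
    n = len(ring)
    return min(distance_to_segment(pt, ring[i], ring[(i + 1) % n]) for i in range(n))


def distribution_fields(pt: Point, ring: Ring) -> Dict[str, object]:
    """Hypothetical per-record fields: in/out flag plus distance to the boundary."""
    return {"inside": point_in_ring(pt, ring), "distance": distance_to_boundary(pt, ring)}
```

For the square `[(0, 0), (10, 0), (10, 10), (0, 10)]`, the point `(2, 5)` gives `{"inside": True, "distance": 2.0}` and `(13, 5)` gives `{"inside": False, "distance": 3.0}`. A production version would more likely rely on a geometry library than on hand-rolled ray casting.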

add expert distribution outlier category
(compare the distance inside/outside the distribution boundary to the uncertainty)

within expected distribution - point and full uncertainty are within the range
likely within expected distribution - point within the range, but uncertainty extends outside it
may be within expected distribution - point outside the range, but uncertainty overlaps the range
outside expected distribution - point and full uncertainty are outside the range
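The four categories follow from comparing the point-to-boundary distance with the coordinate uncertainty radius. A sketch, assuming the uncertainty is a circle around the point, the distance is non-negative, and a circle that exactly touches the boundary counts as reaching the range (that tie-break is an assumption); the enum and function names are hypothetical.

```python
from enum import Enum


class DistributionCategory(Enum):
    WITHIN = "within expected distribution"
    LIKELY_WITHIN = "likely within expected distribution"
    MAY_BE_WITHIN = "may be within expected distribution"
    OUTSIDE = "outside expected distribution"


def classify(inside: bool, distance_m: float, uncertainty_m: float) -> DistributionCategory:
    """Compare the point-to-boundary distance with the uncertainty radius.

    inside        -- True if the recorded point falls inside the distribution
    distance_m    -- distance from the point to the nearest boundary (>= 0)
    uncertainty_m -- coordinate uncertainty radius around the point (>= 0)
    """
    if distance_m < 0 or uncertainty_m < 0:
        raise ValueError("distance and uncertainty must be non-negative")
    if inside:
        # The whole uncertainty circle stays inside unless it reaches past the boundary.
        if uncertainty_m <= distance_m:
            return DistributionCategory.WITHIN
        return DistributionCategory.LIKELY_WITHIN
    # Outside: the uncertainty circle overlaps the range once it reaches the boundary.
    if uncertainty_m >= distance_m:
        return DistributionCategory.MAY_BE_WITHIN
    return DistributionCategory.OUTSIDE
```

A record 500 m outside the range with 1000 m uncertainty comes out as `MAY_BE_WITHIN`; the same record with 100 m uncertainty comes out as `OUTSIDE`.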

Two scenarios:

1. Calculate all existing occurrences against the existing expert distribution layers - one-time run
2. Re-calculate the affected species when a new expert distribution layer is added.
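The two scenarios differ mainly in which records are selected for calculation. A sketch with a hypothetical `Record` type and species identifiers; how pipelines actually stores records and layers is not shown here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Record:
    record_id: str
    species_id: str


def records_to_process(
    records: Iterable[Record],
    species_with_layers: Set[str],
    changed_species: Optional[Set[str]] = None,
) -> List[Record]:
    """Select the records that need an in/out calculation.

    Scenario 1 (changed_species is None): one-time run over every record whose
    species has an expert distribution layer.
    Scenario 2: only records of species whose layer was just added.
    """
    if changed_species is None:
        targets = species_with_layers
    else:
        targets = changed_species & species_with_layers
    return [r for r in records if r.species_id in targets]
```

Records of species without any layer are skipped in both scenarios, since there is nothing to compare them against.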

Link to pipeline issue: gbif/pipelines#622
Link to Spatial issue: AtlasOfLivingAustralia/spatial-service#186

M-Nicholls · 2021-11-02T22:14:10Z

what to do with generalised records
how to take record uncertainty into account

use the size of the distribution to determine how much the uncertainty or generalisation matters?
i.e. for a very small distribution, uncertainty and generalisation will make a big difference to whether the point is in or out
should a record be considered in or out if its uncertainty overlaps the range but the point is outside the range?
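One hypothetical way to express how much the uncertainty matters for a given range is to divide the uncertainty radius by the radius of a circle with the same area as the distribution. This metric is only an illustration, not part of the proposal; it assumes the uncertainty is in metres and the area in square metres.

```python
import math


def equivalent_radius_m(area_m2: float) -> float:
    """Radius of a circle with the same area as the distribution."""
    return math.sqrt(area_m2 / math.pi)


def relative_uncertainty(uncertainty_m: float, distribution_area_m2: float) -> float:
    """Uncertainty radius as a fraction of the distribution's equivalent radius.

    Values near or above 1 mean the uncertainty is as large as the range itself,
    so the in/out result says little about that record.
    """
    if distribution_area_m2 <= 0:
        raise ValueError("distribution area must be positive")
    return uncertainty_m / equivalent_radius_m(distribution_area_m2)
```

A 10 km uncertainty gives a ratio of about 2 for a range of area π × (5 km)², but about 0.01 for a range of area π × (1000 km)², so the same uncertainty is decisive for the first range and negligible for the second.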

indicate whether the point is in/out, but that based on the uncertainty the record may be out/in

categories -
within expected distribution - point and full uncertainty are within the range
likely within expected distribution - point within the range, but uncertainty extends outside it
may be within expected distribution - point outside the range, but uncertainty overlaps the range
outside expected distribution - point and full uncertainty are outside the range

using the categories together with the distance outside the distribution provides a thorough combination of metrics

M-Nicholls · 2021-11-03T01:52:56Z

Add to data pre-filters
update assertion metadata
update support material

M-Nicholls · 2021-11-10T00:06:49Z

what to do if there are multiple overlapping layers, e.g. likely/maybe layers, and separate east coast/west coast layers, e.g. grey nurse shark

qifeng-bai · 2021-11-11T22:12:50Z

what to do if there are multiple overlapping layers, e.g. likely/maybe layers, and separate east coast/west coast layers, e.g. grey nurse shark

Whether there is a single layer or multiple layers won't affect the in/out calculation, but multiple layers make calculating the distance more difficult.
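A sketch of how multiple layers could be combined, treating the expected distribution as the union of all layers for the species (an assumption). The in/out test is then simply "inside any layer", and for an outside point the distance to the union is the minimum distance to any layer. For an inside point with overlapping layers the exact distance to the union's boundary is harder to get, so this sketch returns a conservative lower bound: the largest distance to the boundary of any layer that contains the point. The simple-polygon, projected-coordinates assumptions and all names here are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # projected coordinates in metres (assumption)
Ring = List[Point]


def point_in_ring(pt: Point, ring: Ring) -> bool:
    """Ray-casting point-in-polygon test."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def distance_to_ring(pt: Point, ring: Ring) -> float:
    """Shortest distance from pt to any edge of the ring."""
    px, py = pt
    best = math.inf
    for (ax, ay), (bx, by) in zip(ring, ring[1:] + ring[:1]):
        dx, dy = bx - ax, by - ay
        seg_len_sq = dx * dx + dy * dy
        # Clamped projection of pt onto the edge (t = 0 for a degenerate edge).
        t = 0.0 if seg_len_sq == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
        best = min(best, math.hypot(px - (ax + t * dx), py - (ay + t * dy)))
    return best


def multi_layer_fields(pt: Point, layers: List[Ring]) -> Dict[str, object]:
    """In/out and distance against the union of several layers for one species."""
    containing = [ring for ring in layers if point_in_ring(pt, ring)]
    if containing:
        # A circle of radius d that fits inside one layer also fits inside the union,
        # so the largest such d is a safe lower bound on the true inside distance.
        return {"inside": True, "distance": max(distance_to_ring(pt, r) for r in containing)}
    # Outside every layer: distance to the union is the distance to the nearest layer.
    return {"inside": False, "distance": min(distance_to_ring(pt, r) for r in layers)}
```

With overlapping squares `[(0, 0), (10, 0), (10, 10), (0, 10)]` and `[(8, 0), (18, 0), (18, 10), (8, 10)]`, the point `(9, 5)` gets distance `1.0` even though it is 5 units from the union's boundary, which illustrates why multiple layers complicate the distance calculation.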

qifeng-bai · 2022-01-27T01:04:38Z

Solution:
A Jenkins job runs the program once a day.

For every run:
Pipelines loads all indexed records.
Compare them with the existing outlier records to filter out the newly added records.
Calculate outliers for those new records ONLY.

If a new expert layer is added or an existing one is updated, manually delete the existing outlier records; Pipelines will then recalculate all indexed records.
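A sketch of one scheduled run, assuming a stored result is kept for every processed record (not only for records flagged as outliers), so that "new" means "no stored result yet". The record IDs, the dict used as the result store and the `compute` callback are hypothetical stand-ins for the pipelines index and outlier calculation.

```python
from typing import Callable, Dict, Iterable


def daily_run(
    indexed_record_ids: Iterable[str],
    stored_results: Dict[str, str],
    compute: Callable[[str], str],
) -> Dict[str, str]:
    """One scheduled run: calculate only records that have no stored result yet.

    Returns a new result store; stored_results itself is not modified.
    """
    results = dict(stored_results)
    for record_id in indexed_record_ids:
        if record_id not in results:  # newly added since the last run
            results[record_id] = compute(record_id)
    return results
```

Passing an empty result store reproduces the layer-change case: once the existing results are deleted, every indexed record is recalculated.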

qifeng-bai mentioned this issue Nov 11, 2021

LA-pipelines: expert distribution outlier detection gbif/pipelines#622 (Open)

qifeng-bai self-assigned this Nov 11, 2021

qifeng-bai mentioned this issue Nov 17, 2021

Add expert distribution outlier detection AtlasOfLivingAustralia/spatial-service#186 (Closed)

acbuyan changed the title ~~Detect if a species occurrence record is within it's expected spatial distribution~~ Detect if a species occurrence record is within its expected spatial distribution Nov 26, 2024

acbuyan added the Enhancement (Requests for new feature or improvements to existing features) and Data Quality Assertions (Anything relating to data quality assertions, including distributions, pipelines and other) labels Nov 27, 2024
