LA-pipelines: expert distribution outlier detection #622

Open
qifeng-bai opened this issue Nov 11, 2021 · 2 comments

qifeng-bai commented Nov 11, 2021

Check whether a species occurrence record point falls inside or outside the expert distribution layers.

Questions:

  1. Efficiency: using polygons may slow down the process. Should we use grids instead? If so, how do we determine the grid size?
  2. Should we use LayerStore / Spatial Service, or build our own local instance or Postgres?
  3. Which assertions should we use?
  4. Is it possible to calculate:
  • the distance between the record point and the expected distribution range, whether the point is inside or outside it
  • point outside the range and uncertainty overlaps the range
  • point outside the range and uncertainty outside the range
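
To make the cases in the questions above concrete, here is a minimal sketch of the inside/outside check and the two "outside" cases. It is not the pipeline's implementation: it uses planar coordinates and a plain ray-casting test, whereas a real run would need geodesic distances in metres and a spatial library or the Spatial Service. All function names and assertion labels are hypothetical, and returning a distance of 0.0 for points inside the range is an assumption.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray-casting test. polygon is a list of (x, y) vertices (not closed)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Toggle for each edge crossed by a horizontal ray going right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(x, y, x1, y1, x2, y2):
    """Planar distance from (x, y) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    # Project the point onto the segment and clamp to its end points.
    t = ((x - x1) * dx + (y - y1) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))


def distance_to_boundary(x, y, polygon):
    """Smallest planar distance from (x, y) to any polygon edge."""
    n = len(polygon)
    return min(
        distance_to_segment(x, y, *polygon[i], *polygon[(i + 1) % n])
        for i in range(n)
    )


def classify(x, y, uncertainty, polygon):
    """Return (label, distance). Labels are hypothetical, not ALA assertions."""
    if point_in_polygon(x, y, polygon):
        return "INSIDE", 0.0  # assumption: inside points get distance 0
    d = distance_to_boundary(x, y, polygon)
    if d <= uncertainty:
        return "OUTSIDE_UNCERTAINTY_OVERLAPS", d
    return "OUTSIDE_UNCERTAINTY_OUTSIDE", d
```

For example, with a square range `[(0, 0), (10, 0), (10, 10), (0, 10)]`, a point at `(12, 5)` is 2 units from the boundary, so it overlaps the range with an uncertainty of 3 and lies fully outside with an uncertainty of 1.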

Two scenarios:
1. Calculate all existing occurrences against the existing expert distribution layers (one-time run).
2. Recalculate the related species when a new expert distribution layer is added.
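
As a rough illustration of the second scenario, the sketch below selects only the occurrences that belong to species with a newly added layer, so the rest of the dataset is not reprocessed. The record structure and the `speciesId` field name are assumptions made for this example, not the pipeline's actual schema.

```python
from collections import defaultdict


def index_by_species(occurrences):
    """Group occurrence records (dicts) by their hypothetical 'speciesId' field."""
    by_species = defaultdict(list)
    for occ in occurrences:
        by_species[occ["speciesId"]].append(occ)
    return by_species


def records_to_recalculate(by_species, new_layer_species_ids):
    """Return only the records of species that received a new layer."""
    return [
        occ
        for species_id in new_layer_species_ids
        for occ in by_species.get(species_id, [])
    ]
```

The one-time run in the first scenario is then the same selection applied to every species that has a layer.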

Link to Data Quality (DQ) project: AtlasOfLivingAustralia/DataQuality#255
Requires Spatial: AtlasOfLivingAustralia/spatial-service#186


qifeng-bai commented Nov 26, 2021

A modification to la-pipeline.yaml is needed.
Add an 'outlier' section:
outlier:
  appName: Expert distribution outliers for {datasetId}
  baseUrl: https://spatial.ala.org.au/ws/
  inputPath: '{fsPath}/pipelines-outlier'
  targetPath: &outlierTargetPath '{fsPath}/pipelines-outlier'
  allDatasetsInputPath: '{fsPath}/pipelines-all-datasets'
  runner: DirectRunner

and insert the following into the 'solr' section:

  outlierPath: *outlierTargetPath
  includeOurtlier: false

qifeng-bai added a commit that referenced this issue Nov 28, 2021
qifeng-bai added a commit that referenced this issue Nov 29, 2021
qifeng-bai added a commit that referenced this issue Dec 2, 2021
qifeng-bai added a commit that referenced this issue Dec 16, 2021
@qifeng-bai

SLF4J issues:
1. The pipelines use logback as the default logger, but the la-pipelines script currently uses log4j.properties, so logging does not work.

2. Added logback.xml to the ./resource folder, and import an external config via -Dlogback.configurationFile.

The external config looks like this:

<included>
  <logger name="org.gbif" level="info" additivity="false">
    <appender-ref ref="CONSOLE"/>
  </logger>
  <logger name="au.org.ala.pipelines" level="debug" additivity="false">
    <appender-ref ref="CONSOLE"/>
  </logger>
</included>

We can use it to change the log level. It works in my local dev environment.

However, when we run it on the NCI3-spark servers, logging does not work: the "included" element is not accepted.

To make it work, we have to duplicate the content in ./resource/logback.xml.

qifeng-bai added a commit that referenced this issue Jan 9, 2022
qifeng-bai added a commit that referenced this issue Feb 2, 2022
djtfmartin pushed a commit that referenced this issue Feb 3, 2022
djtfmartin added a commit that referenced this issue Mar 3, 2022
* #622 Calculate distance to expert distribution layers (#631)
* fix for #680 and a fix for missing properties when indexed generate with expert distribution
using deltas with timestamps

Co-authored-by: Dave Martin <[email protected]>
djtfmartin pushed a commit that referenced this issue Mar 3, 2022