Skip to content

Commit

Permalink
more complete sdm workflow (#831)
Browse files Browse the repository at this point in the history
  • Loading branch information
tiemvanderdeure authored Dec 6, 2024
1 parent 367181d commit b9379e1
Show file tree
Hide file tree
Showing 2 changed files with 91 additions and 26 deletions.
7 changes: 7 additions & 0 deletions docs/Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -12,9 +12,16 @@ GBIF2 = "dedd4f52-e074-43bf-924d-d6bce14ad628"
GeoInterface = "cf35fbd7-0cd7-5166-be24-54bfbe79505f"
HDF5 = "f67ccb44-e63f-5c2f-98bd-6dc0ccc4ba2f"
Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
MLJGLMInterface = "caf8df21-4939-456d-ac9c-5fefbfb04c0c"
Makie = "ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a"
Maxnet = "81f79f80-22f2-4e41-ab86-00c11cf0f26f"
NCDatasets = "85f8d34a-cbdd-5861-8df4-14fed0d494ab"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
RasterDataSources = "3cb90ccd-e1b6-4867-9617-4276c8b2ca36"
Rasters = "a3a2b9e3-a471-40c9-b274-f788e487c689"
Shapefile = "8e980c4a-a4fe-5da2-b3a7-4b4b0353a2f4"
SpeciesDistributionModels = "3ef73bbf-0321-4d3b-9a2e-5fbebc8e35da"

[sources]
SpeciesDistributionModels = {url = "https://github.com/tiemvanderdeure/SpeciesDistributionModels.jl/"}
110 changes: 84 additions & 26 deletions docs/src/gbif_wflow.md
Original file line number Diff line number Diff line change
@@ -1,14 +1,15 @@
# Species distribution modelling workflow

This example show how to load Bioclim data as a RasterStack
and extract species occurrences points from it, a common use-case in ecology.
This example shows a full Species distribution modelling workflow, from loading data, to cleaning it, to fitting an ensemble and generating predictions.

## Load Rasters, RasterDataSources and GBIF
It uses GBIF and WorldClim data, which are common datasets in ecology.

## Load Rasters, ArchGDAL, RasterDataSources and GBIF
The GBIF2 library is used to download occurrence data, RasterDataSources to conveniently access Bioclim data. ArchGDAL is necessary to load in the Bioclim data.

````@example gbif
using Rasters, GBIF2
using RasterDataSources
const RS = Rasters
using RasterDataSources, ArchGDAL
````

Load occurrences for the Mountain Pygmy Possum using GBIF.jl
Expand All @@ -17,45 +18,102 @@ Load occurrences for the Mountain Pygmy Possum using GBIF.jl
records = GBIF2.occurrence_search("Burramys parvus"; limit=300)
````

## Extract coordinates

Extract the longitude/latitude value to a `Vector` of points
(a `Tuple` counts as a `(x, y)` point in GeoInterface.jl):

````@example gbif
coords = [(r.decimalLongitude, r.decimalLatitude) for r in records if !ismissing(r.decimalLatitude)]
````

## Get layer / Band
Get BioClim layers and subset to south-east Australia
## Get Bioclimatic variables
Get BioClim layers and subset to south-east Australia.
The first time this is run, this will automatically download and save the files.

````@example gbif
A = RasterStack(WorldClim{BioClim}, (1, 3, 7, 12))
se_aus = A[X(138 .. 155), Y(-40 .. -25), RS.Band(1)]
se_aus = A[X(138 .. 155), Y(-40 .. -25), Band(1)]
````
Plot BioClim predictors and scatter occurrence points on all subplots

````@example gbif
using Plots
p = plot(se_aus);
kw = (legend=:none, opacity=0.5, markershape=:cross, markercolor=:black)
foreach(i -> scatter!(p, coords; subplot=i, kw...), 1:4)
# The coordinates from the gbif table
coords = collect(skipmissing(records.geometry))
using CairoMakie
p = Rasters.rplot(se_aus);
for ax in p.content
if ax isa Axis
scatter!(ax, coords; alpha=0.5, marker='+', color=:black, markersize = 20)
end
end
p
````

Then extract predictor variables and write to CSV.
## Extract bioclim variables at occurrence points
Then extract predictor variables and write to CSV. Use the skipmissing keyword to exclude both missing coordinates and coordinates with missing values in the RasterStack.

````@example gbif
using CSV
predictors = collect(extract(se_aus, coords))
CSV.write("burramys_parvus_predictors.csv", predictors)
presences = extract(se_aus, coords, skipmissing = true)
CSV.write("burramys_parvus_predictors.csv", presences)
````

Or convert them to a `DataFrame`:

````@example gbif
using DataFrames
df = DataFrame(predictors)
df = DataFrame(presences)
df[1:5,:]
````

From here you can use [SpeciesDistributionModels.jl](https://github.com/tiemvanderdeure/SpeciesDistributionModels.jl) for analysis.
## Sample background points
Next, sample random background points in the Raster. Rasters has a StatsBase extension to make this very straightforward. The syntax and output of `Rasters.sample` is very similar to that of `extract`.

````@example gbif
using StatsBase
background = Rasters.sample(se_aus, 500, skipmissing = true)
````

## Fit a statistical ensemble
In this example, we will [SpeciesDistributionModels.jl](https://github.com/tiemvanderdeure/SpeciesDistributionModels.jl) to fit a statistical ensemble to the occurrence and background data.

First we need to load the models. SDM.jl integrates with MLJ - see the [model browser](https://juliaai.github.io/MLJ.jl/dev/model_browser/#Classification) for what models are available.

````@example gbif
import Maxnet: MaxnetBinaryClassifier
import MLJGLMInterface: LinearBinaryClassifier
# define the models in the ensemble
models = (
maxnet = MaxnetBinaryClassifier(),
maxnet2 = MaxnetBinaryClassifier(features = "lq"),
glm = LinearBinaryClassifier()
)
````

Next, format the data using `sdmdata`. To test how rigurous our models are, we will use 3-fold cross-validation.

````@example gbif
using SpeciesDistributionModels
const SDM = SpeciesDistributionModels
data = sdmdata(presences, background; resampler = CV(; nfolds = 3))
````

Now, fit the ensemble, passing the data object and the `NamedTuple` of models!

````@example gbif
ensemble = sdm(data, models)
````

Use SDM.jl's evaluate function to see how this ensemble performs.

````@example gbif
SDM.evaluate(ensemble)
````

Not too bad!

## Make predictions of climatic suitability
Use the ensemble to

````@example gbif
suitability = SDM.predict(ensemble, se_aus, reducer = mean)
````

And let's see what that looks like

````@example gbif
plot(suitability, colorrange = (0,1))
````

0 comments on commit b9379e1

Please sign in to comment.