diff --git a/docs/Project.toml b/docs/Project.toml index 13e76347b..95ad821d4 100644 --- a/docs/Project.toml +++ b/docs/Project.toml @@ -12,9 +12,16 @@ GBIF2 = "dedd4f52-e074-43bf-924d-d6bce14ad628" GeoInterface = "cf35fbd7-0cd7-5166-be24-54bfbe79505f" HDF5 = "f67ccb44-e63f-5c2f-98bd-6dc0ccc4ba2f" Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306" +MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d" +MLJGLMInterface = "caf8df21-4939-456d-ac9c-5fefbfb04c0c" Makie = "ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a" +Maxnet = "81f79f80-22f2-4e41-ab86-00c11cf0f26f" NCDatasets = "85f8d34a-cbdd-5861-8df4-14fed0d494ab" Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80" RasterDataSources = "3cb90ccd-e1b6-4867-9617-4276c8b2ca36" Rasters = "a3a2b9e3-a471-40c9-b274-f788e487c689" Shapefile = "8e980c4a-a4fe-5da2-b3a7-4b4b0353a2f4" +SpeciesDistributionModels = "3ef73bbf-0321-4d3b-9a2e-5fbebc8e35da" + +[sources] +SpeciesDistributionModels = {url = "https://github.com/tiemvanderdeure/SpeciesDistributionModels.jl/"} \ No newline at end of file diff --git a/docs/src/gbif_wflow.md b/docs/src/gbif_wflow.md index 206a919ac..5bd6fcedc 100644 --- a/docs/src/gbif_wflow.md +++ b/docs/src/gbif_wflow.md @@ -1,14 +1,15 @@ # Species distribution modelling workflow -This example show how to load Bioclim data as a RasterStack -and extract species occurrences points from it, a common use-case in ecology. +This example shows a full Species distribution modelling workflow, from loading data, to cleaning it, to fitting an ensemble and generating predictions. -## Load Rasters, RasterDataSources and GBIF +It uses GBIF and WorldClim data, which are common datasets in ecology. + +## Load Rasters, ArchGDAL, RasterDataSources and GBIF +The GBIF2 library is used to download occurrence data, RasterDataSources to conveniently access Bioclim data. ArchGDAL is necessary to load in the Bioclim data. ````@example gbif using Rasters, GBIF2 -using RasterDataSources -const RS = Rasters +using RasterDataSources, ArchGDAL ```` Load occurrences for the Mountain Pygmy Possum using GBIF.jl @@ -17,45 +18,102 @@ Load occurrences for the Mountain Pygmy Possum using GBIF.jl records = GBIF2.occurrence_search("Burramys parvus"; limit=300) ```` -## Extract coordinates - -Extract the longitude/latitude value to a `Vector` of points -(a `Tuple` counts as a `(x, y)` point in GeoInterface.jl): - -````@example gbif -coords = [(r.decimalLongitude, r.decimalLatitude) for r in records if !ismissing(r.decimalLatitude)] -```` - -## Get layer / Band -Get BioClim layers and subset to south-east Australia +## Get Bioclimatic variables +Get BioClim layers and subset to south-east Australia. +The first time this is run, this will automatically download and save the files. ````@example gbif A = RasterStack(WorldClim{BioClim}, (1, 3, 7, 12)) -se_aus = A[X(138 .. 155), Y(-40 .. -25), RS.Band(1)] +se_aus = A[X(138 .. 155), Y(-40 .. -25), Band(1)] ```` Plot BioClim predictors and scatter occurrence points on all subplots ````@example gbif -using Plots -p = plot(se_aus); -kw = (legend=:none, opacity=0.5, markershape=:cross, markercolor=:black) -foreach(i -> scatter!(p, coords; subplot=i, kw...), 1:4) +# The coordinates from the gbif table +coords = collect(skipmissing(records.geometry)) + +using CairoMakie +p = Rasters.rplot(se_aus); +for ax in p.content + if ax isa Axis + scatter!(ax, coords; alpha=0.5, marker='+', color=:black, markersize = 20) + end +end p ```` -Then extract predictor variables and write to CSV. +## Extract bioclim variables at occurrence points +Then extract predictor variables and write to CSV. Use the skipmissing keyword to exclude both missing coordinates and coordinates with missing values in the RasterStack. + ````@example gbif using CSV -predictors = collect(extract(se_aus, coords)) -CSV.write("burramys_parvus_predictors.csv", predictors) +presences = extract(se_aus, coords, skipmissing = true) +CSV.write("burramys_parvus_predictors.csv", presences) ```` Or convert them to a `DataFrame`: ````@example gbif using DataFrames -df = DataFrame(predictors) +df = DataFrame(presences) df[1:5,:] ```` -From here you can use [SpeciesDistributionModels.jl](https://github.com/tiemvanderdeure/SpeciesDistributionModels.jl) for analysis. +## Sample background points +Next, sample random background points in the Raster. Rasters has a StatsBase extension to make this very straightforward. The syntax and output of `Rasters.sample` is very similar to that of `extract`. + +````@example gbif +using StatsBase +background = Rasters.sample(se_aus, 500, skipmissing = true) +```` + +## Fit a statistical ensemble +In this example, we will [SpeciesDistributionModels.jl](https://github.com/tiemvanderdeure/SpeciesDistributionModels.jl) to fit a statistical ensemble to the occurrence and background data. + +First we need to load the models. SDM.jl integrates with MLJ - see the [model browser](https://juliaai.github.io/MLJ.jl/dev/model_browser/#Classification) for what models are available. + +````@example gbif +import Maxnet: MaxnetBinaryClassifier +import MLJGLMInterface: LinearBinaryClassifier +# define the models in the ensemble +models = ( + maxnet = MaxnetBinaryClassifier(), + maxnet2 = MaxnetBinaryClassifier(features = "lq"), + glm = LinearBinaryClassifier() +) +```` + +Next, format the data using `sdmdata`. To test how rigurous our models are, we will use 3-fold cross-validation. + +````@example gbif +using SpeciesDistributionModels +const SDM = SpeciesDistributionModels +data = sdmdata(presences, background; resampler = CV(; nfolds = 3)) +```` + +Now, fit the ensemble, passing the data object and the `NamedTuple` of models! + +````@example gbif +ensemble = sdm(data, models) +```` + +Use SDM.jl's evaluate function to see how this ensemble performs. + +````@example gbif +SDM.evaluate(ensemble) +```` + +Not too bad! + +## Make predictions of climatic suitability +Use the ensemble to + +````@example gbif +suitability = SDM.predict(ensemble, se_aus, reducer = mean) +```` + +And let's see what that looks like + +````@example gbif +plot(suitability, colorrange = (0,1)) +````