---
output: github_document
---
library(envReport)
library(envImport)
library(magrittr)
The goal of envImport is to obtain and assemble environmental data from disparate data sources for a geographic area of interest. As little filtering as possible occurs when obtaining data, and envImport does not aim to clean, filter or tidy the data (see envClean for help there). Usually the end result of an envImport workflow is a single object that contains all records from all the data sources. Sourcing and assembling environmental rasters for the area of interest is also in scope, but is currently poorly implemented and documented.
You can install the development version of envImport from GitHub with:
# install.packages("devtools")
devtools::install_github("Acanthiza/envImport")
`data_name` = 'data source'. Data sources are (usually) obvious sources of data. Examples are the Global Biodiversity Information Facility (GBIF) or the Terrestrial Ecosystem Research Network (TERN). The 10 data sources currently supported or with plans for development are (also see `envImport::data_map`):
- bdbsa: Biological databases of South Australia
- egis: Occurrence datasets from the environmental databases of South Australia (e.g. supertables)
- havplot: Harmonised Australian Vegetation Plot dataset (HAVPlot)
- tern: Terrestrial ecosystem network
- alis: Arid lands information systems
- nvb: DEW Native Vegetation Branch
- bcm: Bushland condition monitoring
- other: Other private datasets: SA Bird Atlas (UOA/Birds SA), Birdlife Australia Birdata portal, MLR Extra Bandicoot data, KI Post Fire Bird Monitoring, SA Seed Conservation Centre
- ptp: Paddock tree project
- gbif: Global biodiversity information facility
The data_map (see table @ref(tab:dataMap)) provides a mapping from original data sources to the desired columns in the assembled data set.
knitr::kable(data_map
, caption = "Data map of desired columns in the assembled data (columns) and names of columns in the original data (rows)"
)
Table: Data map of desired columns in the assembled data (columns) and names of columns in the original data (rows)
data_name | order | epsg | site | date | lat | long | original_name | common | nsx | occ_derivation | quantity | survey_nr | survey | ind | rel_metres | sens | lifeform | lifespan | cover | cover_code | height | quad_x | quad_y | epbc_status | npw_status | method | obs | denatured | desc | kingdom | data_name_use |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
bdbsa | 1 | 7844 | PATCHID | OBSDATE | LATITUDE | LONGITUDE | CONCATNAMAUTH | COMNAME1 | NSXCODE | NUMOBSERVED | NUMOBSERVED | SURVEYNR | SURVEYNAME | ISINDIGENOUS | rel_metres | NA | MUIRCODE | LIFESPAN | COVER | COVCODE | NA | VEGQUADSIZE1 | VEGQUADSIZE2 | ESACTSTATUSCODE | NPWACTSTATUSCODE | METHODDESC | observer | NA | Biological databases of South Australia | kingdom | BDBSA |
egis | 2 | 7844 | EGISCODE | SIGHTINGDATE | LATITUDE | LONGITUDE | SPECIES | COMNAME | NSXCODE | NUMOBSERVED | NUMOBSERVED | SURVEYNR | SURVEYNAME | ISINDIGENOUSFLAG | rel_metres | DISTRIBNDESC | NA | NA | NA | NA | NA | NA | NA | ESACTSTATUSCODE | NPWACTSTATUSCODE | METHODDESC | OBSERVER | NA | Occurrence datasets from the environmental databases of South Australia (e.g. supertables) | kingdom | EGIS |
havplot | 3 | 4326 | plotName | obsStartDate | decimalLatitude | decimalLongitude | scientificName | NA | NA | abundanceValue | abundanceValue | NA | projectID | NA | coordinateUncertaintyInMetres | NA | NA | NA | cover | NA | NA | length | width | NA | NA | abundanceMethod | individualName | NA | Harmonised Australian Vegetation Plot dataset (HAVPlot) | kingdom | HAVPlot |
tern | 4 | 4326 | site_unique | visit_start_date | latitude | longitude | species | NA | NA | NA | NA | NA | NA | NA | NA | NA | lifeform | NA | cover | NA | height | quadX | quadY | NA | NA | NA | observer_veg | NA | Terrestrial ecosystem network | kingdom | TERN |
alis | 5 | 4326 | SITENUMBER | SurveyDate | LATITUDE | LONGITUDE | CONCATNAMAUTH | COMNAME1 | NSXCode | NA | NA | NA | LandSystem | ISINDIGENOUS | NA | NA | Lifeform | LIFESPAN | Cover | NA | NA | NA | NA | ESACTSTATUSCODE | NPWACTSTATUSCODE | NA | observer | NA | Arid lands information systems | kingdom | ALIS |
nvb | 5 | 4326 | path | date | lat | lon | Spp | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | assessor | NA | DEW Native Vegetation Branch | kingdom | NVB |
bcm | 7 | 4326 | SITE_ID | ASSESSMENT_DATE | LATITUDE | LONGITUDE | CONCATNAMAUTH | COMNAME1 | species | NA | NA | NA | NA | ISINDIGENOUS | NA | NA | NA | LIFESPAN | NA | NA | NA | X_DIM | Y_DIM | ESACTSTATUSCODE | NPWACTSTATUSCODE | NA | assessor | NA | Bushland condition monitoring | kingdom | BCM |
other | 8 | 4326 | Site | SIGHTINGDATE | LATITUDE | LONGITUDE | SPECIES | NA | NA | NUMOBSERVED | NUMOBSERVED | SURVEYNR | SURVEYNAME | NA | maxDist | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | METHODDESC | observer | NA | Other private datasets: SA Bird Atlas (UOA/Birds SA), Birdlife Australia Birdata portal, MLR Extra Bandicoot data, KI Post Fire Bird Monitoring, SA Seed Conservation Centre | kingdom | Other |
ptp | 9 | 4326 | PlantDataID | Obs_Date | LATITUDE | LONGITUDE | CONCATNAMAUTH | COMNAME1 | NSXCODE | NA | NA | NA | NA | ISINDIGENOUS | NA | NA | Life_form | LIFESPAN | NA | Cover_abundance | NA | NA | NA | NA | NA | NA | Observers | NA | Paddock tree project | kingdom | PTP |
gbif | 10 | 4326 | gbifID | eventDate | decimalLatitude | decimalLongitude | scientificName | NA | organismID | occurrenceStatus | organismQuantity | NA | NA | NA | coordinateUncertaintyInMeters | NA | NA | NA | organismQuantity | NA | NA | NA | NA | NA | NA | samplingProtocol | recordedBy | informationWithheld | Global biodiversity information facility | kingdom | GBIF |
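
As a rough illustration of how one row of the data map could be used, the sketch below renames the columns of some raw records to the assembled column names. It is a minimal base R example, not envImport's own code: `toy_map` copies only a few cells from the table above, `raw_gbif` holds made-up records, and `rename_from_map()` is a hypothetical helper.

```r
# Toy stand-in for two rows of the data map (values copied from the table above).
toy_map <- data.frame(
  data_name = c("bdbsa", "gbif"),
  site = c("PATCHID", "gbifID"),
  date = c("OBSDATE", "eventDate"),
  lat = c("LATITUDE", "decimalLatitude"),
  long = c("LONGITUDE", "decimalLongitude"),
  stringsAsFactors = FALSE
)

# Made-up raw records using GBIF-style column names (assumption, for illustration).
raw_gbif <- data.frame(
  gbifID = c("1", "2"),
  eventDate = c("2020-01-01", "2021-06-15"),
  decimalLatitude = c(-35.0, -34.5),
  decimalLongitude = c(138.6, 139.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename original columns to the assembled names
# using the row of the map that matches `name`.
rename_from_map <- function(df, map, name) {
  row <- map[map$data_name == name, setdiff(names(map), "data_name")]
  original <- unlist(row)
  # Keep only mappings that are not NA and that exist in the raw data.
  keep <- !is.na(original) & original %in% names(df)
  out <- df[, unname(original[keep]), drop = FALSE]
  names(out) <- names(row)[keep]
  out$data_name <- name
  out
}

std_gbif <- rename_from_map(raw_gbif, toy_map, "gbif")
names(std_gbif)
```

Columns whose map entry is `NA` are simply skipped here; how envImport itself treats them is not shown in this document.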
`get_x` functions get data from the data source `x`. Results are always saved to disk (as getting data can be slow). When run again, they load from the saved file by default. If available, `get_x` functions use any R packages and functions provided by the data source (e.g. GBIF provides `rgbif` [@R-rgbif;@rgbif2017] and TERN provides `ausplotsR` [@R-ausplotsR]). The first arguments to `get_x` functions are always:
- `aoi`: an area of interest, provided as simple feature. `get_x` will turn any `aoi` into a bounding box and convert to coordinates appropriate for data source `x`. [Ed: is `aoi` always required?]
- `save_dir`: a directory to save the results to. The default (`NULL`) leads to the file `here::here("out", "ds", "x.rds")` being created and used as `save_file`. `ds` is for 'data source'. While the saved file is usually `x.rds`, in some instances it follows the format and naming of the download from `x` (e.g. GBIF data comes in a `.zip` file named by the corresponding download key).
- `get_new`: an override to force `get_x` to requery the data source, even if `save_file` already exists.
- `...`: the dots are passed to any underlying 'native' function, such as `rgbif::occ_download()` or `ausplotsR::get_ausplots()`.
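
The save-then-reload behaviour described above can be sketched in base R as follows. This is only an illustration under stated assumptions: `get_x_sketch()` and `query_x()` are hypothetical names that do not exist in envImport, the "query" returns made-up records, `aoi` is passed through untouched rather than converted to a bounding box, and the default path is built relative to the working directory instead of with `here::here()`.

```r
# Hypothetical stand-in for a slow query against data source "x".
query_x <- function(aoi, ...) {
  data.frame(
    site = c("a", "b"),
    lat = c(-35.1, -34.9),
    long = c(138.5, 138.7),
    stringsAsFactors = FALSE
  )
}

# Sketch of the caching pattern: query once, save to disk, reload afterwards.
get_x_sketch <- function(aoi, save_dir = NULL, get_new = FALSE, ...) {
  # Assumed default that mirrors the out/ds/x.rds layout.
  if (is.null(save_dir)) save_dir <- file.path("out", "ds")
  save_file <- file.path(save_dir, "x.rds")

  # Reuse the saved result unless a fresh query is requested.
  if (file.exists(save_file) && !get_new) {
    return(readRDS(save_file))
  }

  # Dots are forwarded to the underlying (here hypothetical) query function.
  res <- query_x(aoi, ...)
  dir.create(save_dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(res, save_file)
  res
}

# Usage: the first call queries and saves, the second loads the saved file.
tmp_dir <- file.path(tempdir(), "envimport_sketch")
first <- get_x_sketch(aoi = NULL, save_dir = tmp_dir)
second <- get_x_sketch(aoi = NULL, save_dir = tmp_dir)
# Force a requery even though the saved file exists.
third <- get_x_sketch(aoi = NULL, save_dir = tmp_dir, get_new = TRUE)
```

Writing to a temporary directory keeps the example from touching a real project's `out/ds` folder.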
Many of the `get_x` functions will only work within DEW.
As of June 2024, `get_x` functions can be run from `get_data`.
`unite_data` writes `bio_all`.