Commit
Merge pull request #115 from jdpye/gh-pages
WIP: correct Data Cleaning code
MathewBiddle authored Mar 15, 2024
2 parents c7fd193 + d73e437 commit abf1458
Showing 4 changed files with 65 additions and 14 deletions.
3 changes: 2 additions & 1 deletion _config.yml
@@ -121,6 +121,7 @@ extras_order:
- helpful-tutorials
- figures
- edna-extension
+ - special-data-type-acoustic-telemetry

# Specify that things in the episodes collection should be output.
collections:
@@ -147,7 +148,7 @@ defaults:
type: extras
values:
root: ..
- layout: page
+ layout: episode

# Files and directories that are not to be copied.
exclude:
25 changes: 12 additions & 13 deletions _episodes/03-data-cleaning.md
@@ -144,7 +144,7 @@ ISO 8601 dates can represent moments in time at different resolutions, as well a
> import pandas as pd
> df = pd.DataFrame({'start_date':['2021-01-30'],
> 'end_date':['2021-01-31']})
- > df['eventDate'] = df['start_time']+'/'+df['end_time']
+ > df['eventDate'] = df['start_date']+'/'+df['end_date']
> df
> ```
> ```output
@@ -165,9 +165,10 @@ ISO 8601 dates can represent moments in time at different resolutions, as well a
> ```r
> library(lubridate)
> date_str <- '01/31/2021 17:00 GMT'
- > lubridate::mdy_hm(date_str,tz="UTC")
+ > date <- lubridate::mdy_hm(date_str,tz="UTC")
> date <- lubridate::format_ISO8601(date) # Separates date and time with a T.
> date <- paste0(date, "Z") # Add a Z because time is in UTC.
+ > date
> ```
> ```output
> [1] "2021-01-31T17:00:00Z"
@@ -178,9 +179,10 @@ ISO 8601 dates can represent moments in time at different resolutions, as well a
> library(lubridate)
> date_str <- '31/01/2021 12:00 EST'
> date <- lubridate::dmy_hm(date_str,tz="EST")
- > lubridate::with_tz(date,tz="UTC")
+ > date <- lubridate::with_tz(date,tz="UTC")
> date <- lubridate::format_ISO8601(date)
> date <- paste0(date, "Z")
+ > date
> ```
> ```output
> [1] "2021-01-31T17:00:00Z"
@@ -191,10 +193,11 @@ ISO 8601 dates can represent moments in time at different resolutions, as well a
> ```r
> library(lubridate)
> date_str <- 'January, 01 2021 5:00 PM GMT'
- > date <- lubridate::mdy_hm(date_str, format = '%B, %d %Y %H:%M', tz="GMT")
+ > date <- lubridate::mdy_hm(date_str, tz="GMT")
> lubridate::with_tz(date,tz="UTC")
> lubridate::format_ISO8601(date)
> date <- paste0(date, "Z")
+ > date
> ```
> ```output
> [1] "2021-01-01T17:00:00Z"
@@ -211,7 +214,7 @@ ISO 8601 dates can represent moments in time at different resolutions, as well a
> date <- lubridate::as_datetime(date_str, origin = lubridate::origin, tz = "UTC")
> date <- lubridate::format_ISO8601(date)
> date <- paste0(date, "Z")
- > print(date)
+ > date
> ```
> ```output
> [1] "2021-01-31T17:00:00Z"
@@ -247,24 +250,19 @@ ISO 8601 dates can represent moments in time at different resolutions, as well a
> library(lubridate)
> event_start <- '2021-01-30'
> event_finish <- '2021-01-31'
- >
> deployment_time <- 1002
> retrieval_time <- 1102
- >
- > Time is recorded numerically (1037 instead of 10:37), so need to change these columns:
+ > # Time is recorded numerically (1037 instead of 10:37), so need to change these columns:
> deployment_time <- substr(as.POSIXct(sprintf("%04.0f", deployment_time), format = "%H%M"), 12, 16)
> retrieval_time <- substr(as.POSIXct(sprintf("%04.0f", retrieval_time), format = "%H%M"), 12, 16)
- >
> # If you're interested in just pasting the event dates together:
> eventDate <- paste(event_start, event_finish, sep = "/")
- >
> # If you're interested in including the deployment and retrieval times in the eventDate:
> eventDateTime_start <- lubridate::format_ISO8601(as.POSIXct(paste(event_start, deployment_time), tz = "UTC"))
> eventDateTime_start <- paste0(eventDateTime_start, "Z")
> eventDateTime_finish <- lubridate::format_ISO8601(as.POSIXct(paste(event_finish, retrieval_time), tz = "UTC"))
- > eventDateTime_finish <- paste0(eventdateTime_finish, "Z")
+ > eventDateTime_finish <- paste0(eventDateTime_finish, "Z")
> eventDateTime <- paste(eventDateTime_start, eventDateTime_finish, sep = "/")
- >
> print(eventDate)
> print(eventDateTime)
> ```
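For readers following along in Python rather than R, a minimal sketch of the same conversion (numeric HHMM times to ISO 8601 `eventDate` intervals) could look like the following; the variable names simply mirror the R example above:

```python
from datetime import datetime

event_start = '2021-01-30'
event_finish = '2021-01-31'
deployment_time = 1002  # time recorded numerically: 1002 means 10:02
retrieval_time = 1102

def to_iso(date_str, hhmm):
    # Zero-pad to 4 digits, parse as HHMM, and append Z because time is UTC.
    t = datetime.strptime(f"{date_str} {hhmm:04d}", "%Y-%m-%d %H%M")
    return t.strftime("%Y-%m-%dT%H:%M:%SZ")

# If you're interested in just pasting the event dates together:
eventDate = f"{event_start}/{event_finish}"

# If you're interested in including the deployment and retrieval times:
eventDateTime = f"{to_iso(event_start, deployment_time)}/{to_iso(event_finish, retrieval_time)}"

print(eventDate)      # 2021-01-30/2021-01-31
print(eventDateTime)  # 2021-01-30T10:02:00Z/2021-01-31T11:02:00Z
```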
@@ -310,7 +308,8 @@ The other way to get the taxonomic information you need is to use [worrms](https
> ![screenshot]({{ page.root }}/fig/species_file_screenshot.png){: .image-with-shadow }
>
> 2. Upload that file to the [WoRMS Taxon match service](https://www.marinespecies.org/aphia.php?p=match)
- > * **make sure the option LSID is checked**
+ > * **make sure the option LSID is checked**
+ > * **for the example file, make sure you select LineFeed as the row delimiter and Tab as the column delimiter**
> ![screenshot]({{ page.root }}/fig/WoRMS_upload.png){: .image-with-shadow }
>
> 3. Identify which columns to match to which WoRMS term.
51 changes: 51 additions & 0 deletions _extras/special-data-type-acoustic-telemetry.md
@@ -0,0 +1,51 @@
---
title: "Specific Data Examples - Acoustic Telemetry"
teaching: 15
exercises: 0
questions:
- "How does acoustic telemetry data convert to Darwin Core?"
- "How do I contribute my acoustic telemetry data to OBIS?"
objectives:
- "Collaborating with a network"
- "Understanding the mappings and what metadata and data are important to converting acoustic telemetry to Darwin Core"
- "Contributing tracking data for animals that are monitored with multiple methods of electronic or mark-recapture tracking regimes. (Acoustic, Satellite, RFID, coded-wire)"
keypoints:
- "The Ocean Tracking Network is the thematic OBIS node for animal telemetry data."
- "Mature pipelines exist to take acoustic telemetry data from projects contributing to OTN or to its regional nodes and publish standard, summarized datasets to OBIS."
---

# Acoustic Telemetry Data
### How It Works

Acoustic telemetry is a general term for the practice of implanting marine animals with electronic tags that emit a coded series of 'pings', transmitting a unique identifier to any listening station near enough to record them. Many companies sell systems of coded transmitters (tags) and listening devices (receivers) that allow researchers to track the movements of underwater animals at sub-kilometre scales. Recent developments allow many listening stations to synchronize their detection of a 'ping', enabling fine-scale triangulation of an animal's position.



### Networks - getting more data for your tracked animals

Acoustic telemetry equipment is often intercompatible, allowing the listening stations of one project to detect the tagged individuals of any number of other projects. To maximize the utility of their tag detection data and the detectability of their tagged animals, many researchers engaged in acoustic telemetry contribute their data to a data-aggregating network. The global-scale aggregation network for handling and cross-referencing this data is the Ocean Tracking Network (OTN), which coordinates many regional networks towards intercompatibility and harmonizes their quality control and data pipelines so that all networks can contribute to a global observation network. Today that network-of-networks comprises many thousands of listening stations and tracks tens of thousands of active acoustic tags globally. Details on what the Ocean Tracking Network and its partners are currently observing are available at https://members.oceantrack.org/statistics/

### Preparing to publish to OBIS - how we solve the many-fish vs. one-fish-many-times problem

Each of the telemetry networks produces, for its investigators, a data report that takes into account every compatible listening device across all of the networks that has heard their tagged animals. These reports can run into the millions of detections for anadromous fish, for closed-system monitoring, or where an animal is resident near a listening device (or deceased near one). The Ocean Tracking Network has created a data pipeline that combines all the observations for a given project into a Darwin Core Event Core archive.

When preparing to submit this data to the Ocean Biodiversity Information System (OBIS), we as data managers must consider how to put these tens or hundreds of thousands of animal locations into the context of the broader database. By summarizing the animal position data, and by using the organismID field to denote multiple positions for a single known individual organism, the OTN publication process allows telemetry data to contribute fully to OBIS without creating excessive data density or losing the details of each individual animal's movement history.
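As a rough illustration of this summarization step, the pandas sketch below collapses a per-ping detection log into one record per animal per station. Only `organismID` and `eventDate` are Darwin Core terms; the `station` column, the data values, and the exact grouping are invented for the example and do not reflect OTN's actual pipeline:

```python
import pandas as pd

# Hypothetical raw detection log: one row per ping received.
detections = pd.DataFrame({
    'organismID': ['shark-001', 'shark-001', 'shark-001', 'shark-002'],
    'eventDate': ['2021-01-30T10:02:00Z', '2021-01-30T11:15:00Z',
                  '2021-02-02T08:40:00Z', '2021-01-31T09:00:00Z'],
    'station': ['HFX001', 'HFX001', 'HFX002', 'HFX001'],
})

# Collapse to one summary row per animal per station: the first and last
# detections define an ISO 8601 interval, and a count preserves the
# sampling effort without keeping one row per ping. ISO 8601 strings
# sort lexicographically, so min/max are also chronological.
summary = (detections
           .groupby(['organismID', 'station'], as_index=False)
           .agg(first_seen=('eventDate', 'min'),
                last_seen=('eventDate', 'max'),
                n_detections=('eventDate', 'size')))
summary['eventDate'] = summary['first_seen'] + '/' + summary['last_seen']
print(summary[['organismID', 'station', 'eventDate', 'n_detections']])
```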


### Mapping example

OTN maintains a wiki entry demonstrating how to populate each of the Darwin Core fields in an acoustic-telemetry-sourced Darwin Core archive:

https://github.com/tdwg/dwc-for-biologging/wiki/Acoustic-sensor-enabled-tracking-of-blue-sharks
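To give a flavour of that mapping, a single detection might translate into a record like the one sketched below. All values are invented for illustration (the identifiers, coordinates, and species are not from a real dataset); the keys are genuine Darwin Core terms, and `MachineObservation` is the basisOfRecord conventionally used for sensor-derived records:

```python
# Hypothetical mapping of one acoustic detection to Darwin Core terms.
occurrence = {
    'occurrenceID': 'OTN:detection:0001',   # invented identifier
    'organismID': 'shark-001',              # links detections of one animal
    'eventDate': '2021-01-30T10:02:00Z',    # time of the detection, UTC
    'decimalLatitude': 44.64,               # position of the receiver
    'decimalLongitude': -63.58,
    'scientificName': 'Prionace glauca',    # blue shark, as in the wiki example
    'basisOfRecord': 'MachineObservation',  # sensor-derived, not human-observed
}
print(occurrence['organismID'], occurrence['eventDate'])
```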

### Satellite telemetry, light-level geolocation, etc.

Satellite telemetry relies on surfacing events to obtain a GPS or Argos network location for a tagged individual.


### Multiple-method tagging of animals

Often, animals are tagged with multiple technologies to better capture their movement over smaller and larger distances.


{% include links.md %}
Binary file modified fig/WoRMS_upload.png
