diff --git a/access.md b/access.md
index 948d7fd..349cba2 100644
--- a/access.md
+++ b/access.md
@@ -21,11 +21,11 @@
 We have produced a video demonstration of how to use the Mapper. The mapper allows users to visualize and inspect subsets of OBIS data. A variety of filters are available (taxonomic, geographic, time, data quality) and multiple layers can be combined in a single view. Layers can be downloaded as CSV files.
 
-![Screenshot demonstrating where how to download a particular layer](images/mapper-DL.png){width=40%}
+![*Screenshot demonstrating how to download a particular layer*](images/mapper-DL.png){width=60%}
 
 When you download data from the mapper, you will be given the option to include eMoF and/or DNA Derived Data extensions alongside the Event and Occurrence data. You must check the boxes of extensions you want to include in your download.
 
-![Screenshot showing the popup confirmation for which extensions you want to include in your download from the OBIS Mapper](images/mapper-extensions.png){width=40%}
+![*Screenshot showing the popup confirmation for which extensions you want to include in your download from the OBIS Mapper*](images/mapper-extensions.png){width=70%}
 
 After downloading, you will notice that the Event and Occurrence data is flattened into one table, called “Occurrence.csv”. Upon inspecting this file in your viewer of choice, you will see it contains all 225 possible DwC fields, although not every field will contain data for each observation. Any extensions you checked will be downloaded as separate tables.
@@ -83,7 +83,7 @@ From the OBIS homepage, you can search for data in the search bar in the middle
 When you search by dataset you will notice an additional option appears for [advanced search options](https://obis.org/datasets). This will allow you to identify specific datasets, and apply filters for OBIS nodes and whether datasets include extensions.
-![OBIS homepage search, showing where to find the advanced search link](images/obis-homepagesearch.png){width=50%}
+![*OBIS homepage search, showing where to find the advanced search link*](images/obis-homepagesearch.png){width=70%}
 
 Regardless if you found a dataset through the homepage or the advanced Dataset search, you will be able to navigate to individual dataset pages. For individual dataset pages (instead of aggregate pages for e.g., a Family) there are three buttons available:
@@ -91,7 +91,7 @@ Regardless if you found a dataset through the homepage or the advanced Dataset s
 * Source DwC-A - download the dataset as a Darwin Core-Archive file. This will provide all data tables as separate files within a zipped folder
 * To mapper - this will open another browser with the data shown in the Mapper
 
-![Dataset download](images/dataset-DL.png){width=50%}
+![*Dataset download*](images/dataset-DL.png){width=70%}
 
 If you searched for aggregate datasets (e.g., all Crustacea records, all records from OBIS-Canada, etc.), the `source DwC-A` button will not be available to you. To download these data subsets, you must click `to mapper` and then [download the data from the Mapper as a CSV](#mapper).
@@ -101,11 +101,11 @@ If you searched for aggregate datasets (e.g., all Crustacea records, all records
 To obtain a full export of OBIS data, navigate to the OBIS homepage, click on Data from the top navigation bar, then select [Data Access](https://obis.org/data/access/) from the dropdown menu.
 
-![OBIS homepage showing where to navigate to access full database exports](images/full-export1.png){width=50%}
+![*OBIS homepage showing where to navigate to access full database exports*](images/full-export1.png){width=70%}
 
 Here you will be able to download all occurrence records as a CSV or Parquet file. Note the disclaimer that such exports will not include measurement data, dropped records, or absence records.
 As with downloads from the Mapper, the exported file is a single Occurrence table. This table includes all provided Event and Occurrence data, as well as 68 fields added by the OBIS Quality Control Pipeline, including taxonomic information obtained from WoRMS.
 
-![OBIS Data Access page](images/full-export2.png){width=50%}
+![*OBIS Data Access page*](images/full-export2.png){width=70%}
 
 ## Finding your own data in OBIS
@@ -120,7 +120,7 @@ To find your own dataset in OBIS, you can use the same tools as finding any data
 To contact the data provider, navigate to the page for the individual dataset in question (e.g., ). Under the “Contacts” section, there will be a list of individuals you can contact. Clicking any name will direct you to your system’s default email program. For example:
 
-![Example of contact section on a dataset homepage access via the OBIS search](images/contact-dataprovider.png){width=40%}
+![*Example of contact section on a dataset homepage accessed via the OBIS search*](images/contact-dataprovider.png){width=70%}
 
 If you are the node manager and need to contact the data provider about a particular dataset, contact information should be provided in the metadata and you can contact them from information provided.
diff --git a/checklist.md b/checklist.md
index 03721a2..c2580c7 100644
--- a/checklist.md
+++ b/checklist.md
@@ -4,7 +4,7 @@
 There are many Darwin Core terms listed in the [TDWG quick reference guide](https://dwc.tdwg.org/terms/). However, not all these terms are necessary for publishing data to OBIS.
 
-For your convenience, we have created a checklist of all the Darwin Core terms relevant for OBIS data providers. You can reference this list to quickly see which terms are required by OBIS, which file (Event, Occurrence, eMoF, DNA) they can be found in, and which Darwin Core class it relates to. These terms correlate with the [IPT vocabulary mapping](ipt#map-your-data-to-darwin-core.html) you will do when it comes time to publish your dataset. You may notice some terms are accepted in multiple data tables (e.g., Event and Occurrence) - this is because it depends on your dataset structure. If you have an Event Core, you will include some terms that would not be included if you had Occurrence Core. For guidance on specific class terms (e.g., location, taxonomy, etc.), see the [Darwin Core](darwin_core#darwin-core-guidelines.html) section of the manual.
+For your convenience, we have created a checklist of all the Darwin Core terms relevant for OBIS data providers. You can reference this list to quickly see which terms are required by OBIS, which file (Event, Occurrence, eMoF, DNA) they can be found in, and which Darwin Core class it relates to. These terms correlate with the [IPT vocabulary mapping](ipt.html#map-your-data-to-darwin-core) you will do when it comes time to publish your dataset. You may notice some terms are accepted in multiple data tables (e.g., Event and Occurrence) - this is because it depends on your dataset structure. If you have an Event Core, you will include some terms that would not be included if you had Occurrence Core. For guidance on specific class terms (e.g., location, taxonomy, etc.), see the [Darwin Core](darwin_core.html#darwin-core-guidelines) section of the manual.
 
 Note that when you publish your dataset on the IPT, if you use a term not listed below it will be an unmapped field and will **not** be published alongside your data. You may still wish to include such fields in your dataset if you are publishing to other repositories, just know that they will not be included in your OBIS dataset. You may include this information either by putting it in the `dynamicProperties` field in JSON format, or putting the information into the [eMoF](format_emof.html).
 
 Alternatively, you may have fields that you do not wish to be published and that do not correspond to one of these terms (e.g. personal notes). This is okay - if they are not mapped to one of the terms, that column in your dataset will not be published.
diff --git a/common_formatissues.md b/common_formatissues.md
index 29363b1..b951c71 100644
--- a/common_formatissues.md
+++ b/common_formatissues.md
@@ -20,19 +20,19 @@
 To resolve missing fields [marked as required](checklist.html) by OBIS, there are a number of solutions:
 
 - **`eventDate`**
 
-Ensure your eventDate is specified for each event, formatted according to [ISO 8601 standards](https://en.wikipedia.org/wiki/ISO_8601) (e.g., YYYY-MM-DD). We have developed [step by step guidelines](common_formatissues#temporal-dates-and-times.html) to help you format contemporary dates and durations into ISO formatting. If your date falls outside the range of acceptable dates - i.e., historical or geological data occurring before 1583 - please follow recommendations for [historical data](common_formatissues#historical-data.html).
+Ensure your eventDate is specified for each event, formatted according to [ISO 8601 standards](https://en.wikipedia.org/wiki/ISO_8601) (e.g., YYYY-MM-DD). We have developed [step by step guidelines](common_formatissues.html#temporal-dates-and-times) to help you format contemporary dates and durations into ISO formatting. If your date falls outside the range of acceptable dates - i.e., historical or geological data occurring before 1583 - please follow recommendations for [historical data](common_formatissues.html#historical-data).
 
 For any eventDate that is inferred from literature, you should document the original date in the `verbatimEventDate` field.
 
 - **`decimalLongitude`** and **`decimalLatitude`**
 
-First, if you have coordinate data, make sure they are [converted into decimal degrees](common_formatissues#converting-coordinates.html). If you do not have specific coordinate data then you must approximate the coordinates based on locality name. You can use the [Marine Regions gazetteer](https://www.marineregions.org/gazetteer.php?p=search) to search for your region of interest and obtain midpoint coordinates. Guidelines for using this tool and for dealing with uncertain geolocations can be found [here](common_formatissues#geographical-formats.html). You will have to make some comments in the `georeferenceRemarks` field if you are estimating coordinates.
+First, if you have coordinate data, make sure they are [converted into decimal degrees](common_formatissues.html#converting-coordinates). If you do not have specific coordinate data then you must approximate the coordinates based on locality name. You can use the [Marine Regions gazetteer](https://www.marineregions.org/gazetteer.php?p=search) to search for your region of interest and obtain midpoint coordinates. Guidelines for using this tool and for dealing with uncertain geolocations can be found [here](common_formatissues.html#geographical-formats). You will have to make some comments in the `georeferenceRemarks` field if you are estimating coordinates.
 
 - **`scientificName`**
 
 This field should contain only the **originally documented** scientific name down to the lowest possible taxon rank, even if there are misspellings or if it is a current synonym. Class, or even Kingdom levels are accepted if more specific taxonomic levels are unknown. Comments about misspellings, etc. can be documented in the `taxonRemarks` field. Note that there may be special cases for eDNA and DNA derived data, see [specific guidelines](dna_data.html) for these cases.
 
-You may encounter challenges filling this field if the species name is based on description or if its taxonomy was uncertain at time of sampling. For such uncertain taxonomy situations, see our guidelines [here](common_qc#uncertain-taxaonomic-information.html).
+You may encounter challenges filling this field if the species name is based on description or if its taxonomy was uncertain at time of sampling. For such uncertain taxonomy situations, see our guidelines [here](common_qc.html#uncertain-taxaonomic-information).
 
 - **`scientificNameID`**
@@ -61,7 +61,7 @@ For specifics on when to use each of these and which other fields should be popu
 ### Temporal: Dates and times
 
-The date and time at which an event took place or an occurrence was recorded goes in `eventDate`. This field uses the [ISO 8601 standard](https://en.wikipedia.org/wiki/ISO_8601). OBIS recommends using the extended ISO 8601 format with hyphens. Note that all dates in OBIS become translated to UTC during the [quality control process implemented by OBIS](dataquality.html). Formatting your dates correctly ensures there will be no errors during this process.
+The date and time at which an event took place or an occurrence was recorded goes in `eventDate`. This field uses the [ISO 8601 standard](https://en.wikipedia.org/wiki/ISO_8601). OBIS recommends using the extended ISO 8601 format with hyphens. Note that all dates in OBIS become translated to UTC during the [quality control process implemented by OBIS](https://github.com/iobis/obis-qc). Formatting your dates correctly ensures there will be no errors during this process.
 
 ISO 8601 dates can represent moments in time at different resolutions, as well as time intervals, which use / as a separator. Date and times are separated by `T`. Timezones can be indicated at the end by using + or - the number of hours offset from UTC. If no timezone is indicated, then the time is assumed to be local time. When a date/time is recorded in UTC, a Z should be added at the end. Times must be written in the 24-hour clock system. If you do not know the time, you do not have to provide it. Please do not indicate unknown times as “00:00” as this indicates midnight.
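The ISO 8601 rules above (hyphenated extended format, `T` separator, trailing `Z` for UTC, `/` for intervals) can be sketched with Python's standard library; the sample timestamps below are illustrative only, not drawn from any OBIS dataset:

```python
from datetime import datetime, timezone

# Day-level resolution: extended ISO 8601 with hyphens (YYYY-MM-DD)
day = datetime(2023, 6, 15).strftime("%Y-%m-%d")

# Date and time separated by "T"; a trailing "Z" marks the time as UTC
moment = datetime(2023, 6, 15, 14, 30, tzinfo=timezone.utc)
event_date = moment.strftime("%Y-%m-%dT%H:%M:%SZ")

# A time interval (e.g. a multi-day sampling event) joins two dates with "/"
interval = "2023-06-15/2023-06-17"

# Local time with an explicit numeric offset from UTC (here UTC+12)
with_offset = "2023-06-15T14:30:00+12:00"

print(day, event_date, interval, with_offset)
```

Omitting the time component entirely, as the guidance recommends when the time is unknown, is as simple as publishing the `day` value alone.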
@@ -171,7 +171,7 @@ All coordinates provided in the `decimalLatitude` or `decimalLongitude` fields i
 ![Screenshot of how to use the OBIS coordinate converter](images/coordinate_conversion.png){width=40%}
 
-The [Map Tool tutorial](access#mapper.html) also reviews use of the coordinate conversion tool.
+The [Map Tool tutorial](access.html#mapper) also reviews use of the coordinate conversion tool.
 
 If your coordinates are in UTMs, then coordinate conversion can be a bit trickier. We suggest using the following [conversion tool](http://rcn.montana.edu/resources/Converter.aspx) to convert from UTM to decimal degrees. Note it is very important to ensure you have the correct UTM zone, otherwise the coordinate conversion will be incorrect. You can use this [ArcGIS map tool](https://www.arcgis.com/apps/View/index.html?appid=7fa64a25efd0420896c3336dc2238475) to visually confirm UTM zones.
diff --git a/common_qc.md b/common_qc.md
index 762647e..4eaa5e4 100644
--- a/common_qc.md
+++ b/common_qc.md
@@ -57,10 +57,11 @@
 Please see the [video tutorial](access.html#mapper) on how to use our Map tool.
 
 For both the [Getty thesaurus](https://www.getty.edu/research/tools/vocabularies/tgn/) and [Google Maps](https://www.google.com/maps/) you can simply search the name of a locality, for example the Cook Strait in New Zealand. The search result on the Getty thesaurus will bring you to a page where you can obtain `decimalLatitude` and `decimalLongitdue`.
 
-![Screenshot of Cook Strait page on the Getty Thersaurus](images/getty-thesaurus-NZexample.png){width=50%}
+![Screenshot of Cook Strait page on the Getty Thesaurus](images/getty-thesaurus-NZexample.png){width=60%}
 
 For Google Maps, the coordinates can be found in the url after searching.
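For coordinates recorded in degrees-minutes-seconds rather than UTMs, the decimal-degree conversion behind the tools mentioned above is simple arithmetic; a minimal sketch (the function name and sample coordinates are illustrative, not part of any OBIS tooling):

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to decimal degrees.

    Southern and western hemispheres ("S", "W") yield negative values,
    as required for decimalLatitude / decimalLongitude.
    """
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value

# 45°05'30" N -> roughly 45.0917
lat = dms_to_decimal(45, 5, 30, "N")
# 64°30'00" W -> -64.5
lon = dms_to_decimal(64, 30, 0, "W")
print(lat, lon)
```

For real datasets the dedicated converters linked above remain the safer route, since they also handle UTM zones and datum issues that this sketch ignores.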
-![Screenshot of Google Maps showing where coordinates can be foun din the URL](images/google-maps-coordinates.png){width=50%}
+
+![Screenshot of Google Maps showing where coordinates can be found in the URL](images/google-maps-coordinates.png){width=60%}
 
 ### How to use Marine Regions Gazetteer tool
@@ -69,15 +70,15 @@ Marine Regions offers a marine gazetteer search engine to obtain geographic info
 For this example we will search by geographic name for the Bay of Fundy.
 
 ![Marine regions gazetteer search](images/marinegazeeteer-search.png){width=50%}
-![Marine regions gazetteer search](images/marinegazeeteer-bayfundy.png){width=40%}
+![Marine regions gazetteer search](images/marinegazeeteer-bayfundy.png){width=60%}
 
 Our search returned 5 results from different sources (indicated in brackets). So how do we select the correct one? We can notice right away that the second result, from SeaVox SeaArea, has a preferred alternative, which when you click on the link brings you to the IHO Sea Area description for Bay of Fundy. So already we can likely drop SeaVox as a potential candidate. A good next step may be to compare the geographical extent for each to ensure it covers the desired area. If you are uncertain about exactly where your locality is, it may be better to be safe and choose a wider geographic region. Let’s compare the maps for all 5 results:
 
-![Bay of Fundy source IHO](images/marinegazeeteer-bayfundy-1-IHO.png){width=40%}
-![Bay of Fundy source SeaVoX](images/marinegazeeteer-bayfundy-2-SeaVoX.png){width=40%}
-![Canadian part of Bay of Fundy source Marine Region](images/marinegazeeteer-bayfundy-3-MarRegion.png)
-![Bay of Fundy source MEOW](images/marinegazeeteer-bayfundy-4-MEOW.png){width=40%}
-![United States part of Bay of Fundy source Marine Region](images/marinegazeeteer-bayfundy-5-MarRegion.png){width=40%}
+![Bay of Fundy source IHO](images/marinegazeeteer-bayfundy-1-IHO.png){width=50%}
+![Bay of Fundy source SeaVoX](images/marinegazeeteer-bayfundy-2-SeaVoX.png){width=50%}
+![Canadian part of Bay of Fundy source Marine Region](images/marinegazeeteer-bayfundy-3-MarRegion.png){width=50%}
+![Bay of Fundy source MEOW](images/marinegazeeteer-bayfundy-4-MEOW.png){width=50%}
+![United States part of Bay of Fundy source Marine Region](images/marinegazeeteer-bayfundy-5-MarRegion.png){width=50%}
 
 Notice that no region has the exact same geographic extent. Let’s select the IHO Bay of Fundy locality (the first search result) to ensure we are covering the entire area of the Bay of Fundy, but not the Gulf. Inspecting the rest of the page, there is a lot of other useful information we can use. We can populate the following OBIS fields for our dataset, copying the information outlined in the red boxes:
@@ -101,7 +102,6 @@ Below is a table summarizing the different DwC terms you can obtain from the OBI
 |--|--|--|--|
 | decimalLatitude | Latitude | Latitude | |
 | decimalLongitude | Longitude | Longitude | |
-| maximumDepthInMeters | Depth | | No minimum depth is provided from either Mapper or Marine Regions |
 | locationID | | MRGID | |
 | coordinateUncertaintyInMeters | radius | precision (not always available) | |
 | footprintWKT | WKT | | |
@@ -130,7 +130,9 @@ There is a new Darwin Core term [`verbatimIdentification`](https://dwc.tdwg.org/
 The use and definitions for additional Open Nomenclature (ON) signs (`identificationQualifier`) can be found in [Open Nomenclature in the biodiversity era](https://doi.org/10.1111/2041-210X.12594), which provides examples for using the main Open Nomenclature qualifiers associated with physical specimens (Figure 1). Whereas the publication [Recommendations for the Standardisation of Open Taxonomic Nomenclature for Image-Based Identifications](https://www.frontiersin.org/articles/10.3389/fmars.2021.620702/full) provides examples and definitions for identificationQualifiers for non-physical specimens (image-based) (Figure 2).
 
-![Figure 1. Flow diagram with the main Open Nomenclature qualifiers associated with physical specimens. The degree of confidence in the correct identifier increases from the top down. More info and figure copied from [Open Nomenclature in the biodiversity era](https://doi.org/10.1111/2041-210X.12594).](images/fig1-openNomenclature.png){width=50%} ![Figure 2: Flow diagram with the main Open Nomenclature qualifiers for the identification of specimens from images (non-physical, image-based) . More information and figure copied from [Recommendations for the Standardisation of Open Taxonomic Nomenclature for Image-Based Identifications](https://www.frontiersin.org/articles/10.3389/fmars.2021.620702/full)](images/fig2-flowDiagram.jpg){width=50%}
+![Figure 1. Flow diagram with the main Open Nomenclature qualifiers associated with physical specimens. The degree of confidence in the correct identifier increases from the top down. More info and figure copied from [Open Nomenclature in the biodiversity era](https://doi.org/10.1111/2041-210X.12594).](images/fig1-openNomenclature.png){width=60%}
+
+![Figure 2: Flow diagram with the main Open Nomenclature qualifiers for the identification of specimens from images (non-physical, image-based). More information and figure copied from [Recommendations for the Standardisation of Open Taxonomic Nomenclature for Image-Based Identifications](https://www.frontiersin.org/articles/10.3389/fmars.2021.620702/full)](images/fig2-flowDiagram.jpg){width=60%}
 
 #### Changes in taxonomic classification
@@ -168,15 +170,19 @@ If you are given an error that your taxon is not marine, please confirm first wh
 Otherwise, records marked as non-marine will be dropped from the published dataset, and this will be flagged in the data quality associated with your dataset. Let’s consider an example within [this dataset](https://obis.org/dataset/9fbaeb21-a0dc-4a29-8237-1cd7ada266e0) on benthic macroalgae. Inspecting the data quality report we can see there are three dropped records due to species not being marine.
+
 ![Dropped records from a benthic macroalgae dataset](images/dropped-records1.png){width=40%}
 
 Clicking on the dropped records we can see which three species were dropped. By scrolling to the right of the table, we can see these records have two quality flags: NO_DEPTH and NOT_MARINE.
+
 ![Flags specifying why certain records were dropped](images/dropped-records2.png){width=40%}
 
 Let’s take a look at the first species, Pseudochantransia venezuelensis. When we search for this species on [WoRMS](https://www.marinespecies.org/aphia.php?p=taxdetails&id=836900) we can see that the species is marked as freshwater.
-![ ](images/nonmarine-spp-worms.png){width=40%}
+
+![ ](images/nonmarine-spp-worms.png){width=60%}
 
 Cross-referencing with IRMNG, if we search for the genus-species, the species is not even found, an indication that it is not in the database (and also why it can be good to check multiple sources). Searching for just the genus, we can see that marine and brackish are stricken out, indicating the species is not marine.
-![ ](images/nonmarine-spp-irmng.png){width=40%}
+
+![ ](images/nonmarine-spp-irmng.png){width=60%}
 
 If you have species that are marked as non-marine in these registers but are either supposed to be marine, or were found in a marine environment, then you should contact WoRMS to discuss adding it to the register. For additions and/or edits to environmental or distribution records of a species, contact the WoRMS Data Management Team at info@marinespecies.org with your request along with your record or publication substantiating the addition/change.
diff --git a/contribute.md b/contribute.md
index 037c9ea..36486f8 100644
--- a/contribute.md
+++ b/contribute.md
@@ -21,7 +21,7 @@ Since 2000, OBIS has accepted, curated and published marine biodiversity data ob
 So if you have any of these types of marine data linked to your occurrence data and also want to contribute to OBIS - great! OBIS accepts data from any organization, consortium, project or individual who wants to contribute data. OBIS Data Sources are the authors, editors, and/or organisations that have published one or more datasets through OBIS. They remain the owners or custodians of the data, not OBIS!
-OBIS harvests and publishes data from recognized IPTs from OBIS nodes or GBIF publishers. If you own data or have the right to publish data in OBIS, you can contact the [OBIS secretariat or one of the OBIS nodes](https://obis.org/contact/), or additionally a [GBIF publisher](LINK). Your organization or programme can also [become an OBIS node](nodes.html). An OBIS node usually publishes data from multiple data holders, effectively being a node in a network of data providers. So you may have to first find a [relevant node](https://obis.org/contact/) before you get your data ready to publish.
+OBIS harvests and publishes data from recognized IPTs from OBIS nodes or GBIF publishers. If you own data or have the right to publish data in OBIS, you can contact the [OBIS secretariat or one of the OBIS nodes](https://obis.org/contact/), or additionally a GBIF publisher. Your organization or programme can also [become an OBIS node](nodes.html). An OBIS node usually publishes data from multiple data holders, effectively being a node in a network of data providers. So you may have to first find a [relevant node](https://obis.org/contact/) before you get your data ready to publish.
 
 To publish a dataset to OBIS, there are **five** main steps you must go through.
@@ -72,8 +72,8 @@
 To accommodate sensitivity but still be able to contribute to OBIS, we suggest:
 
 * [Generalizing location](common_qc.html#uncertain-geolocation) information by: Obtaining regional coordinates using [MarineRegions](http://www.marineregions.org/gazetteer.php?p=search), [Getty Thesaurus of Geographic Names](http://www.getty.edu/research/tools/vocabularies/tgn/), or [Google Maps](http://maps.google.com/)
-* Using the [OBIS Map tool](https://obis.org/maptool/) to generate a polygon area with a Well-Known Text (WKT) representation of the geometry to paste into the `footprintWKT` field. [Maptool tutorial](LINK)
-Delay timing of publication (e.g., to accommodate mobile species)
+* Using the [OBIS Map tool](https://obis.org/maptool/) to generate a polygon area with a Well-Known Text (WKT) representation of the geometry to paste into the `footprintWKT` field.
+* Delay timing of publication (e.g., to accommodate mobile species)
 * [Submit your dataset, but mark it as private in the IPT](ipt.html) so it is not published right away (i.e., until you set it as public). Alternatively, you can set a password on your dataset in order to share with specific individuals. Note that setting passwords will require some coordination with the IPT manager.
 
 By submitting your data to an IPT but not immediately publishing it, you can ensure that the dataset will be in a place to be incorporated at a later date when it is ready to be made public. This not only saves time and helps retain details while relatively fresh in your mind, but also ensures the dataset is still ready to be mobilized in case jobs are changed at a later date. GBIF has created the following [Best Practices for Generalizing Sensitive data](https://docs.gbif.org/sensitive-species-best-practices/master/en/) which can provide you with additional guidance.
diff --git a/darwin_core.md b/darwin_core.md
index 5434fe9..37f02e3 100644
--- a/darwin_core.md
+++ b/darwin_core.md
@@ -212,7 +212,7 @@ _Data from [A summary of benthic studies in the sluice dock of Ostend during 197
 `basisOfRecord` (required term) specifies the nature of the record, i.e. whether the occurrence record is based on a stored specimen or an observation. In case the specimen is collected and stored in a collection (e.g. at a museum, university, research institute), the options are `PreservedSpecimen` (e.g. preserved in ethanol, tissue etc.), `FossilSpecimen` (fossil, which allows OBIS to make the distinction between the date of collection and the time period the specimen was assumed alive) or `LivingSpecimen` (an intentionally kept/cultivated living specimen e.g. in an aquarium or culture collection). In case no specimen is deposited, the basis of record is either `HumanObservation` (e.g bird sighting, benthic sample but specimens were discarded after counting), or `MachineObservation` (e.g. for occurrences based on automated sensors such as DNA sequencers, image recognition etc).
 
-When the basisOfRecord is a _preservedSpecimen_, _LivingSpecimen_ or _FossilSpecimen_ please also add the `institutionCode`, `collectionCode` and `catalogNumber`, which will enable people to visit the collection and re-examine the material. Sometimes, for example in case of living specimens, a dataset can contain records pointing to the origin, the in-situ sampling position as well as a record referring to the ex-situ collection. In this case please add the event type information in `type` (see [OBIS manual: event](darwin_core#event.html)).
+When the basisOfRecord is a _PreservedSpecimen_, _LivingSpecimen_ or _FossilSpecimen_ please also add the `institutionCode`, `collectionCode` and `catalogNumber`, which will enable people to visit the collection and re-examine the material. Sometimes, for example in case of living specimens, a dataset can contain records pointing to the origin, the in-situ sampling position as well as a record referring to the ex-situ collection. In this case please add the event type information in `type` (see [OBIS manual: event](darwin_core.html#event)).
 
 `institutionCode` identifies the custodian institute (often by acronym), `collectionCode` identifies the collection or dataset within that institute. Collections cannot belong to multiple institutes, so all records within a collection should have the same `institutionCode`. The `collectionID` is an identifier for the record within the dataset or collection.
@@ -263,7 +263,7 @@ _Data from [Adriatic and Ionian Sea mega-fauna monitoring employing ferry as pla
 ##### Event
 
-`eventID` is an identifier for the sampling or observation event. `parentEventID` is an identifier for a parent event, which is composed of one or more sub-sampling (child) events (eventIDs). See [identifiers](identifiers#eventid.html) for details on how these terms can be constructed.
+`eventID` is an identifier for the sampling or observation event. `parentEventID` is an identifier for a parent event, which is composed of one or more sub-sampling (child) events (eventIDs). See [identifiers](identifiers.html#eventid) for details on how these terms can be constructed.
 
 `habitat` is a category or description of the habitat in which the Event occurred (e.g. benthos, seamount, hydrothermal vent, seagrass, rocky shore, intertidal, ship wreck etc.)
@@ -273,7 +273,7 @@ The date and time at which an occurrence was recorded goes in `eventDate`. This
-More specific guidelines on formatting dates and times can be found in the [Common Data formatting issues page](common_formatissues#temporal-dates-and-times)
+More specific guidelines on formatting dates and times can be found in the [Common Data formatting issues page](common_formatissues.html#temporal-dates-and-times)
 
 ##### Sampling
diff --git a/data_format.md b/data_format.md
index ce1a238..e576d21 100644
--- a/data_format.md
+++ b/data_format.md
@@ -56,7 +56,7 @@ DNA derived data are increasingly being used to document taxon occurrences. To e
 ##### A special case: habitat types
 
-Including information on habitats (biological community, biotope, or habitat type) is possible and encouraged with the use of Event Core. However, beware the unconstrained nature of the terms `measurementTypeID`, `measurementValueID`, and `measurementUnitID` which can lead to inconsistently documented habitat measurements within the Darwin Core Archive standard. To ensure this data is more easily discoverable, understood or usable, refer to [Examples: habitat data](other_data_types#habitat-data.html) and/or [Duncan et al. (2021)](https://www.emodnet-seabedhabitats.eu/resources/documents-and-outreach/#h3298bcd0a15741a8a0ac1c8b4576f7c5) for use case examples and more details.
+Including information on habitats (biological community, biotope, or habitat type) is possible and encouraged with the use of Event Core. However, beware the unconstrained nature of the terms `measurementTypeID`, `measurementValueID`, and `measurementUnitID` which can lead to inconsistently documented habitat measurements within the Darwin Core Archive standard. To ensure this data is more easily discoverable, understood or usable, refer to [Examples: habitat data](other_data_types.html#habitat-data) and/or [Duncan et al. (2021)](https://www.emodnet-seabedhabitats.eu/resources/documents-and-outreach/#h3298bcd0a15741a8a0ac1c8b4576f7c5) for use case examples and more details.
 
 ##### Recommended reading
diff --git a/data_sharing.md b/data_sharing.md
index 1eaa923..0ecec4b 100644
--- a/data_sharing.md
+++ b/data_sharing.md
@@ -8,13 +8,13 @@ As the IPT administrator, you must enable the capacity for users to reserve DOIs
 Once this has been configured, a data provider or admin can easily reserve a DOI for a dataset. First log in to the IPT, navigate to the Manage Resources tab, then select the dataset for which you wish to reserve a DOI. On the overview page for the dataset, scroll to the Publication section, click the three vertical dots and select “Reserve DOI”.
-![Screenshot indicating how to reserve a DOI for your dataset](images/ipt-doi.png){width=50%}
+![Screenshot indicating how to reserve a DOI for your dataset](images/ipt-doi.png){width=60%}

### User tracking

OBIS tracks the number of times your dataset is downloaded. This information is available on your dataset’s page under the Statistics box.

-![Example screenshot of how dataset downloads can be tracked](images/data-tracking.png){width=50%}
+![Example screenshot of how dataset downloads can be tracked](images/data-tracking.png){width=60%}

## Update your data in OBIS

diff --git a/data_standards.md b/data_standards.md
index d1d75ce..55d1775 100644
--- a/data_standards.md
+++ b/data_standards.md
@@ -11,23 +11,23 @@ The basic data life cycle for contributions to OBIS can be broken down into six

5. Data access (downloading)
6. Data visualization

-Each of these phases are outlined in this manual and are composed of a number of steps which are covered in the relevant sections. 
+Each of these phases is outlined in this manual and is composed of a number of steps, which are covered in the relevant sections.

-After you have decided on your [data structure](formatting.html) and have moved to the Data Formatting stage, you must first [match](name_matching.html) the taxa in your dataset to a registered list. In formatting your dataset you will ensure the [required OBIS terms](checklist.html) and [identifiers](identifiers.html) are mapped correctly to your data fields and records. 
+After you have decided on your [data structure](formatting.html) and have moved to the Data Formatting stage, you must first [match](name_matching.html) the taxa in your dataset to a registered list. In formatting your dataset, you will ensure the [required OBIS terms](checklist.html) and [identifiers](identifiers.html) are mapped correctly to your data fields and records.
-Depending on your data structure, you will then format data into a [DwC-A](data_format.html) format with the appropriate Core table ([Event](format_event) or [Occurrence](format_occurrence.html))) with any applicable extension tables. Any biotic or abiotic measurements will be moved into the [extendedMeasurementOrFact table](link #22+). Before proceeding to the [publishing](data_publication.html) stage, there are a number of [quality control](dataquality.html) steps to complete.
+Depending on your data structure, you will then format data into a [DwC-A](data_format.html) format with the appropriate Core table ([Event](format_event.html) or [Occurrence](format_occurrence.html)) with any applicable extension tables. Any biotic or abiotic measurements will be moved into the [extendedMeasurementOrFact table](format_emof.html). Before proceeding to the [publishing](data_publication.html) stage, there are a number of [quality control](dataquality.html) steps to complete.

Once your data has been published, you and others can [access](access.html) datasets through various avenues and it becomes part of OBIS’ global database!

-This may seem like a daunting process at first glance, but this manual will walk you through each step, and the OBIS community is full of [helpful resources](gethelp.html). Throughout the manual you will find tutorials and tools to guide you from start to finish through the OBIS data life cycle.
+This may seem like a daunting process at first glance, but this manual will walk you through each step, and the OBIS community is full of [helpful resources](gethelp.html). Throughout the manual you will find tutorials and tools to guide you from start to finish through the OBIS data life cycle.
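The Darwin Core Archive layout described above (one core table plus extension tables linked by shared identifiers) can be sketched as miniature tables. All identifiers, names, and values below are hypothetical examples, not real OBIS data:

```python
# Minimal sketch of an Event core with Occurrence and eMoF extensions.
# Every ID and value here is a made-up example.
event = [
    {"eventID": "cruise01_station1", "eventDate": "2019-06-04",
     "decimalLatitude": 54.2, "decimalLongitude": 4.1},
]
occurrence = [
    {"occurrenceID": "cruise01_station1_occ1", "eventID": "cruise01_station1",
     "scientificName": "Calanus finmarchicus", "occurrenceStatus": "present"},
]
emof = [  # measurements link back via eventID and/or occurrenceID
    {"eventID": "cruise01_station1", "occurrenceID": "cruise01_station1_occ1",
     "measurementType": "abundance", "measurementValue": 12,
     "measurementUnit": "individuals per m3"},
]

# Every extension row must point at an existing core record.
event_ids = {e["eventID"] for e in event}
assert all(o["eventID"] in event_ids for o in occurrence)
occ_ids = {o["occurrenceID"] for o in occurrence}
assert all(m["occurrenceID"] in occ_ids for m in emof)
```

The assertions express the one rule the whole archive depends on: extension rows are meaningless unless their identifiers resolve to a core record.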
#### Who is responsible for each phase? -Phases 1 through 3 are the responsibilities of the data provider, while Phases 3 and 4 are shared between the data provider and the node manager. Data users are involved in Phases 5 and 6. -The OBIS Secretariat is responsible for data processing and harvesting published resources. +Phases 1 through 3 are the responsibilities of the data provider, while Phases 3 and 4 are shared between the data provider and the node manager. Data users are involved in Phases 5 and 6. +The OBIS Secretariat is responsible for data processing and harvesting published resources. ## Biodiversity data standards @@ -39,4 +39,4 @@ From the very beginning, OBIS has championed the use of international standards The following pages of this manual review each of these in turn. We show you how to apply these standards to format your data in the [Data Formatting](formatting.html) section. -We also provide some [dataset examples](examples.html) for your reference. \ No newline at end of file +We also provide some [dataset examples](examples.html) for your reference. diff --git a/dna_data.md b/dna_data.md index 949eaaa..88b6b98 100644 --- a/dna_data.md +++ b/dna_data.md @@ -2,9 +2,9 @@ **Contents:** -- [Introduction](dna_data#introduction.html) -- [eDNA & DNA Derived use cases](dna_data#edna--dna-derived-data-example.html) -- [How to find genetic data in OBIS](dna_data#how-to-find-genetic-data-in-obis.html) +- [Introduction](dna_data.html#introduction) +- [eDNA & DNA Derived use cases](dna_data.html#edna--dna-derived-data-example) +- [How to find genetic data in OBIS](dna_data.html#how-to-find-genetic-data-in-obis) #### Introduction @@ -18,7 +18,7 @@ To ensure DNA data are useful to the broadest possible community, GBIF published 4. Name references 5. 
Metadata only -For a guide and decision tree on determining which category your data falls into, see the [Data packaging and mapping](https://docs.gbif.org/publishing-dna-derived-data/1.0/en/#data-packaging-and-mapping) section of the GBIF guide. Refer to the [examples below](dna_data#edna--dna-derived-data.html) for use case examples of eDNA and DNA derived data (Category 1). +For a guide and decision tree on determining which category your data falls into, see the [Data packaging and mapping](https://docs.gbif.org/publishing-dna-derived-data/1.0/en/#data-packaging-and-mapping) section of the GBIF guide. Refer to the [examples below](dna_data.html#edna--dna-derived-data) for use case examples of eDNA and DNA derived data (Category 1). > Currently, genetic data **must** be published with Occurrence core, not Event core. eDNA and DNA derived data are then linked to the Occurrence core data table with the use of `occurrenceID` and/or `eventID`. See below for further guidance on compiling genetic data. @@ -59,9 +59,9 @@ Then, you will need to format the DNADerivedData extension. The following (free- - DNA Derived | DwC: thresholdQuantificationCycle - DNA Derived | DwC: baselineValue -For a complete list of terms you can map to, see [the DwC DNA Derived Data extension page](http://rs.gbif.org/extension/gbif/1.0/dna_derived_data_2021-07-05.xml). See the [examples below](dna_data#edna--dna-derived-data.html) for use case examples. The Marine Biological Data Mobilization Workshop also has a [tutorial](https://ioos.github.io/bio_mobilization_workshop/edna-extension/#dna-derived-extension) for this type of data. +For a complete list of terms you can map to, see [the DwC DNA Derived Data extension page](http://rs.gbif.org/extension/gbif/1.0/dna_derived_data_2021-07-05.xml). See the [examples below](dna_data.html#edna--dna-derived-data) for use case examples. 
The Marine Biological Data Mobilization Workshop also has a [tutorial](https://ioos.github.io/bio_mobilization_workshop/edna-extension/#dna-derived-extension) for this type of data.

-When your data tables are formatted and you are ready to publish it on the IPT, it will follow the same process for [publishing on an IPT](data_publication.html). You will upload your source files, and add the Occurrence core Darwin Core mappings, and then the DNA Derived Data Darwin Core mappings. However the extension must first be [installed by the IPT administrator](data_publication#ipt-administration.html) (often the node manager). Once the extension is installed, you can add the Darwin Core DNA Derived Data mapping for that file.
+When your data tables are formatted and you are ready to publish them on the IPT, follow the same process for [publishing on an IPT](data_publication.html). You will upload your source files, add the Occurrence core Darwin Core mappings, and then the DNA Derived Data Darwin Core mappings. However, the extension must first be [installed by the IPT administrator](data_publication.html#ipt-administration) (often the node manager). Once the extension is installed, you can add the Darwin Core DNA Derived Data mapping for that file.
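As a minimal sketch of the linkage described above, the following shows a DNA Derived Data row tied to its Occurrence core record through a shared `occurrenceID`. The field names `DNA_sequence` and `target_gene` are taken from the DNA Derived Data extension term list; the identifiers and values are invented for illustration:

```python
# Hypothetical Occurrence core record and its linked DNADerivedData row.
occurrence_core = {
    "occurrenceID": "samp1_asv42",      # made-up identifier
    "scientificName": "Calanus finmarchicus",
    "occurrenceStatus": "present",
}
dna_derived = {
    "occurrenceID": "samp1_asv42",      # same ID links the two tables
    "DNA_sequence": "TTGTACACACCGCCC",  # placeholder sequence fragment
    "target_gene": "18S rRNA",
}

# The extension row only makes sense if the link resolves.
assert dna_derived["occurrenceID"] == occurrence_core["occurrenceID"]
```

In the IPT this linkage is what the Occurrence core mapping and the DNA Derived Data mapping establish between the two uploaded files.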
##### OBIS Bioinformatics Pipeline

@@ -71,14 +71,14 @@ Broadly speaking, it creates a framework that receives raw sequence data from eD

OBIS is developing guidelines and pipelines to accept other data types, such as:

-- [Acoustic](other_data_types#multimedia-data.html)
-- [Imaging](other_data_types#multimedia-data.html)
-- [Tracking](other_data_types#tracking-data.html)
-- [Habitat](other_data_types#habitat-data.html)
+- [Acoustic](other_data_types.html#multimedia-data)
+- [Imaging](other_data_types.html#multimedia-data)
+- [Tracking](other_data_types.html#tracking-data)
+- [Habitat](other_data_types.html#habitat-data)

#### eDNA & DNA derived data example

-The following example use cases draw on both the [GBIF guide](https://docs.gbif-uat.org/publishing-dna-derived-data/1.0/en/) and the [DNA derived data extension](https://rs.gbif-uat.org/extensions.html#http) to illustrate how to incorporate a DNA derived data extension file into a Darwin Core archive. Note: for the purposes of this section, only required Occurrence core terms are shown, in addition to all eDNA & DNA specific terms. For additional Occurrence core terms, refer to [Occurrence](darwin_core#occurrence.html).
+The following example use cases draw on both the [GBIF guide](https://docs.gbif-uat.org/publishing-dna-derived-data/1.0/en/) and the [DNA derived data extension](https://rs.gbif-uat.org/extensions.html#http) to illustrate how to incorporate a DNA derived data extension file into a Darwin Core archive. Note: for the purposes of this section, only required Occurrence core terms are shown, in addition to all eDNA & DNA specific terms. For additional Occurrence core terms, refer to [Occurrence](darwin_core.html#occurrence).

##### eDNA data from Monterey Bay, California

diff --git a/eml.md b/eml.md
index 375c9ce..d3c1bcf 100644
--- a/eml.md
+++ b/eml.md
@@ -223,7 +223,7 @@ The information given in this section can also help the OBIS node manager in geo

If the dataset covers multiple areas (e.g. 
samples from the North Sea and the Mediterranean Sea), then this should clearly be mentioned in the `geographicDescription` field. Note that the IPT only allows one bounding box, and you have to uncheck the “Set global coverage” box to change box bounds. -![Screenshot of the Geographical Coverage section of the metadata, emphasizing how to change the bounds of the coverage box in the map.](images/ipt-ss13-meta-geo.png){width=50%} +![Screenshot of the Geographical Coverage section of the metadata, emphasizing how to change the bounds of the coverage box in the map.](images/ipt-ss13-meta-geo.png){width=60%} ###### Taxonomic Coverage @@ -234,7 +234,7 @@ This section can capture two things: > _Note:_ OBIS also recommends to add information on the (higher) taxonomic groups in the (descriptive) dataset title and abstract. -![Example of the Taxonomic Coverage section of the metadata](images/ipt-ss14-meta-taxa.png){width=50%} +![Example of the Taxonomic Coverage section of the metadata](images/ipt-ss14-meta-taxa.png){width=70%} ###### Temporal Coverage @@ -242,14 +242,15 @@ The temporal coverage will be a date range, which can easily be documented. If i You can also document the Formation Period or the Living Time Period in this section for specimens that may not have been alive during the collection period, or to indicate the time during which the collection occurred. -![Example of the Temporal Coverage section of the metadata](images/ipt-ss15-meta-time.png){width=50%} +![Example of the Temporal Coverage section of the metadata](images/ipt-ss15-meta-time.png){width=70%} ##### Keywords Relevant keywords facilitate the discovery of a dataset. An indication of the represented functional groups can help in a general search (e.g. plankton, benthos, zooplankton, phytoplankton, macrobenthos, meiobenthos …). 
Assigned keywords can be related to taxonomy, habitat, geography or relevant keywords extracted from thesauri such as the [ASFA thesaurus](https://vocabularyserver.com/asfa/), the [CAB thesaurus](http://www.cabi.org/cabthesaurus/) or [GCMD keywords](https://www.earthdata.nasa.gov/learn/find-data/idn/gcmd-keywords). As taxonomy and geography are already covered in previous sections, there is no need to repeat related keywords here. Please consult your data provider about which (relevant) keywords can be assigned.

-![Example of the Keywords section of the metadata, showing input for a marine fishes dataset](images/ipt-ss16-meta-keyword.png){width=50%}
+
+![Example of the Keywords section of the metadata, showing input for a marine fishes dataset](images/ipt-ss16-meta-keyword.png){width=70%}

##### Project

@@ -284,7 +285,7 @@ This overview will contribute to a better understanding of the data as these pub

This IPT section should only be filled out if there are specimens held in a museum. If relevant, it is strongly recommended that this information is supplied by the data provider or left blank. The collection name, specimen preservation method, and curatorial units should be provided, as applicable.

-![Screenshot of the Collection Data page showing what information can be provided for museum specimens](images/ipt-ss17-meta-collection.png){width=50%}
+![Screenshot of the Collection Data page showing what information can be provided for museum specimens](images/ipt-ss17-meta-collection.png){width=70%}

##### External Links

diff --git a/format_emof.md b/format_emof.md
index e509c93..2ce33ab 100644
--- a/format_emof.md
+++ b/format_emof.md
@@ -59,7 +59,7 @@ By linking `measurementType` and `measurementValue` with the identifiers `eventI

6. For any other measurements related to occurrences, repeat steps 3-5, pasting additional measurements below the preceding ones
   * Be sure to copy and paste the associated occurrenceIDs and/or eventIDs for the additional measurements
7. 
Fill the fields `measurementTypeID`, `measurementUnitID`, and `measurementValueID` with controlled vocabularies that suit your data (see [vocabulary guidelines](vocabulary.html)) -8. Repeat for any measurements in the Event table +8. Repeat steps 3-7 for any measurements in the Event table Note the fields [sampleSizeValue](https://dwc.tdwg.org/terms/#dwc:sampleSizeValue), [samplingEffort](https://dwc.tdwg.org/terms/#dwc:samplingEffort), and [samplingProtocol](https://dwc.tdwg.org/terms/#dwc:samplingProtocol) from the Occurrence table can be documented as separate measurements on different rows in the eMoF table. E.g., `measurementType` = samplingProtocol, `measurementValue` = description of protocol. Any values in [sampleSizeUnit](https://dwc.tdwg.org/terms/#dwc:sampleSizeUnit) fields should be placed in the `measurementUnit` field when transferred to the eMoF. diff --git a/format_event.md b/format_event.md index 3b24b89..a0fbddb 100644 --- a/format_event.md +++ b/format_event.md @@ -37,20 +37,20 @@ Other terms you should consider adding are grouped by their associated Darwin Co Terms related to measurements, either biotic (e.g., sex, lifestage) or abiotic will be included in extendedMeasurementOrFact table _not_ the Event Core or Occurrence extension table. -### Stepwise Guidance to Format Event Table (in Excel) +### Stepwise Guidance to Format Event Table (with spreadsheets) Before proceeding with the below, make sure each record already has an [eventID](identifiers.html). -1. Identify columns in your data that will match with Darwin Core event fields - * Include any relevant abiotic measurements (ENV-DATA) related to sampling events (e.g. sampling protocols). We will add these to the eMoF table later. -2. Copy these columns to a new sheet and name it Event -3. Delete duplicate data so only unique events are left. -4. Identify the hierarchical event structure in your data, if present -5. Add and fill the `parentEventID` and `type` fields as applicable -6. 
Create new records for parent Events
-7. Ensure dates and time are [formatted according to ISO 8601 standards](link #30) in the eventDate field
-8. Add any other relevant fields as indicated above
-
-Watch the video tutorial of this process.
+1. Add and fill the `parentEventID` and `eventRemarks` fields as applicable
+2. Identify the hierarchical event structure in your data, if present, and create new records for parent Events, filling in any relevant fields
+3. Identify all columns in your data that will match with Darwin Core Event fields
+   * Include any relevant abiotic measurements (ENV-DATA) related to sampling events (e.g. sampling protocols). We will add these to the eMoF table later
+4. Copy these columns to a new sheet and name it Event
+5. Delete duplicate data so only unique events are left
+6. Ensure dates and times are [formatted according to ISO 8601 standards](common_formatissues.html#temporal-dates-and-times) in the eventDate field
+7. Add any other relevant fields as indicated above
+8. Map fields to Darwin Core
+
+Watch the video tutorial of this process. (Link coming soon)

After completing the formatting of your Event Core table, you can next format your extendedMeasurementOrFact table. To format the Occurrence extension table, see the [Occurrence table](format_occurrence.html) section of this manual.

diff --git a/format_occurrence.md b/format_occurrence.md
index 6eac5a3..1f0f32d 100644
--- a/format_occurrence.md
+++ b/format_occurrence.md
@@ -37,7 +37,7 @@ Other terms you should consider adding are identified by their associated Darwin

Note that any terms related to measurements, either biotic (e.g., sex, lifestage, biomass) or abiotic will be included in extendedMeasurementOrFact table not the Occurrence table. 
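The copy-and-deduplicate step that both the Event and Occurrence table guides rely on (copy the relevant Darwin Core columns to a new sheet, then delete duplicates so only unique records remain) can be sketched with pandas. The column values here are hypothetical stand-ins for your own data:

```python
import pandas as pd

# Hypothetical raw sampling data: one row per occurrence, so event
# details are repeated across rows.
raw = pd.DataFrame({
    "eventID":          ["st1_t1", "st1_t1", "st1_t2"],
    "eventDate":        ["2019-06-04", "2019-06-04", "2019-06-05"],
    "decimalLatitude":  [54.2, 54.2, 54.3],
    "decimalLongitude": [4.1, 4.1, 4.2],
})

# Copy the Event columns to a new table and keep only unique events.
event_cols = ["eventID", "eventDate", "decimalLatitude", "decimalLongitude"]
event = raw[event_cols].drop_duplicates().reset_index(drop=True)
# event now holds one row per unique sampling event
```

The same pattern applies when pulling unique occurrence columns out of a combined sheet.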
-### Stepwise Guidance to Format an Occurrence Table (in Excel) +### Stepwise Guidance to Format an Occurrence Table (with spreadsheets) Before proceeding with formatting the Occurrence table, be sure you have [completed taxon matching](name_matching.html) to obtain WoRMS LSIDs for the scientificNameID field. @@ -49,6 +49,6 @@ Before proceeding with formatting the Occurrence table, be sure you have [comple 5. Ensure your column names [map to Darwin Core terms](vocabulary.html) * scientificName + scientificNameID -Watch our video tutorial for a demonstration of this procedure. +Watch our video tutorial for a demonstration of this procedure. (coming soon) After formatting your Occurrence Core or Extension table, you can format your extendedMeasurementOrFact table. diff --git a/formatting.md b/formatting.md index b71d695..f11b3b5 100644 --- a/formatting.md +++ b/formatting.md @@ -22,7 +22,7 @@ Occurrence Core datasets describe **observations** and **specimen records** and * **No information** on how the data was sampled or samples were processed is available * No abiotic measurements are taken or provided -* You have [eDNA and DNA derived data](examples.html#edna-dna-derived-data.html) +* You have [eDNA and DNA derived data](examples.html#edna-dna-derived-data) * Biological measurements are made on **individual specimens** (each specimen is a single occurrence record) Occurrence Core is also often the preferred structure for museum collections, citations of occurrences from literature, and sampling activities. @@ -58,7 +58,7 @@ Let us consider a fictional plankton trawl sampling event to demonstrate how ide The GBIF Norwegian Node created the [DwC Excel Template Generator](https://gbif-norway.github.io/dwc-excel-template-generator-js/). This tool will generate four different types of blank Excel spreadsheets: Occurrence Core, MeasurementOrFact, Metadata, and a README. 
This tool works best if you already know which Darwin Core fields you need, although a default template can be generated. Another tool from Norway is the [Excel to Darwin Core Standard (DwC) Tool](https://zenodo.org/record/6453921#.Y9KsQkHMKmU). This is a macro Excel spreadsheet that helps create templates for Event (aka Sampling-Event) and Occurrence core tables, as well as MeasurementsOrFacts, Extended MeasurementsOrFacts, and Simple Multimedia extensions. -GBIF provides an [Occurrence core template](https://ipt.gbif.org/manual/en/ipt/latest/occurrence-data#templates) and an [Event core template](https://ipt.gbif.org/manual/en/ipt/latest/sampling-event-data#templates). If you use these templates from GBIF, be aware that [GBIF’s required terms are different from OBIS](data_sharing#differences-between-obis-and-gbif-publication-processes.html). +GBIF provides an [Occurrence core template](https://ipt.gbif.org/manual/en/ipt/latest/occurrence-data#templates) and an [Event core template](https://ipt.gbif.org/manual/en/ipt/latest/sampling-event-data#templates). If you use these templates from GBIF, be aware that [GBIF’s required terms are different from OBIS](data_sharing.html#differences-between-obis-and-gbif-publication-processes). There are also some tools that can help you unpivot (or flatten) data tables. These can be used to flatten many columns into one, particularly useful for the [eMoF](format_emof.html) table. 
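That unpivot step can be sketched with `pandas.melt`, which flattens several measurement columns into the two-column shape the eMoF table expects. The column names below are hypothetical:

```python
import pandas as pd

# Hypothetical wide table: one measurement per column.
wide = pd.DataFrame({
    "occurrenceID": ["occ1", "occ2"],
    "temperature":  [10.1, 11.3],
    "salinity":     [35.0, 34.8],
})

# Unpivot the measurement columns into eMoF-style long format,
# keeping occurrenceID as the link back to the core table.
emof = wide.melt(
    id_vars=["occurrenceID"],
    var_name="measurementType",
    value_name="measurementValue",
)
# emof has one row per occurrence-measurement pair
```

From here you would add the `measurementUnit` column and the controlled-vocabulary `measurementTypeID`/`measurementUnitID` fields by hand or with a lookup table.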
diff --git a/gethelp.md b/gethelp.md index 93c67e7..01d54ad 100644 --- a/gethelp.md +++ b/gethelp.md @@ -6,5 +6,5 @@ Finally, you can submit an issue on relevant Github repositories: * [OBIS Manual](https://github.com/iobis/manual/issues) * [OBIS Website](https://github.com/iobis/web) -* [OBIS issues GitHub repo](https://github.com/iobis/obis-issues) +* [OBIS issues GitHub repo](https://github.com/iobis/obis-issues) * [All other OBIS repositories](https://github.com/iobis) diff --git a/identifiers.md b/identifiers.md index 20545db..156956d 100644 --- a/identifiers.md +++ b/identifiers.md @@ -59,7 +59,7 @@ We can see that each record has a similar eventID structure, except for the last `occurrenceID` is an identifier for occurrence records. Each occurrence record must have a unique identifier. Because `occurrenceID` is a required term, you may have to construct a persistent and globally unique identifier for each of your data records if none already exist. -There are no standardized guidelines yet on designing the persistence of this ID, the level of uniqueness (from within a dataset to globally in OBIS), and the precise algorithm and format for generating the ID. But in the absence of a persistent globally unique identifier, one can be constructed by combining the `institutionCode`, the `collectionCode` and the `catalogNumber` (or autonumber in the absence of a catalogNumber). This is similar to how [eventID](identifiers#eventid.html) is constructed. Note that the inclusion of `occurrenceID` is also necessary for datasets in the [OBIS-ENV-DATA](data_format.html#obis-holds-more-than-just-species-occurrences-the-env-data-approach) format. +There are no standardized guidelines yet on designing the persistence of this ID, the level of uniqueness (from within a dataset to globally in OBIS), and the precise algorithm and format for generating the ID. 
But in the absence of a persistent globally unique identifier, one can be constructed by combining the `institutionCode`, the `collectionCode` and the `catalogNumber` (or autonumber in the absence of a catalogNumber). This is similar to how [eventID](identifiers.html#eventid) is constructed. Note that the inclusion of `occurrenceID` is also necessary for datasets in the [OBIS-ENV-DATA](data_format.html#obis-holds-more-than-just-species-occurrences-the-env-data-approach) format. An important consideration for museum specimens: there is the possibility that the institution a specimen is housed at may change. Therefore you may consider omitting institution identifiers within an occurrenceID, because occurrenceID should **not** change over time. diff --git a/ipt.md b/ipt.md index 4d0a711..78b46c1 100644 --- a/ipt.md +++ b/ipt.md @@ -1,15 +1,15 @@ -## Integrated Publishing Toolkit (IPT) +## IPT: Integrated Publishing Toolkit **Contents:** -- [Introduction](ipt.html#introduction-to-the-ipt) +- [Introduction](#introduction-to-the-ipt) - [How to access the IPT](#how-to-access-the-ipt) -- [Who populates IPTs?](ipt.html#who-populates-the-ipt-with-datasets) -- [Upload data](ipt.html#upload-data) -- [Map to Darwin Core](ipt.html#map-your-data-to-darwin-core) -- [Add metadata](ipt.html#add-metadata) -- [Publish on the IPT](ipt.html#publish-on-the-ipt) -- [Publish your data as a dataset paper](ipt.html#publish-your-metadata-as-a-data-paper) +- [Who populates IPTs?](#who-populates-the-ipt-with-datasets) +- [Upload data](#upload-data) +- [Map to Darwin Core](#map-your-data-to-darwin-core) +- [Add metadata](#add-metadata) +- [Publish on the IPT](#publish-on-the-ipt) +- [Publish your data as a dataset paper](#publish-your-metadata-as-a-data-paper) ### Introduction to the IPT @@ -20,13 +20,13 @@ Before we get into the details for accessing and using the IPT, let’s understa All these components (i.e., core file, extension files, descriptor file, and metadata file) become compressed 
together (as a .zip file) and comprise the Darwin Core Archive. -![Example showing how Occurrence core, EML, and meta.xml files make up a Darwin Core-Archive file](images/dwca_1.png){width=50%} +![Example showing how Occurrence core, EML, and meta.xml files make up a Darwin Core-Archive file](images/dwca_1.png){width=70%} ### How to access the IPT Once you have determined which [OBIS node IPT](https://ipt.iobis.org/) is suited for your dataset, you can contact your node manager to create an associated account for you. There will be a link on the sign in page that will direct you to the IPT’s administrator to contact them. If your node’s IPT is not listed here, you will have to [contact the node manager](https://obis.org/contact/) to get the link to their IPT. -![Screenshot of IPT login page, highlighting link to IPT admin and the login button](images/ipt-login.png){width=60%} +![Screenshot of IPT login page, highlighting link to IPT admin and the login button](images/ipt-login.png){width=70%} If you are an IPT admin and want to know how to set up an IPT yourself, see the [IPT admin page](ipt_admin.html). @@ -48,7 +48,7 @@ Desmet, P. & C. Sinou. 2012. 7-step guide to data publication. Canadensys. Metadata and click Edit to open the metadata editor. Any information you provide here will be visible on the resource homepage and bundled together with your data when you publish. -![Screenshot showing where to add or upload metadata](images/ipt-ss12-metadata.png){width=50%} +![Screenshot showing where to add or upload metadata](images/ipt-ss12-metadata.png){width=70%} Follow the [OBIS metadata standards and best practices](eml.html), or check the [IPT manual](https://ipt.gbif.org/manual/en/ipt/latest/manage-resources#metadata) for detailed instructions about the metadata editor. You can also upload a file with metadata information. 
@@ -154,17 +154,17 @@ Follow the [OBIS metadata standards and best practices](eml.html), or check the With your dataset uploaded, properly mapped to DwC, and all the metadata filled, you can publish your dataset. On your resource overview page, go to the Publication section, click the vertical dots and select Publish. -![Screenshot showing where to manage the publishing of your dataset](images/ipt-ss7-pub.png){width=50%} +![Screenshot showing where to manage the publishing of your dataset](images/ipt-ss7-pub.png){width=70%} The IPT will now generate your data as Darwin Core, and combine the data with the metadata to package it as a standardized zip-file called a “Darwin Core Archive”. See the IPT manual for more details. > **Note:** Hitting the "publish" button does not mean that your dataset is available to everyone, it is still private, with access limited to the resource managers. It will only be publicly available when you have changed Visibility to Public. You can choose to do this immediately or at a set date. -![Screenshot showing how to change the visibility of your dataset](images/ipt-ss9-vis.png){width=50%} +![Screenshot showing how to change the visibility of your dataset](images/ipt-ss9-vis.png){width=70%} Your dataset will only be harvested by GBIF when you change Registration to Registered. This step is not needed for OBIS to harvest your datasets. Please do not register your dataset with GBIF if your dataset is already published in GBIF by another publisher. Note that the IPT itself must be [registered with GBIF](ipt_admin.html) in order to publish to GBIF. The node manager can do this. -![Screenshot showing where to register your dataset with GBIF. This is only available if the IPT itself is registered with GBIF as well.](images/ipt-ss8-gbifreg.png){width=50%} +![Screenshot showing where to register your dataset with GBIF. 
This is only available if the IPT itself is registered with GBIF as well.](images/ipt-ss8-gbifreg.png){width=70%} Back on the resource overview page > Published Release, you can see the details of your first published dataset, including the publication date and the version number. Since your dataset is published privately, the only thing left to do is to click Visibility to Public (see the IPT manual) to make it available to everyone. @@ -184,8 +184,8 @@ The Metadata expressed in the EML Profile standard can also be downloaded as a R To download a dataset from an IPT, simply login, and from the home page (not the Manage Resources tab) search for the dataset in question. You can search for keywords in the Filter box on the right side of the page. -![Overview of home page of an IPT](images/ipt-ss10-download1.png){width=50%} +![Overview of home page of an IPT](images/ipt-ss10-download1.png){width=70%} Once you navigate to the page of a dataset, at the top of the page you will have options to download the whole Darwin Core Archive file, or just the metadata as an EML or RTF file. -![Overview of a dataset page on an IPT, emphasizing where to download the resource](images/ipt-ss11-download2.png){width=50%} +![Overview of a dataset page on an IPT, emphasizing where to download the resource](images/ipt-ss11-download2.png){width=70%} diff --git a/ipt_admin.md b/ipt_admin.md index 2ed61e7..36f24c9 100644 --- a/ipt_admin.md +++ b/ipt_admin.md @@ -32,10 +32,10 @@ As the IPT administrator, you must make sure the IPT is kept up to date, dataset To add or update extensions, navigate to the Darwin Core Types and Extensions page from the Administration menu. To install an extension (e.g., DNA Derived Data), simply scroll down the page and click the `Install` button to the right of the desired extension. 
-![Screenshot of IPT Admin page](images/iptadmin-installex.png){width=50%} +![Screenshot of IPT Admin page](images/iptadmin-installex.png){width=70%} For extensions already installed, you may notice yellow flags indicating a core or extension is out of date. You can update these easily by clicking the `Update` button. -![Screenshot demonstrating when core or extensions need to be updated](images/iptadmin-core.png){width=50%} +![Screenshot demonstrating when core or extensions need to be updated](images/iptadmin-core.png){width=70%} For a detailed breakdown of administrator options, see the [IPT guide](https://github.com/gbif/ipt/wiki/IPT2ManualAdministration.wiki#administration-menu). diff --git a/lifewatch_qc.md b/lifewatch_qc.md index 86a76f8..3a7ffb8 100644 --- a/lifewatch_qc.md +++ b/lifewatch_qc.md @@ -1,42 +1,42 @@ ### Geographic and data format quality control -These Data validation and QC services are available on the LifeWatch portal at [http://www.lifewatch.be/data-services](http://www.lifewatch.be/data-services). +These Data validation and QC services are available on the LifeWatch portal at [http://www.lifewatch.be/data-services](http://www.lifewatch.be/data-services). #### Geographical service -This service allows to upload a file and to plot the listed coordinates on a map. Using this web service does not require knowledge of GIS. This service allows a visual check of the available locations and makes it possible to easily identify points on land or outside the scope or study area. Geographic data are essential for OBIS and the experience is that a lot of these data is incomplete or contains errors. A visual check of the position of the sampling locations is thus a simple way of filtering out obvious errors and improving the data quality. Latitude and longitude need to be in WGS84, decimal degrees. This format is also necessary for the OBIS Schema and for uploading the dataset to IPT (Darwin Core). 
+This service allows you to upload a file and plot the listed coordinates on a map. Using this web service does not require knowledge of GIS. The service provides a visual check of the recorded locations, making it easy to identify points on land or outside the study area. Geographic data are essential for OBIS, and experience shows that these data are often incomplete or contain errors. A visual check of the position of the sampling locations is thus a simple way of filtering out obvious errors and improving data quality. Latitude and longitude need to be in WGS84, decimal degrees. This format is also required by the OBIS Schema and for uploading the dataset to IPT (Darwin Core). #### OBIS data format validation -This is the most extensive check currently available and is available for data that are structured according to the OBIS Schema. This validation service checks the following items: +This is the most extensive check currently offered and applies to data structured according to the OBIS Schema. This validation service checks the following items: -* Are all mandatory fields completed, what are the missing fields? -* Are the coordinates in the correct format (decimal degrees, taking into account the minimum and maximum possible values)? -* Are the sampling points on land or in water? -* Is the information in the date-fields valid (e.g. month between 1-12)? -* Can the taxon name be matched with WoRMS? +* Are all mandatory fields completed, and which fields are missing? +* Are the coordinates in the correct format (decimal degrees, taking into account the minimum and maximum possible values)? +* Are the sampling points on land or in water? +* Is the information in the date fields valid (e.g., month between 1 and 12)? +* Can the taxon name be matched with WoRMS? This tool undertakes several actions simultaneously.
In a first step, this data service allows you to map your own column headers to the field names used in the OBIS Schema. When you then run the format validation service, the following actions are performed: -* A check of the mandatory fields of the OBIS Scheme. If mandatory fields would be missing, these will be listed separately, so you can complete them. Without these fields, the dataset cannot be accepted by the OBIS node. -* A listing of all the optional fields of the OBIS Scheme that are available in your file. -* Validation of the content of a number of fields: - * Latitude & longitude: - * Are the values inside the world limit? (yes/no); - * Are the values different from zero? (yes/no); - * Are the values situated in the marine environment (sea/ocean) (=prerequisite of a marine dataset)? (yes/no) - * Date-related fields: - * Do the year-month-day fields form a valid date? (yes/no) - * Do the start- and end-date fields form a valid date? (yes/no) -* Scientific name: - * Is the scientific name available in WoRMS? (yes/no) - * When yes: - * Indication whether taxon is marine or not - * Indication whether taxon name is valid or not - * Indication of the taxonomic rank +* A check of the mandatory fields of the OBIS Schema. If mandatory fields are missing, they will be listed separately so you can complete them. Without these fields, the dataset cannot be accepted by the OBIS node. +* A listing of all the optional fields of the OBIS Schema that are available in your file. +* Validation of the content of a number of fields: +  * Latitude & longitude: +    * Are the values inside the world limit? (yes/no) +    * Are the values different from zero? (yes/no) +    * Are the values situated in the marine environment (sea/ocean) (a prerequisite for a marine dataset)? (yes/no) +  * Date-related fields: +    * Do the year-month-day fields form a valid date? (yes/no) +    * Do the start- and end-date fields form a valid date? (yes/no) +  * Scientific name: +    * Is the scientific name available in WoRMS? (yes/no) +    * When yes: +      * Indication whether the taxon is marine or not +      * Indication whether the taxon name is valid or not +      * Indication of the taxonomic rank After matching with WoRMS, the report gives a brief overview containing: - + * the number of exact matches * the number of fuzzy (=non-exact) matches * the number of non-matches diff --git a/name_matching.md index 2aa2424..7ac13d9 100644 --- a/name_matching.md +++ b/name_matching.md @@ -13,13 +13,13 @@ The identifiers (LSID, TSN, ID) from these registers will be used to populate th > **Note** > You should prioritize using LSIDs because they are unique identifiers which indicate the authority the ID comes from. -You can also use the [Interim Register of Marine and Nonmarine Genera (IRMNG)](https://www.irmng.org/aphia.php?p=search) to [distinguish marine genera from freshwater genera](common_qc#non-marine-species.html). +You can also use the [Interim Register of Marine and Nonmarine Genera (IRMNG)](https://www.irmng.org/aphia.php?p=search) to [distinguish marine genera from freshwater genera](common_qc.html#non-marine-species). ### Taxon Matching Workflow The OBIS node managers have agreed to match all the scientific names in their datasets according to the following Name Matching workflow: -![Workflow for matching a list of taxon names to WoRMS](images/WoRMS-taxa-match.png){width=40%} +![Workflow for matching a list of taxon names to WoRMS](images/WoRMS-taxa-match.png){width=60%} #### Step 1: Match with WoRMS @@ -92,7 +92,7 @@ In cases where no match can be found, WoRMS will indicate none. For these cases * Ensure the name was entered correctly and any other information (e.g., authority, year, identification qualifiers) are included in separate columns, not the same cell as the name.
* Match with [LifeWatch](https://www.lifewatch.be/data-services/) or another register (see Step 2 below) -* Check that the species [is marine](common_qc#non-marine-species.html) +* Check that the species [is marine](common_qc.html#non-marine-species) If a scientific name does not appear in any register, you should contact the original data provider, where possible, to confirm taxonomic spelling, authority, and obtain any original description documents, then attempt to match again. If even after this there are no matches, you should contact info@marinespecies.org to see if the taxon should be added to the WoRMS register. diff --git a/nodes.md index 1f95062..5ceeae1 100644 --- a/nodes.md +++ b/nodes.md @@ -1,7 +1,8 @@ ## OBIS nodes + _Note the OBIS node TOR and system architecture is currently under review and will be updated after the 2023 Steering Group meeting. The information below may change._ -OBIS Nodes are either national projects, programmes, institutes, or organizations, National Ocean Data Centers or regional or international projects, programmes and institutions or organizations that carry out data management functions. +OBIS Nodes are national, regional, or international projects, programmes, institutes, or organizations, including National Ocean Data Centers, that carry out data management functions. OBIS nodes are responsible for **representing all aspects of OBIS within a particular region or taxonomic domain**.
Additional responsibilities include: @@ -10,11 +11,12 @@ OBIS nodes are responsible for **representing all aspects of OBIS within a parti * Responsibility for **all aspects of the data** * Gaining permission to providing access to the data * Ensuring a certain level of data quality -* Transfer of these datasets to the global OBIS database +* Transfer of these datasets to the global OBIS database * Provide support for the full implementation of OBIS worldwide by serving on the IODE Steering Group for OBIS and any relevant Task Teams or ad hoc project teams * Each node may also maintain a data presence on the Internet representing their specific area of responsibility -### Terms of Reference of OBIS nodes +### Terms of Reference of OBIS nodes + **Data Responsibilities** * Receiving or harvesting marine biodiversity data (and metadata) from national, regional, and international programs, and the scientific community at large, and from Tier III nodes by Tier II nodes, and from Tier II nodes by Tier I nodes @@ -36,7 +38,7 @@ OBIS nodes are responsible for **representing all aspects of OBIS within a parti * Outreach and Capacity Building (i.e., providing expertise, training and support in data management, technologies, standards and best practices) * Engage in stakeholder groups (recommended) -### How to become an OBIS node +### How to become an OBIS node OBIS nodes now operate under the IODE network as either National Oceanographic Data Centres (NODCs) or Associate Data Unites (ADUs). Prospective nodes are required to apply to the IODE for membership. @@ -48,6 +50,7 @@ The procedure to become an OBIS node is as follows: * Email your [application form](http://iode.org/index.php?option=com_oe&task=viewDocumentRecord&docID=11793) to become an IODE Associate Data Unit (ADU), with a specific role as OBIS node. Applications for ADU membership in OBIS shall be reviewed by the IODE Officers in consultation with the IODE Steering Group for OBIS. 
### OBIS Node Health Status Check and Transition Strategy + OBIS nodes should operate under IODE as either IODE/ADU or IODE/NODC. As such OBIS nodes are a member of the IODE network. The IODE Steering Group (SG) for OBIS evaluates the health status of OBIS nodes at each annual SG meeting, and considers an OBIS node as **inactive** when it meets any of the following conditions: @@ -70,4 +73,4 @@ In either case, the OBIS Secretariat will inform the OBIS node manager of the SG The IODE Committee is requested to consider the recommendation from the OBIS Steering Group and it may either accept the recommendation or request the inactive OBIS node to submit an action plan (option 1). -When the inactive OBIS node is removed from the IODE network, the SG-OBIS will ask whether another OBIS node is interested in taking over the responsibilities of the removed OBIS node, until a new OBIS node in the country/region is established. \ No newline at end of file +When the inactive OBIS node is removed from the IODE network, the SG-OBIS will ask whether another OBIS node is interested in taking over the responsibilities of the removed OBIS node, until a new OBIS node in the country/region is established. diff --git a/other_resources.md b/other_resources.md index 6e7245a..82ef207 100644 --- a/other_resources.md +++ b/other_resources.md @@ -1,4 +1,5 @@ # (PART\*) Additional Resources {-} + # Other Resources In this section we highlight resources created by collaborators. @@ -15,7 +16,6 @@ This tutorial was created by the [MBON Pole to Pole project](https://marinebon.o This book contains a collection of examples and resources related to mobilizing marine biological data to the [Darwin Core standard](https://dwc.tdwg.org/) for sharing though [OBIS](https://obis.org/). This book has been developed by the [Standardizing Marine Biological Data Working Group (SMBD)](https://github.com/ioos/bio_data_guide/blob/main/README.md). 
The working group is an open community of practitioners, experts, and scientists looking to learn and educate the community on standardizing and sharing marine biological data. - ## EMODnet Biology - diff --git a/relational_db.md b/relational_db.md index d0108eb..06a44b8 100644 --- a/relational_db.md +++ b/relational_db.md @@ -18,7 +18,7 @@ A fourth table could easily be created to track total school population size thr We elaborate on how this structure is applied within OBIS [here](formatting.html#dataset-structure). -![An example of how a relational database works. Three tables show the (1) student performance (blue table) in (2) different schools (pink table) in a fictional country, and (3) the names of the courses (yellow table). Information between each table is linked by the use of identifiers, indicated by the arrows.](images/RelationalDB.drawio.png){width=60%} +![An example of how a relational database works. Three tables show the (1) student performance (blue table) in (2) different schools (pink table) in a fictional country, and (3) the names of the courses (yellow table). Information between each table is linked by the use of identifiers, indicated by the arrows.](images/RelationalDB.drawio.png){width=70%} Note that when OBIS harvests data, datasets are flattened - i.e., all separate data tables are combined into one. This is the kind of file you will receive when you [download data from OBIS](access.html). The reason for this is that querying relational databases significantly reduces computational time, as opposed to querying a flat database. Relational databases also facilitate requests for subsets that meet particular criteria - e.g., all data from Norway for one species above a certain depth. @@ -30,7 +30,7 @@ For example, let us consider the dates of a ship cruise where a series of bottom Let’s consider another example. 
If you took one temperature measurement from the water column where you took your sample, each species found in that sample would have the **same** temperature measurement. By linking such measurements to the _event_ instead of each _occurrence_, we are able to reduce the amount of data being repeated. -![Example of how the sample data is distributed to Core and Extension tables, and how these tables are connected in OBIS](images/OBISsampling-example.png){width=60%} +![Example of how the sample data is distributed to Core and Extension tables, and how these tables are connected in OBIS](images/OBISsampling-example.png){width=70%} An advantage of structuring data this way is that if any mistakes are made, you only need to correct it once! So you can see that using relational event structures (when applicable) in combination with extension files can really simplify and reduce the number of times data are repeated.
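To make the flattening concrete, here is a minimal sketch (a hypothetical example using pandas; the table contents, IDs, and values are invented for illustration) of how an Event table and an Occurrence table linked by `eventID` collapse into a single flat table like the one you receive when downloading from OBIS:

```python
import pandas as pd

# Hypothetical Event table: one temperature measurement per sampling event
events = pd.DataFrame({
    "eventID": ["stn1", "stn2"],
    "eventDate": ["2023-05-01", "2023-05-02"],
    "temperature_C": [7.4, 6.9],
})

# Hypothetical Occurrence table: several species observed per event
occurrences = pd.DataFrame({
    "occurrenceID": ["occ1", "occ2", "occ3"],
    "eventID": ["stn1", "stn1", "stn2"],
    "scientificName": ["Calanus finmarchicus", "Mytilus edulis", "Calanus finmarchicus"],
})

# Flattening: join on eventID, so the event fields (date, temperature)
# are repeated for every occurrence belonging to that event
flat = occurrences.merge(events, on="eventID", how="left")
print(flat)
```

Note how the single temperature value for `stn1` appears twice in the flat table, once per occurrence; keeping it in the Event table at the source avoids that repetition, so a correction only ever has to be made in one place.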