Commit

Changed and improved wording, help links
* Changed wording of some recommendations, e.g. required --> strongly recommended for scientificNameID
* Changed wording for "certain" taxon identifications to "high/low confidence"
* Added more help links and links to relevant GitHub repos (QC, OBIS issues, datasets needing endorsing)
* Added mention of LSIDs documentation for constructing eventIDs (issue #1)
* Emphasized contacting the WoRMS team in case of non-matching names
* Moved vocab guidelines to a non-public page as they are under construction (page=vocab_WIPmaterials)
EliLawrence committed May 31, 2023
1 parent 9b90f70 commit 2008ad9
Showing 13 changed files with 75 additions and 55 deletions.
2 changes: 2 additions & 0 deletions FAQ.md
@@ -11,6 +11,8 @@
* [How are extension tables (e.g. eMOF, occurrence) linked with the core table?](formatting.html#extensions-in-obis)
* [What is the difference between Occurrence Core and Event Core?](formatting.html#dataset-structure)
* [What are the responsibilities of node managers?](nodes.html)
* [Where can I find marine datasets linked to the OBIS network by the GBIF registry that now require endorsing?](https://github.com/iobis/obis-network-datasets/)
* [Where can I make suggestions for improvements on this Manual?](https://github.com/iobis/manual)

#### Formatting Data

2 changes: 1 addition & 1 deletion checklist.md
@@ -18,7 +18,7 @@ Note that when you publish your dataset on the IPT, if you use a term not listed
| occurrenceStatus | required | occurrence | | x | | |
| basisOfRecord | required | record | | x | | x |
| scientificName | required | taxon | | x | | |
| scientificNameID | required | taxon | | x | | |
| scientificNameID | strongly recommended | taxon | | x | | |
| DNA_sequence | strongly recommended | dna | | | | x |
| env_broad_scale | strongly recommended | dna | | | | x |
| env_local_scale | recommended | dna | | | | x |
2 changes: 1 addition & 1 deletion common_formatissues.md
@@ -161,7 +161,7 @@ Formatting historical data (data published before 1583 CE) can pose additional c

For records related to fossils or that have other geological contexts, the [Darwin Core class GeologicalContext](https://dwc.tdwg.org/terms/#geologicalcontext) has terms that can be used in the Event core, Occurrence core, or Occurrence table to specify additional information. For such records, `eventDate` would be populated with the date of collection.

For historical data originating from old records, such as ship logs or other archival records, we understand there can be additional issues in interpreting and formatting data according to DwC standards. Often the location, date, species, and other measurements have to be interpreted from textual descriptions or poor quality documents. As these issues can vary wildly, we currently recommend [submitting a Github issue](https://github.com/iobis/obis-issues/issues) to get assistance with an issue.
For historical data originating from old records, such as ship logs or other archival records, we understand there can be additional issues in interpreting and formatting data according to DwC standards. Often the location, date, species, and other measurements have to be interpreted from textual descriptions or poor-quality documents. As these issues can vary widely, we currently recommend [submitting a GitHub issue](https://github.com/iobis/obis-issues/issues) to get assistance.

More specific guidelines to address historical data complications are under development - stay tuned!

18 changes: 9 additions & 9 deletions common_qc.md
@@ -106,29 +106,29 @@ Below is a table summarizing the different DwC terms you can obtain from the OBI
| coordinateUncertaintyInMeters | radius | precision (not always available) | |
| footprintWKT | WKT | | |

### Uncertain taxonomic information
### Low confidence taxonomic identification

In case of uncertain taxonomic identifications, and/or the scientific name contains qualifiers such as cf., ?, or aff., then you should:
If a taxonomic identification has low confidence, and/or the scientific name contains qualifiers such as cf., ?, or aff., then you should:

- Put the name of the lowest possible taxon rank referring to the most accurate identification in `scientificName` (usually Genus in these cases)
- Put qualifiers in [`identificationQualifier`](https://dwc.tdwg.org/terms/#dwciri:identificationQualifier) (e.g., cf., aff.)
- Put the name of the lowest possible taxon rank that can be determined with high confidence in `scientificName` (e.g. the genus)
- Put any text regarding identification with low confidence and/or qualifiers in [`identificationQualifier`](https://dwc.tdwg.org/terms/#dwciri:identificationQualifier) (e.g., cf., aff.)
- Put the species name in [`specificEpithet`](https://dwc.tdwg.org/terms/#dwc:specificEpithet)
- Place the rank of the taxon documented in scientificName (e.g., genus) in [`taxonRank`](https://dwc.tdwg.org/terms/#dwc:taxonRank)
- Document any relevant comments in [`taxonRemarks`](https://dwc.tdwg.org/terms/#dwc:taxonRemarks) or [`identificationRemarks`](https://dwc.tdwg.org/terms/#dwc:identificationRemarks)
- Document any relevant comments in [`taxonRemarks`](https://dwc.tdwg.org/terms/#dwc:taxonRemarks) or [`identificationRemarks`](https://dwc.tdwg.org/terms/#dwc:identificationRemarks) (e.g. reasoning for identification)

Take an example specimen named Pterois cf. volitans. The associated occurrence record would have the following taxonomic information:

- `scientificName` = Pterois
- `identificationQualifier` = cf.
- `specificEpithet` =volitans
- `scientificNameID` =the one for Pterois
- `specificEpithet` = volitans
- `scientificNameID` = the one for Pterois
- `taxonRank` = genus

If the provided genus name is unaccepted in WoRMS, it is okay to use the unaccepted name in this field. `scientificNameID` should contain the WoRMS LSID for the genus.
If the provided name is unaccepted in WoRMS, it is okay to use the unaccepted name in this field. `scientificNameID` should contain the [WoRMS LSID](name_matching.html) for the genus.
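
The mapping above can be sketched in Python. This is a minimal illustration, not an OBIS tool: the function name `split_low_confidence_name` and the qualifier list are assumptions, and it only handles names where the qualifier is a standalone, whitespace-separated token. `scientificNameID` is left out because it has to be looked up in WoRMS for the name that ends up in `scientificName`.

```python
# Qualifiers mentioned in the text above; extend as needed (assumption).
QUALIFIERS = ("cf.", "aff.", "?")


def split_low_confidence_name(verbatim_name):
    """Split a name such as 'Pterois cf. volitans' into Darwin Core fields.

    Hypothetical helper: everything before the first qualifier goes to
    scientificName, the qualifier itself to identificationQualifier, and
    everything after it to specificEpithet.
    """
    tokens = verbatim_name.split()
    for position, token in enumerate(tokens):
        if token in QUALIFIERS:
            return {
                "scientificName": " ".join(tokens[:position]),
                "identificationQualifier": token,
                "specificEpithet": " ".join(tokens[position + 1:]),
            }
    # No qualifier found: the name is used as-is.
    return {
        "scientificName": verbatim_name,
        "identificationQualifier": "",
        "specificEpithet": "",
    }


record = split_low_confidence_name("Pterois cf. volitans")
print(record)
```

Names without a qualifier pass through unchanged, so the helper can be applied to a whole column of verbatim names.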

There is a new Darwin Core term [`verbatimIdentification`](https://dwc.tdwg.org/terms/#dwc:verbatimIdentification) meant for containing the originally documented name; however, this term is not yet implemented in OBIS, so if you populate this field it will not be indexed alongside your data. You can instead use `originalNameUsage` to document original species names.

The use and definitions for additional Open Nomenclature (ON) signs (`identificationQualifier`) can be found in [Open Nomenclature in the biodiversity era](https://doi.org/10.1111/2041-210X.12594), which provides examples for using the main Open Nomenclature qualifiers associated with physical specimens (Figure 1). Whereas the publication [Recommendations for the Standardisation of Open Taxonomic Nomenclature for Image-Based Identifications](https://www.frontiersin.org/articles/10.3389/fmars.2021.620702/full) provides examples and definitions for identificationQualifiers for non-physical specimens (image-based) (Figure 2).
The use and definitions for additional Open Nomenclature (ON) signs (`identificationQualifier`) can be found in [Open Nomenclature in the biodiversity era](https://doi.org/10.1111/2041-210X.12594), which provides examples for using the main Open Nomenclature qualifiers associated with physical specimens (Figure 1). The publication [Recommendations for the Standardisation of Open Taxonomic Nomenclature for Image-Based Identifications](https://www.frontiersin.org/articles/10.3389/fmars.2021.620702/full) provides examples and definitions of `identificationQualifier` values for image-based (non-physical) specimens (Figure 2).

![*Figure 1. Flow diagram with the main Open Nomenclature qualifiers associated with physical specimens. The degree of confidence in the correct identifier increases from the top down. More info and figure copied from [Open Nomenclature in the biodiversity era](https://doi.org/10.1111/2041-210X.12594).*](images/fig1-openNomenclature.png){width=80%}

6 changes: 3 additions & 3 deletions darwin_core.md
@@ -33,7 +33,7 @@ DwC terms correspond to the column names of your dataset and can be grouped acco

A list of all possible Darwin Core terms can be found on [TDWG](https://dwc.tdwg.org/terms/). However, OBIS does not parse all terms (note this doesn't mean you cannot include them, they just will not be parsed when you publish to OBIS). Below is an overview of the most relevant Darwin Core terms to consider when contributing to OBIS, with guidelines regarding their use. We have also compiled a convenient [checklist](checklist.html) of OBIS-accepted terms, their DwC class type, and which OBIS file (Event Core, Occurrence, eMoF, etc.) it is likely to be found in.

Note that OBIS currently has eight required DwC terms: `occurrenceID`, `eventDate`, `decimalLongitude`, `decimalLatitude`, `scientificName`, `scientificNameID`, `occurrenceStatus`, `basisOfRecord`.
Note that OBIS currently has seven required and one strongly recommended DwC term: `occurrenceID`, `eventDate`, `decimalLongitude`, `decimalLatitude`, `scientificName`, `occurrenceStatus`, `basisOfRecord`, `scientificNameID` (strongly recommended).
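
A quick way to check a dataset's column names against this list is sketched below. The helper `check_terms` is hypothetical and only compares column names. It does not validate values, and it ignores the fact that some of these terms may be supplied in a linked core table instead of the table being checked.

```python
# Terms as listed in the sentence above.
REQUIRED_TERMS = [
    "occurrenceID",
    "eventDate",
    "decimalLongitude",
    "decimalLatitude",
    "scientificName",
    "occurrenceStatus",
    "basisOfRecord",
]
STRONGLY_RECOMMENDED_TERMS = ["scientificNameID"]


def check_terms(columns):
    """Report which required and strongly recommended terms are absent.

    Hypothetical helper: `columns` is any iterable of column names.
    """
    present = set(columns)
    return {
        "missing_required": [t for t in REQUIRED_TERMS if t not in present],
        "missing_recommended": [
            t for t in STRONGLY_RECOMMENDED_TERMS if t not in present
        ],
    }


report = check_terms(["occurrenceID", "eventDate", "scientificName"])
print(report["missing_required"])
print(report["missing_recommended"])
```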

The following DwC terms are related to the Class _Taxon_:

@@ -122,7 +122,7 @@ The following DwC terms are related to the Class _MaterialSample_:

`scientificName` (required term) should always contain the originally recorded scientific name, even if the name is currently a synonym. This is necessary to be able to trace records back to the original dataset. The name should be at the lowest possible taxonomic rank, preferably at species level or lower, but higher ranks, such as genus, family, order, class etc. are also acceptable. We recommend not including authorship in `scientificName`, and using only `scientificNameAuthorship` for that purpose. The `scientificName` term should only contain the name and not identification qualifications (such as ?, confer or affinity), which should instead be supplied in the `identificationQualifier` term, see examples below. `taxonRemarks` can capture comments or notes about the taxon or name.

A [WoRMS](http://www.marinespecies.org/) LSID should be added in `scientificNameID` (required term), OBIS will use this identifier to pull the taxonomic information from the World Register of Marine Species (WoRMS) into OBIS and attach it to your dataset. This information includes:
A [WoRMS](http://www.marinespecies.org/) LSID should be added in `scientificNameID` (strongly recommended term). OBIS will use this identifier to pull the taxonomic information from the World Register of Marine Species (WoRMS) into OBIS and attach it to your dataset. This information includes:

- Taxonomic classification (kingdom through species)
- The accepted name in case of invalid names or synonyms
@@ -153,7 +153,7 @@ _Data from [Benthic fauna around Franz Josef Land](http://ipt.vliz.be/eurobis/re

If the record represents a nomenclatural type specimen, the term `typeStatus` can be used, e.g. for holotype, syntype, etc.

**In case of uncertain identifications**, and the scientific name contains qualifiers such as _cf._, _?_ or _aff._, then this name should go in `identificationQualifier`, and `scientificName` should contain the name of the lowest possible taxon rank that refers to the most accurate identification. E.g. if the specimen was accurately identified down to genus level, but not species level, then the scientificName should contain the name of the genus, the scientificNameID should contain the LSID the genus and the `identificationQualifier` should contain the uncertain species name combined with _?_ or other qualifiers. The table belowe shows a few examples:
**In case of low confidence identifications**, where the scientific name contains qualifiers such as _cf._, _?_ or _aff._, the qualified name should go in `identificationQualifier`, and `scientificName` should contain the name of the lowest possible taxon rank that refers to the most accurate identification. E.g. if the specimen was accurately identified down to genus level, but not species level, then `scientificName` should contain the name of the genus, `scientificNameID` should contain the LSID of the genus and `identificationQualifier` should contain the low confidence species name combined with _?_ or other qualifiers. The table below shows a few examples:

The use and definitions for additional ON signs (`identificationQualifier`) can be found in [Open Nomenclature in the biodiversity era](https://doi.org/10.1111/2041-210X.12594), which provides examples for using the main Open Nomenclature qualifiers associated with _physical specimens_. The publication [Recommendations for the Standardisation of Open Taxonomic Nomenclature for Image-Based Identifications](https://www.frontiersin.org/articles/10.3389/fmars.2021.620702/full) provides examples and definitions for identificationQualifiers for _non-physical specimens (image-based)_.

2 changes: 2 additions & 0 deletions data_qc.md
@@ -7,6 +7,8 @@ OBIS ignores records that do not meet a number of standards. For example, all sp
* [QC tool for species names](name_matching.html)
* [QC tool for geography and data format](lifewatch_qc.html)

For specific concerns regarding quality control checks or issues, please submit a GitHub ticket to the [OBIS QC repository](https://github.com/iobis/obis-qc/issues).

## Why are records dropped?

Records can be dropped and therefore not published with your dataset for a number of reasons, including:
2 changes: 1 addition & 1 deletion format_occurrence.md
@@ -6,7 +6,7 @@ If your dataset structure is [based on Occurrence core](formatting.html), or has
* `occurrenceStatus`
* `basisOfRecord`
* `scientificName`
* `scientificNameID`
* `scientificNameID` (strongly recommended)
* `eventDate` (not required for Occurrence extension, required for Occurrence Core)
* `decimalLatitude` (not required for Occurrence extension)
* `decimalLongitude` (not required for Occurrence extension)
7 changes: 5 additions & 2 deletions gethelp.md
@@ -1,10 +1,13 @@
## Getting Help in OBIS

If you require additional assistance with OBIS we recommend you first get in touch with the most [relevant OBIS node](https://obis.org/contact/). We also have a **support channel** on [Slack](https://obishq.slack.com/archives/C014PTTKECW) where you can communicate with the OBIS community for help. Please feel comfortable posting to this channel before reaching out to the OBIS Secretariat ([email protected]). The OBIS community is quite active on Slack so you are more likely to receive a quick answer to your question by posting there, as the Secretariat receives many requests.
If you require additional assistance with OBIS, we recommend you first get in touch with the most [relevant OBIS node](https://obis.org/contact/). We also have a **support channel** on [Slack](https://obishq.slack.com/archives/C014PTTKECW) where you can communicate with the OBIS community for help. Please feel comfortable posting to this channel before reaching out to the OBIS Secretariat (<[email protected]>). The OBIS community is quite active on Slack and GitHub (see below), so you are more likely to receive a quick answer to your question by posting in either place, as the Secretariat receives many requests.

Finally, you can submit an issue on relevant Github repositories:
You can submit issues and questions on relevant GitHub repositories:

* [OBIS Manual](https://github.com/iobis/manual/issues)
* [OBIS Website](https://github.com/iobis/web)
* [OBIS issues GitHub repo](https://github.com/iobis/obis-issues)
* [OBIS quality control issues](https://github.com/iobis/obis-qc)
* [All other OBIS repositories](https://github.com/iobis)

We strongly recommend creating a GitHub account to engage with the OBIS community, document issues, ask questions, find datasets that need endorsing, etc. GitHub gives threads a more permanent home and allows for open communication and transparency. If you are unfamiliar with GitHub, the Carpentries have [these training resources](https://swcarpentry.github.io/git-novice/index.html) which you can reference.
8 changes: 6 additions & 2 deletions identifiers.md
@@ -1,5 +1,9 @@
## Constructing and using identifier codes

**Content**

* [eventID](#eventid)
* [occurrenceID](#occurrenceid)
### eventID

Using a unique identifier for each physical sample or subsample in your dataset taken at each location and time is highly recommended to ensure sample traceability and data provenance. `eventID` is an identifier for an individual sampling or observation event, whereas `parentEventID` is an identifier for a parent event, which is composed of one or more sub-sampling (child) events (eventIDs).
@@ -57,9 +61,9 @@ We can see that each record has a similar eventID structure, except for the last

### occurrenceID

`occurrenceID` is an identifier for occurrence records. Each occurrence record must have a unique identifier. Because `occurrenceID` is a required term, you may have to construct a persistent and globally unique identifier for each of your data records if none already exist.
`occurrenceID` is an identifier for occurrence records. Each occurrence record should have a globally unique identifier. Because `occurrenceID` is a required term, you may have to construct a persistent and globally unique identifier for each of your data records if none already exist.

There are no standardized guidelines yet on designing the persistence of this ID, the level of uniqueness (from within a dataset to globally in OBIS), and the precise algorithm and format for generating the ID. But in the absence of a persistent globally unique identifier, one can be constructed by combining the `institutionCode`, the `collectionCode` and the `catalogNumber` (or autonumber in the absence of a catalogNumber). This is similar to how [eventID](identifiers.html#eventid) is constructed. Note that the inclusion of `occurrenceID` is also necessary for datasets in the [OBIS-ENV-DATA](data_format.html#obis-holds-more-than-just-species-occurrences-the-env-data-approach) format.
There are no standardized guidelines yet on designing the persistence of this ID, the level of uniqueness (from within a dataset to globally in OBIS), and the precise algorithm and format for generating the ID. But in the absence of a persistent globally unique identifier, one can be constructed by combining the `institutionCode`, the `collectionCode` and the `catalogNumber` (or autonumber in the absence of a catalogNumber). This is similar to how [eventID](#eventid) is constructed. You may also follow [Life Science Identifiers](https://www.labkey.org/Documentation/wiki-page.view?name=lsidOverview) guidelines. Note that the inclusion of `occurrenceID` is also necessary for datasets in the [OBIS-ENV-DATA](data_format.html#obis-holds-more-than-just-species-occurrences-the-env-data-approach) format.
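
As a rough sketch of the combination described above, the hypothetical helper below joins `institutionCode`, `collectionCode` and `catalogNumber`, falling back to an autonumber when no catalog number exists. The colon separator and the example codes are assumptions for illustration only, since no standard format is prescribed.

```python
def build_occurrence_id(institution_code, collection_code,
                        catalog_number=None, autonumber=None, sep=":"):
    """Combine codes into an occurrenceID string.

    Hypothetical helper: uses catalog_number when it is provided,
    otherwise the autonumber; raises if neither is available.
    """
    last_part = catalog_number if catalog_number else autonumber
    if last_part is None:
        raise ValueError("Provide a catalog_number or an autonumber")
    return sep.join([institution_code, collection_code, str(last_part)])


# Placeholder codes, not taken from a real dataset.
print(build_occurrence_id("INST", "COLL", catalog_number="12345"))
print(build_occurrence_id("INST", "COLL", autonumber=7))
```

Whatever scheme is chosen, the resulting identifiers should be checked for uniqueness across the dataset before publishing.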

An important consideration for museum specimens: there is the possibility that the institution a specimen is housed at may change. Therefore you may consider omitting institution identifiers within an occurrenceID, because occurrenceID should **not** change over time.

2 changes: 1 addition & 1 deletion index.Rmd
@@ -58,5 +58,5 @@ It is important that our data providers as well as all the data users are aware

## Acknowledgements

This manual received contributions from: [Leen Vandepitte](https://www.oceanexpert.net/expert/12313), [Mary Kennedy](https://www.oceanexpert.net/expert/13557), [Philip Goldstein](https://www.oceanexpert.net/expert/18051), [Pieter Provoost](https://www.oceanexpert.net/expert/26192), [Samuel Bosch](https://www.oceanexpert.net/expert/26577) and [Ward Appeltans](https://www.oceanexpert.net/expert/11770).
This manual received contributions from: [Leen Vandepitte](https://www.oceanexpert.net/expert/12313), [Mary Kennedy](https://www.oceanexpert.net/expert/13557), [Philip Goldstein](https://www.oceanexpert.net/expert/18051), [Pieter Provoost](https://www.oceanexpert.net/expert/26192), [Samuel Bosch](https://www.oceanexpert.net/expert/26577), [Ward Appeltans](https://www.oceanexpert.net/expert/11770), [Abby Benson](https://orcid.org/0000-0002-4391-107X), [Yi-Ming Yan](https://orcid.org/0000-0001-7087-2646), [Carolina Peralta Brichtova](https://oceanexpert.org/expert/26345), [Saara Suominen](https://oceanexpert.org/expert/43352), [Serita van der Wal](https://oceanexpert.org/expert/49876), and [Elizabeth Lawrence](https://oceanexpert.org/expert/50997).
