clarify associatedSequences guidelines
* added clarifying language for associatedSequences as per discussion gbif/doc-publishing-dna-derived-data#199
EliLawrence committed Jul 12, 2024
1 parent cb0bfc6 commit 48f7411
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion dna_data.md
@@ -99,7 +99,7 @@ In addition to the [usual required terms for Occurrence datasets](format_occurre

For `organismQuantity` and `sampleSizeValue` in eDNA datasets, the quantities recorded with sequencing studies always represent **abundance relative to the total reads in the sample**, and cannot be directly compared across samples. This is due to the nature of the sample processing protocol and the amplification of DNA with PCR, which biases the original quantities. In `organismQuantity`, record the **amount of a unique sequence in a specific sample** (e.g. 33 reads). In `sampleSizeValue`, record the **total number of all reads** in that specific sample (e.g. 15310 reads). This information allows people accessing the data to calculate the relative abundance of that sequence in the sample. The fields `organismQuantityType` and `sampleSizeUnit` should be populated with “DNA sequence reads”, because it is very important that sequence abundances are not confused with organism abundances recorded by traditional methods. The abundance information can usually be found in the “OTU-table”.
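The relative-abundance calculation described above can be sketched as follows. This is a minimal illustration, not part of the guide: the function name `relative_abundance` and the dictionary layout are assumptions made for the example, and the numbers are the ones quoted in the paragraph.

```python
def relative_abundance(organism_quantity: int, sample_size_value: int) -> float:
    """Return the share of a sample's reads that belong to one sequence.

    Hypothetical helper: organism_quantity is the read count of one unique
    sequence, sample_size_value is the total read count of the same sample.
    """
    if sample_size_value <= 0:
        raise ValueError("sampleSizeValue must be a positive read count")
    if not 0 <= organism_quantity <= sample_size_value:
        raise ValueError("organismQuantity must lie between 0 and sampleSizeValue")
    return organism_quantity / sample_size_value


# One occurrence record, using the example values from the text.
record = {
    "organismQuantity": 33,
    "organismQuantityType": "DNA sequence reads",
    "sampleSizeValue": 15310,
    "sampleSizeUnit": "DNA sequence reads",
}

share = relative_abundance(record["organismQuantity"], record["sampleSizeValue"])
print(f"{share:.4%}")  # share of reads within this one sample
```

The result is only meaningful within the sample it was computed from; it should not be compared across samples.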

- `associatedSequences` should contain a link to the “raw” sequences deposited in a public database or list of identifiers for the genetic sequence associated with the occurrence record (e.g. GenBank). The actual sequence of the occurrence will be documented in the DNA Derived Data extension.
+ `associatedSequences` should contain a reference to the URL domain where genetic sequence information associated with the Occurrence can be found, e.g. a link, an identifier, or a list (concatenated and separated) of identifiers. It can link to archived raw barcode reads and/or associated genome sequences, for example in a public repository. It is recommended that links contain the domain name (e.g. NCBI) in the URL, for example: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA887898/. The actual sequence of the occurrence will be documented in the DNA Derived Data extension.
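As an illustration of the revised wording, the sketch below assembles an `associatedSequences` value from several links. It is an assumption-laden example rather than a prescribed method: the helper name `build_associated_sequences` is hypothetical, the ` | ` separator follows the general Darwin Core convention for list values and is assumed here, and the second URL is a made-up placeholder. The helper only accepts full URLs, which reflects the recommendation that links include the domain name; bare identifiers, which the text also allows, would need a different check.

```python
from urllib.parse import urlparse


def build_associated_sequences(links, separator=" | "):
    """Join sequence links into one concatenated, separated string.

    Hypothetical helper: rejects entries that are not full http(s) URLs
    and drops exact duplicates while keeping the original order.
    """
    cleaned = []
    for link in links:
        link = link.strip()
        parsed = urlparse(link)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a full URL with a domain: {link!r}")
        if link not in cleaned:
            cleaned.append(link)
    return separator.join(cleaned)


links = [
    "https://www.ncbi.nlm.nih.gov/bioproject/PRJNA887898/",   # example from the text
    "https://www.ncbi.nlm.nih.gov/nuccore/EXAMPLE_ACCESSION",  # placeholder, not a real record
    "https://www.ncbi.nlm.nih.gov/bioproject/PRJNA887898/",   # duplicate, will be dropped
]
associated_sequences = build_associated_sequences(links)
print(associated_sequences)
```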

`identificationRemarks` should be used to record how the taxonomic identification of the occurrence was reached, against which reference database, and, if possible, with what confidence. For example: “RDP annotation confidence: 0.96, against reference database: GTDB”. This information should be recorded in the bioinformatic protocol of the study. Note: this information will also be recorded in the DNA Derived Data extension in the fields `otu_seq_comp_appr` and `otu_db`.
