File Formats
The following file format definitions are used in data exchange between stages of the CAMI pipeline.
## Genome annotation
### Input
### Output
Files available when finished.
#### 16S_rRNA.fna
Contains accepted marker genes. FASTA-formatted file. Sequence IDs are the internal IDs found in the 'id_mapping.tsv' file.
#### 16S_rRNA.fna.rejected.fna
Contains rejected marker genes. FASTA-formatted file. Sequence IDs are the internal IDs found in the 'id_mapping.tsv' file.
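Both FASTA files above can be read the same way. The following is a minimal sketch, not part of the pipeline: `read_fasta` is a hypothetical helper name, and it assumes the internal ID is the first whitespace-delimited token on each header line.

```python
import io


def read_fasta(handle):
    """Yield (sequence_id, sequence) pairs from a FASTA text stream.

    Assumption: the internal ID is the first whitespace-delimited token
    after '>' on each header line.
    """
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            parts = line[1:].split(None, 1)
            seq_id, chunks = (parts[0] if parts else ""), []
        elif seq_id is not None:
            chunks.append(line)  # sequence may span several lines
    if seq_id is not None:
        yield seq_id, "".join(chunks)


# Made-up sample data, for illustration only.
sample = ">SR_517 some description\nACGT\nTTGA\n>SR_518\nGGCC\n"
records = list(read_fasta(io.StringIO(sample)))
print(records)
```

With a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open("16S_rRNA.fna")`.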
#### id_mapping.tsv
Tab-separated data table.
- Column 1: Internal id
- Column 2: Original id
- Column 3: NCBI taxonomic ID (Silva), if known
- Column 4: NCBI taxonomic ID (EMBL), if known
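The four columns above could be loaded into a lookup table as sketched below. This is an illustration under assumptions: `read_id_mapping` is a hypothetical name, the file is assumed to have no header row (the format description does not mention one), and an unknown taxonomic ID is assumed to appear as an empty or missing field.

```python
import csv
import io


def read_id_mapping(handle):
    """Return {internal_id: (original_id, silva_taxid, embl_taxid)}.

    Assumptions: no header row; unknown taxonomic IDs are empty or
    missing fields and are returned as None.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        row = row + [""] * (4 - len(row))  # pad short rows to four columns
        internal_id, original_id, silva, embl = (f.strip() for f in row[:4])
        mapping[internal_id] = (original_id, silva or None, embl or None)
    return mapping


# Made-up sample rows, for illustration only.
sample = "SR_1\tgenome_A\t562\t562\nSR_2\tgenome_B\t\t1280\n"
mapping = read_id_mapping(io.StringIO(sample))
print(mapping["SR_2"])  # ('genome_B', None, '1280')
```

The taxonomic IDs are kept as strings so that no assumption is made about how the pipeline encodes them.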
#### meta_data.tsv
Tab-separated data table. Columns have no fixed order. First row must have column names.
- genome_ID: Original genome id
- prediction_threshold: The relative genome distance threshold at which the taxonomic classification was made.
- NCBI_ID: NCBI taxonomic ID of the taxonomic classification
- SCIENTIFIC_NAME: Scientific name of taxonomic classification
- novelty_category: Novelty category of a genome
- OTU: Id of genomes that were clustered together
- ANI: Average nucleotide identity to the closest reference genome
- ANI_NOVELTY_CATEGORY: Novelty category based on ANI
- ANI_TAXONOMIC_COMPARE: Taxonomic id of closest reference genome
- ANI_SCIENTIFIC_NAME: Scientific name of closest reference genome
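Because the columns have no fixed order, a reader should look fields up by name rather than by position. A minimal sketch using the standard library follows; `read_meta_data` is a hypothetical name, and the sample values (including the novelty category string) are made up for illustration.

```python
import csv
import io


def read_meta_data(handle):
    """Return {genome_ID: row_dict}, with columns addressed by name."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or "genome_ID" not in reader.fieldnames:
        raise ValueError("expected a header row containing 'genome_ID'")
    return {row["genome_ID"]: row for row in reader}


# Made-up sample with columns in an arbitrary order.
sample = (
    "NCBI_ID\tgenome_ID\tnovelty_category\n"
    "562\tgenome_A\texample_category\n"
)
meta = read_meta_data(io.StringIO(sample))
print(meta["genome_A"]["NCBI_ID"])  # 562
```

All values are returned as strings; converting columns such as ANI to numbers is left to the caller, since the format description does not state how missing values are written.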
#### mothur_cluster_16S_rRNA.list
Tab-separated data table. First row must have column names.
- Column 1 'label': Relative genome distance thresholds. Example: unique, 0.01, 0.02, 0.03
- Column 2 'numOtus': Number of groups (OTUs)
- Column 3+ 'Otu<index>': Comma separated lists of internal ids
Example:
label numOtus Otu001
unique 518 SR_517,SR_518,SR_462
0.001 469 SR_517,SR_518,SR_462
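The list format described above could be parsed as sketched here. This is an illustration, not the pipeline's own reader: `read_mothur_list` is a hypothetical name, the sample data is made up, and the check that the number of OTU columns equals `numOtus` is an assumed sanity check rather than a documented requirement.

```python
import csv
import io


def read_mothur_list(handle):
    """Return {label: [[internal_id, ...], ...]}, one inner list per OTU."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None or header[:2] != ["label", "numOtus"]:
        raise ValueError("expected a header row starting with label, numOtus")
    clusters = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        label, num_otus = row[0], int(row[1])
        # Each remaining non-empty cell is one comma-separated OTU.
        otus = [cell.split(",") for cell in row[2:] if cell]
        if len(otus) != num_otus:
            raise ValueError(f"{label}: expected {num_otus} OTUs, found {len(otus)}")
        clusters[label] = otus
    return clusters


# Made-up sample: two OTUs at 'unique', merged into one at 0.01.
sample = (
    "label\tnumOtus\tOtu001\tOtu002\n"
    "unique\t2\tSR_517\tSR_518,SR_462\n"
    "0.01\t1\tSR_517,SR_518,SR_462\n"
)
clusters = read_mothur_list(io.StringIO(sample))
print(clusters["unique"])  # [['SR_517'], ['SR_518', 'SR_462']]
```

Labels are kept as strings because the first label can be the non-numeric value `unique`.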
## Metagenome Simulation
### Input
### Output