File Formats

The following file format definitions are used in data exchange between stages of the CAMI pipeline.

##Genome annotation

###Input

###Output

Files available when finished.

####16S_rRNA.fna

Contains accepted marker genes. FASTA formatted file. Sequence ids are internal ids found in the 'id_mapping.tsv' file.

####16S_rRNA.fna.rejected.fna

Contains rejected marker genes. FASTA formatted file. Sequence ids are internal ids found in the 'id_mapping.tsv' file.
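
As a minimal sketch of how the sequence ids in either FASTA file could be collected, the following reads header lines and keeps the first token after `>`. The function name `read_fasta_ids` and the example records are made up for illustration; how the ids are then looked up in 'id_mapping.tsv' is not shown, since its columns are not defined here.

```python
import io


def read_fasta_ids(handle):
    """Return the sequence ids (first token after '>') from a FASTA stream."""
    ids = []
    for line in handle:
        if line.startswith(">"):
            header = line[1:].strip()
            # By convention the id is the text up to the first whitespace.
            ids.append(header.split()[0] if header else "")
    return ids


# Made-up records for illustration; real ids come from the pipeline.
example = io.StringIO(">SR_517 example\nACGT\nACGT\n>SR_518\nTTGA\n")
fasta_ids = read_fasta_ids(example)
print(fasta_ids)
```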

####id_mapping.tsv

Tab separated data table.

####meta_data.tsv

Tab separated data table. Columns have no fixed order. First row must have column names.

genome_ID: Original genome id
prediction_threshold: The relative genome distance threshold at which the taxonomic classification was made
NCBI_ID: Taxonomic classification, given as an NCBI taxonomic id
SCIENTIFIC_NAME: Scientific name of taxonomic classification
novelty_category: Novelty category of a genome
OTU: Id of genomes that were clustered together
ANI: Average nucleotide identity to the closest reference genome
ANI_NOVELTY_CATEGORY: Novelty category based on ANI
ANI_TAXONOMIC_COMPARE: Taxonomic id of closest reference genome
ANI_SCIENTIFIC_NAME: Scientific name of closest reference genome
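
Because the columns have no fixed order, a reader should look them up by name from the header row rather than by position. The following is a minimal sketch using Python's standard `csv` module; the function name `read_meta_data` and the sample rows are assumptions for illustration, not pipeline output.

```python
import csv
import io


def read_meta_data(handle):
    """Map genome_ID -> row dict, addressing columns by name, not position."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or "genome_ID" not in reader.fieldnames:
        raise ValueError("missing 'genome_ID' column in header row")
    return {row["genome_ID"]: row for row in reader}


# Made-up sample with the columns in an arbitrary order.
sample = io.StringIO(
    "OTU\tgenome_ID\tNCBI_ID\n"
    "1\tgenome_a\t562\n"
    "1\tgenome_b\t562\n"
)
meta = read_meta_data(sample)
print(meta["genome_a"]["NCBI_ID"])
```

All values are returned as strings; numeric columns such as ANI would need to be converted by the caller.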

####mothur_cluster_16S_rRNA.list

Tab separated data table. First row must have column names.

Column 1 'label': Relative genome distance thresholds. Example: unique, 0.01, 0.02, 0.03
Column 2 'numOtus': Number of groups (OTUs)
Column 3+ 'Otu<index>': Comma separated lists of internal ids

Example:
label numOtus Otu001
unique 518 SR_517,SR_518,SR_462
0.001 469 SR_517,SR_518,SR_462
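
The layout described above could be parsed roughly as follows. This is a sketch under the stated column layout only; the function name `read_mothur_list` and the sample data are made up for illustration. Each threshold label maps to a list of OTUs, and each OTU is a list of internal ids.

```python
import io


def read_mothur_list(handle):
    """Map each threshold label to its OTUs (each OTU is a list of ids)."""
    clusters = {}
    header = handle.readline().rstrip("\n").split("\t")
    if header[:2] != ["label", "numOtus"]:
        raise ValueError("unexpected header row")
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank lines
        # Column 1 is the label, column 2 the OTU count, the rest are OTUs.
        label, otus = fields[0], fields[2:]
        clusters[label] = [otu.split(",") for otu in otus if otu]
    return clusters


# Made-up sample; rows may have fewer OTU columns at larger thresholds.
sample = io.StringIO(
    "label\tnumOtus\tOtu001\tOtu002\n"
    "unique\t2\tSR_517\tSR_518,SR_462\n"
    "0.01\t1\tSR_517,SR_518,SR_462\n"
)
clusters = read_mothur_list(sample)
print(clusters["0.01"])
```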

##Metagenome Simulation

###Input

###Output
