Peter Hofmann edited this page Nov 9, 2015 · 21 revisions

The following file format definitions are used in data exchange between stages of the CAMI pipeline.

##Genome annotation

###Input

###Output

Files available when finished.

####16S_rRNA.fna

Contains accepted marker genes. FASTA-formatted file. Sequence ids are the internal ids found in the 'id_mapping.tsv' file.

####16S_rRNA.fna.rejected.fna

Contains rejected marker genes. FASTA-formatted file. Sequence ids are the internal ids found in the 'id_mapping.tsv' file.

####id_mapping.tsv

Tab separated data table.

  • Column 1: Internal id
  • Column 2: Original id
  • Column 3: NCBI taxonomic ID (Silva), if known
  • Column 4: NCBI taxonomic ID (EMBL), if known
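
A minimal Python sketch of reading this table is shown here. It assumes that the file has no header row and that unknown taxonomic IDs appear as empty or missing fields; neither detail is stated above. The function name `read_id_mapping` and the sample values are hypothetical.

```python
import csv
import io


def read_id_mapping(handle):
    """Map internal id -> (original id, Silva tax id or None, EMBL tax id or None)."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        # Assumption: columns 3 and 4 may be empty or absent when the id is unknown.
        row = row + [""] * (4 - len(row))
        internal_id, original_id, silva, embl = (field.strip() for field in row[:4])
        mapping[internal_id] = (original_id, silva or None, embl or None)
    return mapping


# Hypothetical sample data, not taken from a real pipeline run.
sample = "SR_1\tgenome_A\t562\t562\nSR_2\tgenome_B\t\t\n"
mapping = read_id_mapping(io.StringIO(sample))
print(mapping["SR_1"])
print(mapping["SR_2"])
```

Taxonomic IDs are kept as strings so that no value is altered on the way through; convert them with `int()` only where numeric comparison is needed.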

####meta_data.tsv

Tab separated data table. Columns have no fixed order. First row must have column names.

  • genome_ID: Original genome id
  • prediction_threshold: Relative genome distance threshold at which the taxonomic classification was made
  • NCBI_ID: NCBI taxonomic ID of the taxonomic classification
  • SCIENTIFIC_NAME: Scientific name of taxonomic classification
  • novelty_category: Novelty category of a genome
  • OTU: Id of genomes that were clustered together
  • ANI: Average nucleotide identity to the closest reference genome
  • ANI_NOVELTY_CATEGORY: Novelty category based on ANI
  • ANI_TAXONOMIC_COMPARE: Taxonomic id of closest reference genome
  • ANI_SCIENTIFIC_NAME: Scientific name of closest reference genome
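
Because the columns have no fixed order, a reader should look fields up by name and not by position. The following Python sketch illustrates one way to do that with the standard `csv.DictReader`; the function name `read_meta_data`, the choice of `genome_ID` as the lookup key, and the sample values are assumptions made for illustration.

```python
import csv
import io


def read_meta_data(handle):
    """Return {genome_ID: row}, where each row is a dict keyed by column name."""
    reader = csv.DictReader(handle, delimiter="\t")
    # The first row must carry the column names; genome_ID is used as the key here.
    if reader.fieldnames is None or "genome_ID" not in reader.fieldnames:
        raise ValueError("expected a header row containing a genome_ID column")
    return {row["genome_ID"]: row for row in reader}


# Hypothetical sample with the columns in an arbitrary order.
sample = (
    "OTU\tgenome_ID\tNCBI_ID\tnovelty_category\n"
    "7\tgenome_A\t562\texample_category\n"
)
meta = read_meta_data(io.StringIO(sample))
print(meta["genome_A"]["NCBI_ID"])
```

All values are returned as strings; columns that are absent from a given file simply do not appear as keys.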

####mothur_cluster_16S_rRNA.list

Tab separated data table. First row must have column names.

  • Column 1 'label': Relative genome distance thresholds. Example: unique, 0.01, 0.02, 0.03
  • Column 2 'numOtus': Number of groups (OTUs)
  • Column 3+ 'Otu<index>': Comma separated lists of internal ids

Example:
label numOtus Otu001
unique 518 SR_517,SR_518,SR_462
0.001 469 SR_517,SR_518,SR_462
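
The layout described above can be parsed with a short Python sketch such as the following. It assumes that rows may carry fewer 'Otu' columns than the header as clusters merge at larger thresholds, which is not stated above. The function name `read_mothur_list` and the sample data are hypothetical.

```python
import io


def read_mothur_list(handle):
    """Return {label: list of OTUs}, where each OTU is a list of internal ids."""
    clusters = {}
    header_seen = False
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        if not header_seen:
            header_seen = True  # first row holds the column names
            continue
        label, otu_columns = fields[0], fields[2:]
        # Each remaining column is one OTU: a comma separated list of internal ids.
        clusters[label] = [column.split(",") for column in otu_columns if column]
    return clusters


# Hypothetical sample data with consistent OTU counts.
sample = (
    "label\tnumOtus\tOtu001\tOtu002\n"
    "unique\t2\tSR_1,SR_2\tSR_3\n"
    "0.01\t1\tSR_1,SR_2,SR_3\n"
)
clusters = read_mothur_list(io.StringIO(sample))
print(clusters["unique"])
```

The 'numOtus' column is skipped here; a stricter reader could compare it with the number of parsed OTUs per row.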

##Metagenome Simulation

###Input

###Output
