File Formats
The following file format definitions are used in data exchange between stages of the CAMI pipeline.
The configuration file contains the options and values required for the pipeline to run. A path to this file is a required program argument. The following arguments in this file expect a file path.
- id_to_genome_file
  Tab separated data table. It maps genome ids to the file paths of the genomes.
  - Column 1: Genome id
  - Column 2: file path
- id_to_gff_file
  Tab separated data table. It maps genome ids to the file paths of the gene annotations of the genomes.
  - Column 1: Genome id
  - Column 2: file path
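
As a minimal sketch of how such a two-column table could be read, the snippet below parses tab separated lines into a dictionary. The function name `read_id_to_path`, the example ids, and the example paths are hypothetical and not part of the pipeline. Rejecting duplicate ids is also an assumption of this sketch.

```python
import io


def read_id_to_path(handle):
    """Return a dict mapping column 1 (genome id) to column 2 (file path)."""
    mapping = {}
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        columns = line.split("\t")
        if len(columns) < 2:
            raise ValueError(f"line {line_number}: expected two tab separated columns")
        genome_id, file_path = columns[0], columns[1]
        if genome_id in mapping:
            # Assumption of this sketch: each genome id appears only once.
            raise ValueError(f"line {line_number}: duplicate genome id {genome_id!r}")
        mapping[genome_id] = file_path
    return mapping


# Hypothetical example content; real files would be opened with open(path).
example = io.StringIO(
    "genome_1\t/data/genomes/genome_1.fasta\n"
    "genome_2\t/data/genomes/genome_2.fasta\n"
)
id_to_genome = read_id_to_path(example)
```

The same function would work for both `id_to_genome_file` and `id_to_gff_file`, since both use the same two-column layout.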
'{i}' is the index for each sample that is to be generated.
- Column 1: genome_ID
- Column 2: abundance
'genome_ID' is the identifier of the genomes used.
'abundance' is the relative abundance of a genome to be simulated. 'abundance' does not reflect the amount of genetic data of a genome, but the number of genome copies.
For example, in a set of two genomes that both have an abundance of 0.5, where one genome is twice the size of the other, the larger genome will make up about 67% (two thirds) of the genetic data in the simulated metagenome.
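
The relationship between abundance and the share of genetic data can be checked with a short calculation. This is only an illustrative sketch: the function name and the genome sizes are hypothetical, and it assumes the share of genetic data is proportional to abundance multiplied by genome size.

```python
def genetic_data_share(abundances, genome_sizes):
    """Return the fraction of genetic data contributed by each genome.

    Assumes the contribution of a genome is abundance * genome size.
    """
    weighted = {g: abundances[g] * genome_sizes[g] for g in abundances}
    total = sum(weighted.values())
    return {g: w / total for g, w in weighted.items()}


# Two genomes with equal abundance; sizes (in base pairs) are made up.
shares = genetic_data_share(
    abundances={"small": 0.5, "large": 0.5},
    genome_sizes={"small": 1_000_000, "large": 2_000_000},
)
print(shares)  # the larger genome contributes two thirds of the data
```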
All given genomes will be copied and placed in this folder. During copying, sequence names are checked for uniqueness and renamed if required. Comments and descriptions of sequences are removed.
This file contains a list of replaced sequence ids.
- Column 1: genome_ID
- Column 2: original sequence id
- Column 3: new sequence id
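
A list of replaced ids like this can be used to trace a renamed sequence back to its original name. The sketch below assumes the file is tab separated with exactly the three columns listed above. The function name and the example ids are hypothetical.

```python
import io


def load_renamed_sequences(handle):
    """Map (genome_ID, new sequence id) to the original sequence id."""
    new_to_original = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        # Assumption: three tab separated columns, in the documented order.
        genome_id, original_id, new_id = line.split("\t")[:3]
        new_to_original[(genome_id, new_id)] = original_id
    return new_to_original


# Hypothetical example content.
example_renames = io.StringIO(
    "genome_1\tseq_1\tseq_1_renamed\n"
    "genome_2\tseq_1\tseq_1_renamed_2\n"
)
renamed = load_renamed_sequences(example_renames)
```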
List of file paths to the genome copies in the 'source_genomes' folder of the output directory.
- Column 1: genome_ID
- Column 2: file path
Merged metadata of the genomes of each community that are actually used for the simulation.
Unused metadata of the genomes of every community.
BAM files generated from the reads produced by the read simulator.
If anonymization is not done, the original fastq files will be here.
If anonymization is done, this will be the only fastq file.
Mapping of reads for evaluation
- Column 1: anonymous read id
- Column 2: genome id
- Column 3: taxonomic id
- Column 4: read id
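
For evaluation, a mapping like this can be summarized per genome. The following is a minimal sketch under assumptions: the file is taken to be tab separated, and lines starting with `#` are treated as header or comment lines. Neither is stated above, and the function name and example ids are hypothetical.

```python
import io
from collections import Counter


def reads_per_genome(handle):
    """Count how many reads map to each genome id (column 2)."""
    counts = Counter()
    for line in handle:
        line = line.rstrip("\n")
        # Assumption: '#' marks a header or comment line.
        if not line.strip() or line.startswith("#"):
            continue
        anonymous_read_id, genome_id, taxonomic_id, read_id = line.split("\t")[:4]
        counts[genome_id] += 1
    return counts


# Hypothetical example content.
example_reads = io.StringIO(
    "#anonymous_read_id\tgenome_id\ttax_id\tread_id\n"
    "S0R0\tgenome_1\t562\tgenome_1-100\n"
    "S0R1\tgenome_2\t1280\tgenome_2-7\n"
    "S0R2\tgenome_1\t562\tgenome_1-42\n"
)
read_counts = reads_per_genome(example_reads)
```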
Fasta file with perfect assembly of reads of this sample
Mapping of contigs for evaluation
- Column 1: anonymous contig id
- Column 2: genome id
- Column 3: taxonomic id
- Column 4: sequence id of the original genome (in 'source_genomes' folder)
- Column 5: number of reads used in the contig
- Column 6: start position
- Column 7: end position
Fasta file with perfect assembly of reads from all samples
Mapping of contigs from pooled reads for evaluation.
- Column 1: anonymous_contig_id
- Column 2: genome id
- Column 3: taxonomic id
- Column 4: sequence id of the original genome (in 'source_genomes' folder)
- Column 5: number of reads used in the contig
- Column 6: start position
- Column 7: end position
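
The seven-column contig mapping can be loaded into typed records, as in the sketch below. It assumes a tab separated file and treats lines starting with `#` as header or comment lines. The class name, function name, and example values are hypothetical. Whether the start and end positions are 0-based or 1-based is not specified here, so the sketch only stores them as integers.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class ContigRecord:
    anonymous_contig_id: str
    genome_id: str
    taxonomic_id: str
    sequence_id: str  # sequence id of the original genome
    number_of_reads: int
    start: int
    end: int


def parse_contig_mapping(handle):
    """Parse the seven documented columns into ContigRecord objects."""
    records = []
    for line in handle:
        line = line.rstrip("\n")
        # Assumption: '#' marks a header or comment line.
        if not line.strip() or line.startswith("#"):
            continue
        c = line.split("\t")
        records.append(
            ContigRecord(c[0], c[1], c[2], c[3], int(c[4]), int(c[5]), int(c[6]))
        )
    return records


# Hypothetical example content.
example_contigs = io.StringIO(
    "C0\tgenome_1\t562\tseq_1\t120\t1\t5000\n"
    "C1\tgenome_1\t562\tseq_1\t80\t6200\t9000\n"
)
contigs = parse_contig_mapping(example_contigs)
total_reads = sum(record.number_of_reads for record in contigs)
```

The same parser would apply to the per-sample contig mapping, which lists the same seven columns.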
Taxonomic profile for each sample