Alessio Milanese edited this page Jun 3, 2021 · 11 revisions

1. Why do only a few reads map in my profiles?

If you profile your samples with the -c option and get zero or very few mapped reads, one possibility is that all the reads are being filtered out. The mOTUs profiler filters out all reads that map with fewer than 75 nucleotides (-l 75). For example, in older samples the fastq reads are often only about 50 nucleotides long on average, in which case they would all be filtered out. Try -l 45 to keep more reads during the filtering step.
Note that the average read length of the reads in your sample is printed by the tool:

[main] Minimum alignment length: 75 (average read length: 50)

You can also add -g 1 to keep more reads (see the "Increase precision or recall" page for more information).
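To check in advance whether the -l 75 filter is likely to discard most of your reads, you can look at the read lengths in the fastq file. The following is a minimal sketch, not part of mOTUs: the function names are hypothetical, it assumes plain 4-line fastq records, and it uses the read length only as an upper bound on the alignment length (a read shorter than 75 nucleotides cannot produce a 75-nucleotide alignment).

```python
import io

def read_lengths(fastq_handle):
    """Yield the length of each sequence in a 4-line-per-record fastq stream."""
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # the second line of every record is the sequence
            yield len(line.strip())

def summarize(lengths, min_len=75):
    """Return (average length, fraction of reads with length >= min_len)."""
    lengths = list(lengths)
    if not lengths:
        return 0.0, 0.0
    average = sum(lengths) / len(lengths)
    long_enough = sum(1 for n in lengths if n >= min_len) / len(lengths)
    return average, long_enough

# Hypothetical two-read sample: one 50 nt read and one 100 nt read.
records = [
    "@read1", "A" * 50, "+", "I" * 50,
    "@read2", "C" * 100, "+", "I" * 100,
]
example = io.StringIO("\n".join(records) + "\n")

average, fraction = summarize(read_lengths(example), min_len=75)
print(f"average read length: {average:.0f}, reads >= 75 nt: {fraction:.0%}")
```

For a real, compressed file you could pass `gzip.open(path, "rt")` instead of the in-memory example. If only a small fraction of reads reaches the threshold, lowering -l as described above is worth trying.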
Another possibility is that you are profiling samples from a biome that is covered neither by reference genomes nor by mOTUs. The biomes that we can currently profile with meta-mOTUs (unknown species) are oceans, human gut, human oral cavity, human vagina and human skin. If you have soil samples, mOTUs will be able to profile only the species with reference genomes, which cover a small portion of all the species.

2. What is the meaning of the unassigned fraction?

The unassigned value at the end of the profile file represents the fraction of unmapped reads. It corresponds to species that we know to be present in the sample but cannot quantify individually; hence we group them together into an unassigned fraction. For almost all analyses, it is better to remove this value, since it does not represent a single species/clade. The unassigned fraction is useful when calculating relative abundances. See the following example:

 True rel. ab.      mOTUs read counts      mOTUs rel. ab.
species1   20%        species1    200     species1    20%
species2   10%        species3    300     species3    30%
species3   30%        species4    100     species4    10%
species4   10%        unassigned  400     unassigned  40%
species5   30%

In the example, the sample (True rel. ab.) contains 5 species, of which only 3 are represented in the mOTUs profiler. Despite this, the relative abundance of these species is correct because we are able to measure the unassigned fraction (the unmapped reads). If you calculated the relative abundance without taking the unassigned fraction into account, you would overestimate the abundance of the profiled species:

 True rel. ab.     mOTUs read counts       mOTUs rel. ab.
species1   20%        species1   200     species1   33.3%
species2   10%        species3   300     species3     50%
species3   30%        species4   100     species4   16.7%
species4   10%
species5   30%

For your analysis (for example, comparing healthy controls to disease samples), you will use species1: 20%; species3: 30%; species4: 10%, and remove the unassigned fraction only after calculating the relative abundances.
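The arithmetic of the two tables can be reproduced with a few lines of Python. This is only an illustrative sketch using the read counts from the example above; the function name is hypothetical and not part of mOTUs.

```python
def relative_abundance(counts):
    """Divide every count by the total so that the values sum to 1."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}

# Read counts from the example above.
counts = {"species1": 200, "species3": 300, "species4": 100, "unassigned": 400}

# Recommended order: normalize first, then drop the unassigned fraction.
with_unassigned = relative_abundance(counts)
profile = {k: v for k, v in with_unassigned.items() if k != "unassigned"}

# Wrong order: dropping the unassigned fraction first inflates every species.
assigned_only = {k: v for k, v in counts.items() if k != "unassigned"}
overestimated = relative_abundance(assigned_only)

for name in profile:
    print(f"{name}: {profile[name]:.1%} (correct) vs {overestimated[name]:.1%} (overestimated)")
```

Note that after removing the unassigned fraction the remaining values sum to 60%, not 100%; this is expected and reflects the part of the sample that could not be assigned.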

3. How many mOTUs read counts should I expect to have?

The mOTUs count (obtained with the -c option) is proportional to the library size (the number of reads in the fastq files), with a Pearson correlation of 0.88. Here is a plot of what to expect when using human fecal samples:

Note that we counted all reads here; hence paired-end reads (one in the forward fastq file and one in the reverse fastq file) are counted as two reads.

Expected mOTUs count:

Total number of reads (million)    Median mOTUs count
  5                                   600
  8                                   900
 15                                 1,900
 25                                 3,300
 35                                 5,500
 50                                 8,800
100                                13,000
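For library sizes that fall between the rows of the table, a rough estimate can be obtained by interpolating between neighbouring rows. The sketch below is an assumption for illustration only: linear interpolation is not a method described on this page, the function name is hypothetical, and the values are medians for human fecal samples, so individual samples can deviate considerably. Remember to count both mates of a paired-end read.

```python
from bisect import bisect_right

# (total reads in millions, median mOTUs count), copied from the table above.
TABLE = [(5, 600), (8, 900), (15, 1900), (25, 3300),
         (35, 5500), (50, 8800), (100, 13000)]

def expected_motus_count(million_reads):
    """Linearly interpolate the median mOTUs count for a given library size."""
    xs = [x for x, _ in TABLE]
    if not xs[0] <= million_reads <= xs[-1]:
        raise ValueError("library size outside the range covered by the table")
    i = bisect_right(xs, million_reads)
    if i == len(xs):  # exactly at the largest tabulated library size
        return float(TABLE[-1][1])
    (x0, y0), (x1, y1) = TABLE[i - 1], TABLE[i]
    return y0 + (y1 - y0) * (million_reads - x0) / (x1 - x0)

# A library of 20 million reads lies halfway between the 15 and 25 million rows.
print(expected_motus_count(20))
```

If your observed count is far below such an estimate, the filtering and biome issues discussed in question 1 are the first things to check.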

4. Where can I find the taxonomy annotation for each mOTU?

You can download the mOTUs database from this link; make sure to select the correct version (the link is for version 2.6.1). When you unzip the file, you can find the following files:

  • db_mOTU_taxonomy_meta-mOTUs.tsv, the taxonomy for the meta-mOTUs, with a total of 8 columns: first column is the mOTUs ID and then the 7 levels (kingdom to species);
  • db_mOTU_taxonomy_ref-mOTUs.tsv, the taxonomy of the ref-mOTUs, with a total of 9 columns: first the specI ID (as in http://progenomes.embl.de/), second the ref-mOTUs ID and then the 7 levels (kingdom to species);
  • db_mOTU_taxonomy_ref-mOTUs_short_names.tsv, a file with 3 columns: ref-mOTUs ID, short name and full name. The full name corresponds to the last column of db_mOTU_taxonomy_ref-mOTUs.tsv. Since many of these names are really long, we created a shorter version (second column), which is the one printed by the profiler by default. If you use the -u option in mOTUs, the full name of the species is printed instead.

Note that if you have already installed mOTUs, the database and the files are already present in your system.
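As an illustration of the 8-column layout described above for db_mOTU_taxonomy_meta-mOTUs.tsv, the following sketch loads such a file into a dictionary keyed by mOTUs ID. It is not part of mOTUs: the function name and the example rows are hypothetical, and it assumes tab-separated fields with no header row (a header line with 8 fields would be read as a regular entry).

```python
import csv
import io

# The 7 taxonomic levels, in the order "kingdom to species".
LEVELS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

def load_meta_taxonomy(handle):
    """Map each mOTUs ID to a dict of its 7 taxonomic levels."""
    taxonomy = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 1 + len(LEVELS):
            continue  # skip lines that do not have exactly 8 fields
        motu_id, *levels = row
        taxonomy[motu_id] = dict(zip(LEVELS, levels))
    return taxonomy

# Hypothetical rows standing in for the real file content.
example = io.StringIO(
    "example_mOTU_1\tk1\tp1\tc1\to1\tf1\tg1\ts1\n"
    "example_mOTU_2\tk1\tp1\tc2\to2\tf2\tg2\ts2\n"
)
taxonomy = load_meta_taxonomy(example)
print(taxonomy["example_mOTU_2"]["genus"])
```

For the real file you would pass `open("db_mOTU_taxonomy_meta-mOTUs.tsv", newline="")` instead of the in-memory example. The ref-mOTUs file has 9 columns (specI ID first, then the ref-mOTUs ID), so the unpacking would need one extra leading field.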