Skip to content

Profile one sample

Lilith Feer edited this page May 6, 2024 · 16 revisions

The mOTUs profiler takes as input one or multiple fastq files. The input can be zipped in .gz or .bz2 format. The output is a file with the relative abundance (or read counts) of the prokaryotic species found in the sample.

To profile samples we will use the command profile, type motus profile for the list of parameters.

Content of this page:

Specify fastq input files

In the fastq format, it is possible to provide paired end reads:

motus profile -f forward_reads.fastq -r reverse_reads.fastq

as well as single reads that come from quality filtering:

motus profile -f forward_reads.fq -r reverse_reads.fq -s single_reads.fq

or only single reads if the sequencing was not performed with paired end reads:

motus profile -s single_reads.fq

Moreover, if the sample was divided in two (or more) lanes, is possible to profile all reads together:

motus profile -f for_r_lane1.fq,for_r_lane2.fq -r rev_r_lane1.fq,rev_r_lane2.fq

Specify the name of the sample

You can specify the name of the sample with the -n option. Example:

motus profile -s ERS199259_filtered.fq.gz -n ERS199259

In the resulting profile, the third header will be:

#consensus_taxonomy	ERS199259

If you do not specify the name, then the third header will be:

#consensus_taxonomy	unnamed sample

Note that the name that is specified with -n will be used in the merge function of motus.

If you do not specify the names, and merge samples, then it will not be possible to understand which column corresponds to which sample. Example:

motus profile -s ERS199259_filtered.fq.gz -o ERS199259.motus
motus profile -s SRS190579_filtered.fq.gz -o SRS190579.motus

motus merge -i ERS199259.motus,SRS190579.motus

  #consensus_taxonomy                       unnamed sample    unnamed sample
  Kandleria vitulina [ref_mOTU_v2_0001]                 93                42
  Methyloversatilis universalis [ref_mOTU_v2_0002]       0                 4

Number of threads

You can specify the number of threads that you want to use with the -t option. Example:

motus profile -s ERS199259_filtered.fq.gz -t 4

We suggest to use 4 or 8 threads, note that the time needed for profiling is not scaling linearly with the number of threads.

Use of stderr and stdout

As default, the profile is printed to the standard output, and the messages are printed to the standard error. Hence, you can separate messages and result with the following code:

motus profile -s test1.fastq > ~/Desktop/result 2> ~/Desktop/messages

Where ~/Desktop/result contains:

head ~/Desktop/result 
 # git tag version 2.1.0 |  motus version 2.1.0 | map_tax 2.1.0 | gene database: nr2.1.0 | calc_mgc 2.1.0 -y insert.scaled_counts -l 75 | calc_motu 2.1.0 -k mOTU -C no_CAMI -g 3 | taxonomy: ref_mOTU_2.1.0 meta_mOTU_2.1.0
 # call: python mOTUs_v2/motus profile -s test1.fastq
 #consensus_taxonomy	unnamed sample
 Kandleria vitulina [ref_mOTU_v2_0001]	0.0688211617
 Methyloversatilis universalis [ref_mOTU_v2_0002]	0.0000000000
 Megasphaera genomosp. [ref_mOTU_v2_0003]	0.0234955832

And ~/Desktop/messages contains:

head ~/Desktop/messages 
 [main]  MAP_TAX -----------
 [main] Number of detected lanes: 1
 [main] Run bwa on lane 1
  [map_db](map single reads) 8.68 sec
 [main] Total time map_tax: 8.69 sec
 [main]  CALC_MGC -----------

Save profile to file

It is possible to save the profile with the -o option. Example:

motus profile -s test1.fastq -o ~/Desktop/result.motus

which is equivalent to:

motus profile -s test1.fastq > ~/Desktop/result.motus

Save intermediate files

The reads from fastq files are mapped through bwa (Li H. and Durbin R. (2009) Bioinformatics) to the mOTUs database and filtered for the length of the alignment (-l option) and the percentage identity of the alignment (97%). You can save the filtered reads with -I to a bam file:

motus profile -s test1.fastq -I test1.bam

And profile providing this bam file as input:

motus profile -i test1.bam

Check also issue#48 for some examples on how to check the produced bam file (useful also to analyze what happen to each read from the fastq file).

If you want to run mOTUs with different parameters, you can save the intermediate bam file and provide it as input afterwards. Example:

motus profile -s test1.fq -I test1.bam -l 40 -o result_l40.motus
motus profile -i test1.bam -l 70 -o result_l70.motus
motus profile -i test1.bam -l 100 -o result_l100.motus

Here, we ran mOTUs with three different read length filters (40, 70 and 100). The first line takes 40 mins, while the last two take 5 mins each (this is due to the fact that running bwa takes 80% of the total computation time). Note that the filtered reads saved in the first line are filtered for -l 40, hence it's possible to then filter for -l 70. It is not possible to do the contrary: if you filter out the reads shorter than 70, then you miss the reads with a mapping length in between 40 and 70.

In a similar manner, it is possible to save the marker gene cluster (MGC) counts with -M and load them with -m.