Using BWA MEM bam files as inputs of tepid-discover #5

songtaogui · 2018-09-13T05:44:25Z

Hi,

Thank you for developing the wonderful TEPID tool.

I understand that tepid-discover takes concordant, discordant and split-read BAM files as inputs. Is it possible to provide these inputs by parsing BWA-MEM output rather than generating them with tepid-map? I ask because the yaha step is slow. Are there any format differences between BWA-MEM split reads and yaha split reads that could affect tepid-discover? When I use a split-read BAM generated from bwa mem (using samblaster), I get errors like:

Traceback (most recent call last):
  File "/home/maize/gst/sftw/anaconda2/bin/tepid-discover", line 4, in <module>
    __import__('pkg_resources').run_script('TEPID==0.8', 'tepid-discover')
  File "/home/maize/gst/sftw/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 748, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/maize/gst/sftw/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1524, in run_script
    exec(script_code, namespace, namespace)
  File "/home/maize/gst/sftw/anaconda2/lib/python2.7/site-packages/TEPID-0.8-py2.7.egg/EGG-INFO/scripts/tepid-discover", line 39, in <module>

  File "build/bdist.linux-x86_64/egg/tepid/tepid.py", line 1049, in discover_pe
  File "/home/maize/gst/sftw/anaconda2/lib/python2.7/site-packages/pybedtools/bedtool.py", line 806, in decorated
    result = method(self, *args, **kwargs)
  File "/home/maize/gst/sftw/anaconda2/lib/python2.7/site-packages/pybedtools/bedtool.py", line 337, in wrapped
    decode_output=decode_output,
  File "/home/maize/gst/sftw/anaconda2/lib/python2.7/site-packages/pybedtools/helpers.py", line 356, in call_bedtools
    raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:

        bedtools sort -i /tmp/pybedtools.aFiz3O.tmp

Error message was:
Error: malformed BED entry at line 5. Start was greater than end. Exiting.

I also noticed that tepid-discover identifies discordant read pairs using the mean insert size and its standard deviation. Does it matter if I set the -s (average insert size) option differently during tepid-map?

Thank you!

Best wishes,

Songtao Gui
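
The bedtools message in the traceback above ("Start was greater than end") refers to a BED record whose second column is larger than its third. The following is a minimal sketch for locating such records in any BED file that is still on disk. The function name `find_inverted_bed_lines` and the example records are hypothetical and are not part of TEPID or pybedtools.

```python
def find_inverted_bed_lines(lines):
    """Return (line_number, text) for BED records whose start exceeds end."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # Skip blank lines and the header-style lines that BED files may carry.
        if not text or text.startswith(("#", "track ", "browser ")):
            continue
        fields = text.split("\t")
        if len(fields) < 3:
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            # Non-numeric coordinates are a different problem; ignore them here.
            continue
        if start > end:
            bad.append((number, text))
    return bad


# Made-up records for illustration only.
example = [
    "chr1\t100\t250\tread_a\n",
    "chr1\t900\t400\tread_b\n",  # start > end, the case bedtools rejects
    "chr2\t10\t10\tread_c\n",
]
for number, text in find_inverted_bed_lines(example):
    print(number, text)
```

To check a real file, pass an open file handle instead of the list, for example `find_inverted_bed_lines(open("reads.bed"))`, where `reads.bed` is a placeholder name.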

zyworship · 2019-04-16T11:16:17Z

Hi. Do you mean that you tried using the BAM files from BWA as input for TEPID and it didn't work?

timoast · 2019-06-06T02:08:24Z

Hi, sorry for the delay here. I'm not familiar with the differences in format between BWA-MEM and YAHA, but in principle it should work as long as the BWA-MEM output is hard-clipped (use the -H option). However, I have not tried this myself.
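
One way to see which clipping style a split-read file actually contains is to look at the CIGAR column of its SAM records: in the SAM format, `H` marks hard clipping and `S` marks soft clipping. The sketch below counts both kinds from SAM text lines. The function names and the example records are made up for illustration, and it makes no claim about what tepid-discover itself checks.

```python
import re

# One CIGAR operation: a length followed by an operation letter from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipping_types(cigar):
    """Return the set of clipping operations ('S', 'H') present in a CIGAR string."""
    return {op for _, op in CIGAR_OP.findall(cigar) if op in "SH"}


def summarize_clipping(sam_lines):
    """Count alignments with hard clipping, soft clipping, or no clipping."""
    counts = {"hard": 0, "soft": 0, "none": 0}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete alignment record
        kinds = clipping_types(fields[5])  # column 6 is CIGAR
        if "H" in kinds:
            counts["hard"] += 1
        if "S" in kinds:
            counts["soft"] += 1
        if not kinds:
            counts["none"] += 1
    return counts


# Made-up SAM records for illustration only.
example_sam = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "r1\t0\tchr1\t100\t60\t5M5H\t*\t0\t0\tACGTA\tIIIII\n",
    "r2\t0\tchr1\t200\t60\t4S6M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
    "r3\t0\tchr1\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
]
print(summarize_clipping(example_sam))
```

The input lines could come from the text output of `samtools view` on the split-read BAM.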

Setting the -s option to different values would not greatly impact the results, as this is just passed to bowtie2 for the alignment. The average insert size used by TEPID for identifying discordant reads is actually calculated empirically from the reads after aligning with bowtie2 (see https://github.com/ListerLab/TEPID/blob/master/tepid/tepid.py#L674)
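
To illustrate what "calculated empirically" can look like, here is a minimal sketch that derives a mean and standard deviation from observed template lengths (the TLEN column of a SAM record) and flags pairs that fall far from the mean. This is not TEPID's code: the function names, the example values, and the `n_sd` cutoff are assumptions chosen for illustration, and the linked source should be consulted for the actual calculation.

```python
import math


def insert_size_stats(template_lengths):
    """Return (mean, sample standard deviation) of absolute template lengths.

    Zero values are dropped because SAM uses TLEN 0 when the length is unavailable.
    """
    sizes = [abs(t) for t in template_lengths if t != 0]
    if len(sizes) < 2:
        raise ValueError("need at least two non-zero template lengths")
    mean = sum(sizes) / float(len(sizes))
    variance = sum((s - mean) ** 2 for s in sizes) / (len(sizes) - 1)
    return mean, math.sqrt(variance)


def is_discordant(template_length, mean, sd, n_sd=4):
    """Flag a pair whose size deviates from the mean by more than n_sd deviations.

    n_sd is a hypothetical cutoff, not a value taken from TEPID.
    """
    return abs(abs(template_length) - mean) > n_sd * sd


# Made-up template lengths for illustration only.
observed = [290, 300, 310, -300, 0]
mean, sd = insert_size_stats(observed)
print(mean, sd)
print(is_discordant(5000, mean, sd), is_discordant(305, mean, sd))
```

Because the statistics come from the aligned reads themselves, a user-supplied average such as the -s value does not enter this kind of calculation.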

zyworship · 2019-06-20T07:47:43Z

Hi Tim,
I ran into a problem when running tepid-discover.
The error looks like this:
[chej5t1@cu09 Sample_YPX10866]$ /GS01/software/biosoft/TEPID/build/scripts-2.7/tepid-discover -k --strict -p 6 -n YPX10866 -c YPX10866.bam -s YPX10866.split.bam -t scaffold_version_TE.bed
Processing YPX10866
Running paired-end mode
Estimating mean insert size and coverage
Traceback (most recent call last):
  File "/GS01/software/biosoft/TEPID/build/scripts-2.7/tepid-discover", line 44, in <module>
    tepid.discover_pe(options)
  File "/GS01/software/biosoft/python/python2.7/lib/python2.7/site-packages/TEPID-0.8+11.g475e1dd.dirty-py2.7.egg/tepid/tepid.py", line 1009, in discover_pe
    cov = calc_cov(options.conc, 100000, 120000)
  File "/GS01/software/biosoft/python/python2.7/lib/python2.7/site-packages/TEPID-0.8+11.g475e1dd.dirty-py2.7.egg/tepid/tepid.py", line 721, in calc_cov
    for read in bam.pileup(nms[0], start, stop):
IndexError: list index out of range

Could you help me with this problem?
Zhang Yi
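
The `IndexError` above says only that the list `nms` was empty when `nms[0]` was read. The traceback does not show how `nms` is built. Assuming (unverified) that it is derived from the reference sequences in the BAM header, one thing worth checking is which references the header lists and whether any is long enough to contain the 100000 to 120000 window passed to `calc_cov`. The sketch below does that on header text; the function names and the example header are hypothetical.

```python
def reference_lengths(header_lines):
    """Return (name, length) pairs from the @SQ lines of a SAM header."""
    refs = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Each tag after the record type looks like KEY:value.
        tags = dict(
            field.split(":", 1)
            for field in line.rstrip("\n").split("\t")[1:]
            if ":" in field
        )
        if "SN" in tags and "LN" in tags:
            refs.append((tags["SN"], int(tags["LN"])))
    return refs


def references_covering(refs, stop):
    """Return names of references at least `stop` bases long."""
    return [name for name, length in refs if length >= stop]


# Made-up header for illustration only.
example_header = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "@SQ\tSN:scaffold_1\tLN:85000\n",
    "@SQ\tSN:scaffold_2\tLN:250000\n",
]
refs = reference_lengths(example_header)
print(refs)
print(references_covering(refs, 120000))
```

The header text can be obtained with `samtools view -H` on the concordant BAM. An empty result from either function would be consistent with an empty `nms` under the assumption stated above, but it does not confirm the cause.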
