Using BWA MEM bam files as inputs of tepid-discover #5

Open
songtaogui opened this issue Sep 13, 2018 · 3 comments

Comments

@songtaogui

Hi,

Thank you for developing the wonderful TEPID tool.

I understand that tepid-discover takes concordant, discordant, and split-read BAM files as input. Is it possible to produce these inputs by parsing BWA-MEM output instead of generating them with tepid-map? I ask because the YAHA step is slow. Are there any format differences between BWA-MEM split reads and YAHA split reads that could affect tepid-discover? When I use a split-read BAM generated from BWA-MEM output (using samblaster), I get errors like:

Traceback (most recent call last):
  File "/home/maize/gst/sftw/anaconda2/bin/tepid-discover", line 4, in <module>
    __import__('pkg_resources').run_script('TEPID==0.8', 'tepid-discover')
  File "/home/maize/gst/sftw/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 748, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/maize/gst/sftw/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1524, in run_script
    exec(script_code, namespace, namespace)
  File "/home/maize/gst/sftw/anaconda2/lib/python2.7/site-packages/TEPID-0.8-py2.7.egg/EGG-INFO/scripts/tepid-discover", line 39, in <module>

  File "build/bdist.linux-x86_64/egg/tepid/tepid.py", line 1049, in discover_pe
  File "/home/maize/gst/sftw/anaconda2/lib/python2.7/site-packages/pybedtools/bedtool.py", line 806, in decorated
    result = method(self, *args, **kwargs)
  File "/home/maize/gst/sftw/anaconda2/lib/python2.7/site-packages/pybedtools/bedtool.py", line 337, in wrapped
    decode_output=decode_output,
  File "/home/maize/gst/sftw/anaconda2/lib/python2.7/site-packages/pybedtools/helpers.py", line 356, in call_bedtools
    raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:

        bedtools sort -i /tmp/pybedtools.aFiz3O.tmp

Error message was:
Error: malformed BED entry at line 5. Start was greater than end. Exiting.
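
The bedtools message above says that one interval has a start coordinate greater than its end. A minimal sketch for locating such lines in a BED file is shown here. The function name `find_malformed_bed_lines` and the example records are hypothetical and are not part of TEPID or pybedtools; the sketch only assumes tab-separated BED lines with start and end in columns 2 and 3.

```python
def find_malformed_bed_lines(lines):
    """Return (line_number, line) pairs that bedtools would likely reject.

    A line is reported when its start/end columns are missing or not
    integers, or when start is greater than end.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and the header-style lines a BED file may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        try:
            start, end = int(fields[1]), int(fields[2])
        except (IndexError, ValueError):
            bad.append((number, line))
            continue
        if start > end:
            bad.append((number, line))
    return bad


# Hypothetical records for illustration only.
example = [
    "chr1\t100\t200\tread1\n",
    "chr1\t500\t450\tread2\n",  # start > end
    "chr2\t10\t10\tread3\n",
]
bad_lines = find_malformed_bed_lines(example)
print(bad_lines)
```

To check a real file, the same function can be given an open file handle, for example `find_malformed_bed_lines(open(path))`. The pybedtools temporary file named in the error may already have been deleted, so this would apply to an intermediate BED file that still exists on disk.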

Also, I noticed that tepid-discover identifies discordant read pairs using the mean insert size and its standard deviation. Does it matter if I set the -s (average insert size) option to a different value when running tepid-map?

Thank you!

Best wishes,

Songtao Gui

@zyworship

Hi. Do you mean that you tried using BAM files from BWA as input for TEPID and it didn't work?

@timoast
Contributor

timoast commented Jun 6, 2019

Hi, sorry for the delay here. I'm not familiar with the differences in format between BWA-MEM and YAHA, but in principle it should work as long as the bwa mem output is hard-clipped (use the -H option). However I have not tried this myself.
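
Since aligner options can differ between versions, inspecting the CIGAR strings directly is one way to confirm whether split reads are hard-clipped. The sketch below is a rough illustration only: the helper names and the example CIGAR strings are hypothetical, and it relies solely on the SAM convention that `S` marks soft clipping and `H` marks hard clipping.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipping_types(cigar):
    """Return the set of clipping codes ('S' and/or 'H') found in a CIGAR string."""
    return {op for _, op in CIGAR_OP.findall(cigar) if op in "SH"}


def count_clipping(cigars):
    """Count how many CIGAR strings contain soft clipping, hard clipping, or neither."""
    counts = {"soft": 0, "hard": 0, "none": 0}
    for cigar in cigars:
        kinds = clipping_types(cigar)
        if "S" in kinds:
            counts["soft"] += 1
        if "H" in kinds:
            counts["hard"] += 1
        if not kinds:
            counts["none"] += 1
    return counts


# Hypothetical CIGAR strings for illustration only.
clip_counts = count_clipping(["60M40S", "40H60M", "100M", "10S80M10H"])
print(clip_counts)
```

In practice the CIGAR strings would come from column 6 of the SAM records in the split-read file, for example from `samtools view` output. A large "soft" count would suggest that the file is not hard-clipped.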

Setting the -s option to different values would not greatly impact the results, as this is just passed to bowtie2 for the alignment. The average insert size used by TEPID for identifying discordant reads is actually calculated empirically from the reads after aligning with bowtie2 (see https://github.com/ListerLab/TEPID/blob/master/tepid/tepid.py#L674)
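
To illustrate the idea of estimating the insert size empirically, here is a minimal sketch that computes the mean and standard deviation from a list of template lengths (the TLEN field of SAM records). It is not the code at the link above. The function names, the `max_size` filter, and the `n_std` cutoff are all assumptions made for this example, and TEPID's actual thresholds may differ.

```python
import statistics


def insert_size_stats(template_lengths, max_size=10000):
    """Return (mean, standard deviation) of absolute template lengths.

    Zero-length and very large values are ignored; max_size is a
    hypothetical filter chosen for this sketch.
    """
    sizes = [abs(t) for t in template_lengths if 0 < abs(t) <= max_size]
    if len(sizes) < 2:
        raise ValueError("need at least two usable template lengths")
    return statistics.mean(sizes), statistics.stdev(sizes)


def is_discordant(tlen, mean, std, n_std=4):
    """Flag a pair whose insert size is more than n_std deviations from the mean.

    n_std=4 is an illustrative assumption, not a value taken from TEPID.
    """
    return abs(abs(tlen) - mean) > n_std * std


# Hypothetical template lengths for illustration only.
tlens = [300, -310, 290, 305, -295, 0, 50000]
mean, std = insert_size_stats(tlens)
print(mean, std)
print(is_discordant(5000, mean, std), is_discordant(305, mean, std))
```

Because the statistics come from the aligned reads themselves, a value supplied on the command line would affect only the alignment step in a scheme like this one.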

@zyworship

Hi Tim,
I ran into a problem when running tepid-discover. The error looks like this:
[chej5t1@cu09 Sample_YPX10866]$ /GS01/software/biosoft/TEPID/build/scripts-2.7/tepid-discover -k --strict -p 6 -n YPX10866 -c YPX10866.bam -s YPX10866.split.bam -t scaffold_version_TE.bed
Processing YPX10866
Running paired-end mode
Estimating mean insert size and coverage
Traceback (most recent call last):
  File "/GS01/software/biosoft/TEPID/build/scripts-2.7/tepid-discover", line 44, in <module>
    tepid.discover_pe(options)
  File "/GS01/software/biosoft/python/python2.7/lib/python2.7/site-packages/TEPID-0.8+11.g475e1dd.dirty-py2.7.egg/tepid/tepid.py", line 1009, in discover_pe
    cov = calc_cov(options.conc, 100000, 120000)
  File "/GS01/software/biosoft/python/python2.7/lib/python2.7/site-packages/TEPID-0.8+11.g475e1dd.dirty-py2.7.egg/tepid/tepid.py", line 721, in calc_cov
    for read in bam.pileup(nms[0], start, stop):
IndexError: list index out of range

Could you help me with this problem?
Zhang Yi
