Thread error running bamgineer #10
Hey, |
Please find the config.cfg pasted below. I am working on a cloud instance. I am not currently using docker for this tool because it gave separate issues earlier (hard to describe them all here). I am running bamgineer locally on the cloud instance (all dependencies are installed locally). [SOFTWARE] [REFERENCE] [RESULTS] |
What version of the multiprocessing package gave you the error? The version of multiprocessing we have on our cluster is 0.70a1. From the documentation, it looks like the latest version (0.70.7) is a fork of 0.70a1 (https://pypi.org/project/multiprocess/0.70.7) |
Hey, I've updated multiprocessing to now use multiprocess 0.70.7 (pip install multiprocess==0.70.7). Please pull from the latest version of bamgineer and let me know if you have any issues with it. |
Sure, I will let you know. Thank you! |
Hi suluxan, The program is running fine now, but it has been running for almost 3 hours with a small bam (input) containing only chr21 and chr22 regions, a 'splitbam' directory containing chr21.bam, chr21.byname.bam, chr22.bam and chr22.byname.bam, and a cnv file containing only 1 amp (cn=4) for 1 region of chr21. The script gets up to the step of creating a chr21_roiamp4AABB47974300.bam file under "tmpbams" but seems to be taking a long time to create the final simulated bam. Could you help me see whether something is going wrong here? Please see the command line, bed and logs pasted below. The config file is the same as the one posted earlier in this thread. (Note: I give bamgineer a phased vcf consisting only of chr21 phased variants. The phased vcf was created by running the beagle tool before running bamgineer, due to some issues we faced earlier while running beagle as part of the bamgineer workflow; not necessary to discuss at the moment.) Command line: cnv bed file: Logs: b)debug.log Do you expect simulate.py to take this much time with such a small bam? If yes, is there a multithreading parameter that could make simulate.py run faster on a single instance? I did not see such a parameter in the "help" section. |
Yeah, the previous steps of Bamgineer v1 for phasing were not that clear; it seems Beagle needs population data to phase correctly. Is that how you generated your VCF? I have been running Bamgineer v2 with properly phased VCFs (from 10x) and was working on a change to make it much faster (by using only "PASS" variants), but I am still working on the benchmarking. I will push that change now and you can let me know if it helps. It should not take that long to get the ROI bam, especially considering how small the cnv is. Although bamgineer v2 is capable of such focal alterations, I would recommend a region of a couple of kb in order to get a decent number of reads in the ROI bam. |
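For readers who want to pre-filter a phased VCF themselves, the "PASS only" idea mentioned above can be sketched as follows. This is a minimal illustration, not bamgineer's own implementation; the function name `keep_pass_variants` is hypothetical, and it assumes a plain-text, tab-delimited VCF where the seventh column is FILTER.

```python
def keep_pass_variants(vcf_lines):
    """Yield header lines plus records whose FILTER column is exactly PASS.

    Hypothetical helper for illustration; not part of bamgineer.
    """
    for line in vcf_lines:
        # Header and meta lines start with '#' and are kept unchanged.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr21\t100\t.\tA\tG\t50\tPASS\t.\n",
        "chr21\t200\t.\tC\tT\t10\tLowQual\t.\n",
    ]
    for kept in keep_pass_variants(example):
        print(kept, end="")
```

Records with any other FILTER value (including ".") are dropped here; whether that matches what the tool does internally is an assumption.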
Regarding the multithreading comment, once we update the pysam/samtools versions we will be able to take advantage of the multithreading. The current samtools version that we use (1.2) does not support multithreading. Also, try pulling from the latest version now and let me know if you have the same problem. |
suluxan, I am getting the ROI bam (in the tmpbam folder) with no problem, but I am not getting the final bam in the finalbam folder. I believe the finalbam folder should contain the bam simulated with the CNVs... am I right? |
What are the other files in the tmpbams directory? |
Just one: chr21_roiamp4AABB47974300.bam |
That tmp bam has around 282 reads |
Try the latest version I just pushed, the ROI should generate much faster. |
Suluxan, ROI is indeed getting generated faster. The problem is that the python script is still running. And I see no final bam generated in finalbam folder. I believe that final bam should be the one that actually contains the simulated cnv... Am I right? |
Let me try the new version anyways.. |
Suluxan, the new version generates the ROI bam at the same speed as the previous version, but brings back the multiprocessing module error which you already fixed last Friday. Moreover, the issue still remains: the script is still running and the final bam is not being generated. See the traceback for the multiprocessing error below: |
One more point to add: the samtools version that bamgineer is using on my instance is 0.1.18. This was kept consistent with what you mentioned in the example config file. Do you think updating that to 1.2 might make a difference speed-wise? (Given that you already mentioned that 1.2 is slow.) Regardless, I think I should be consistent with version 1.2 just to compare apples to apples... let me do that (not with the new version of bamgineer but with the old version, because of the multiprocessing issue that I mentioned a minute ago). |
OK, so I am rerunning bamgineer with samtools version 1.2. The ROI bam got generated in the "tmpbams" directory in a second. The script is still running. I will wait to see if it generates the final bam by the end of the day. |
Okay, couple of things:
I presume the tool stops running due to the starred points above. |
A lot of the dependency issues were supposed to be solved through the Dockerfile... any reasons why it initially failed? Considering you are on a cloud environment it would be the optimal route to go. Also, I will get a bamgineer image to the dockerhub by tonight or tomorrow so it will be easy to just pull from there. |
May I know which version of bamUtil you prefer? |
Please check the dockerfile (bamgineer/docker-example/Dockerfile) for install instructions and versions. We have tested bamgineer with bamUtil/1.0.14. I am working on getting the image to a docker repo as well as updating the documentation and I will let you know when those are available. Thanks. |
Ok, I was looking at config.cfg under the bamgineer/docker-example/inputs folder for versioning info. Thanks for correcting me. I am indeed using bamUtil/1.0.14. Glad to know you recommend the same. At this point, I have all versions of all tools set up appropriately on my cloud instance. I will also test the docker container once you have it up on the docker repo. We use the singularity engine on our instance, so I would need to convert your docker container to singularity. That is what I did earlier too, but the issue I faced with your docker container seemed to be less related to its compatibility with singularity and more related to the internal (default) environment in the container itself. Singularity does not make significant changes to the default environment in docker containers (based on my experience converting docker containers to singularity ones and using them with the singularity engine). They usually run well through the singularity engine. |
FYI: I was using a docker container from this account earlier: https://hub.docker.com/r/virenar/bamgineer. It doesn't look like your account. |
Suluxan, does bamgineer delete the tmp bams in the tmpbam dir after the execution completes? I see that the execution completed (no python script running in the "top" output), but there is no finalbam generated. Also, does the bedtool.log get deleted as well? /mnt/DataDisk/Bamgineer Please note that I am using the -splitBamDir option with the following files in my splitBam dir: |
Which versions of pathos and pandas do you recommend? It is not clear from the Dockerfile. |
That docker container was not from us. It is from a user. I am updating the documentation; pathos is no longer necessary since we have switched from multiprocessing to multiprocess. For pandas, I am on 0.20.2 but it should not matter. The image should solve all dependency issues. The "-cancertype" option is not necessary; it just organizes the output bam directories into a cancer type directory. |
Thanks so much suluxan. I would try out the container..:) |
I couldn't find your docker image on dockerhub. Sorry, I thought you had already uploaded it. Any ETA that you could give would be great. |
Ah sorry I have been working on other things. At the latest I will have it up for you by tomorrow. Will let you know as soon as I do; thanks! |
Thanks! |
Hi, suluxan, could you paste the command line from your most recent bamgineer run? |
The docker image is available at suluxan/bamgineer. You can use singularity to build it with The tools in the configfile in bamgineer/docker-example/inputs are linked to the image itself so they require no changes. Just mount or move your files into the container and point to them in the config file and the python script and run! |
Sure, thanks suluxan! |
It worked suluxan! Thank you |
Hi suluxan, Does the exon bed need to be 0-based or 1-based? |
1-based, but each entry is a whole-chromosome start and end coordinate, e.g. chr21 1 48129895 for hg19. The exons.bed naming convention was kept from the previous version. |
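As a rough sketch of what such a file could look like, the snippet below builds one 1-based, whole-chromosome entry per chromosome from a list of (name, length) pairs. The helper name `whole_chromosome_bed` is hypothetical, and the tab delimiter is an assumption (the thread shows the fields separated by spaces); only the chr21 length comes from the reply above.

```python
def whole_chromosome_bed(chrom_lengths):
    """Return one 'chrom<TAB>1<TAB>length' line per chromosome.

    Hypothetical helper: start is fixed at 1 and end is the full
    chromosome length, following the convention described above.
    """
    return ["{}\t1\t{}".format(chrom, length) for chrom, length in chrom_lengths]


if __name__ == "__main__":
    # hg19 chr21 length as quoted in the thread.
    for entry in whole_chromosome_bed([("chr21", 48129895)]):
        print(entry)
```

Chromosome lengths for other contigs would normally be taken from the reference index (for example a .fai file) rather than typed in by hand.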
awesome..thanks! |
Hi suluxan, It seems bamgineer requires the "chr" prefix for chromosome names in bams, beds, etc. For example, if chromosomes are named just "1", "2", "3", etc., bamgineer does not move forward. Could you fix that for us so that we do not have to worry about converting our bams to match the "chr" naming convention? We use the gatk-broad/ncbi reference genome in our pipeline as opposed to the ucsc ones, so we do not have "chr" prefixed to our chromosome names. Converting the bams later to match those chromosome names is a pain in the neck. |
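Until the tool handles both conventions, text inputs such as bed files can be renamed with a few lines of Python. The sketch below is only a workaround idea, not something bamgineer provides: the function names are hypothetical, the mapping of "MT" to "chrM" is an assumption about naming only, and bam headers would need a separate reheader step that is not shown.

```python
def add_chr_prefix(name):
    """Map an NCBI/GATK-style contig name to a UCSC-style one (naming only)."""
    if name.startswith("chr"):
        return name
    if name == "MT":
        # Assumption: the mitochondrial contig is called chrM on the UCSC side.
        return "chrM"
    return "chr" + name


def prefix_bed_line(line):
    """Rewrite the first (chromosome) column of a tab-delimited bed line."""
    # Leave blank lines, comments and track/browser lines untouched.
    if not line.strip() or line.startswith(("#", "track", "browser")):
        return line
    chrom, sep, rest = line.partition("\t")
    return add_chr_prefix(chrom) + sep + rest


if __name__ == "__main__":
    for raw in ["21\t100\t200\n", "chr22\t5\t10\n"]:
        print(prefix_bed_line(raw), end="")
```

Lines that already carry the prefix pass through unchanged, so the function is safe to run on mixed input.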
Hey, |
Thanks. Please let me know.. |
I am getting an error running the bamgineer tool. It seems to be related to the multiprocessing module. I also tried an older version of the multiprocessing module (0.70.4, as suggested on online forums for this Python error; it seems to be a common one). Still no luck getting bamgineer to work past it. Could you suggest a solution?
Please find the error log below:
___ generating phased bed ___
___ filtering bed file columns for amp4AABB47974300_tmp2.bed ___
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/mnt/DataDisk/NGS_tools/bamgineer/src/helpers/handlers.py", line 76, in receive
    record = self.queue.get(True, self.polltime)
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 135, in get
    res = self._recv()
TypeError: __init__() takes exactly 2 arguments (1 given)