A different parallelisation granularity in GATB #37
Comments
I think this issue is likely related to only one thread being able to use the HDF5 library at a time. According to https://support.hdfgroup.org/HDF5/faq/threadsafe.html, HDF5 can be built to be thread-safe, but a couple of flags need to be turned on. @rchikhi, do you know if this option is turned on for GATB?
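If the underlying library build turns out not to be thread-safe, one common workaround is to funnel every call into it through a single process-wide lock. The sketch below shows the pattern in plain C++. `unsafe_library_call` is a hypothetical stand-in for any call that ends up in the non-thread-safe library (it is not a GATB or HDF5 function), and the approach only helps if every such call goes through the same lock.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a call into a library that is not thread-safe:
// it mutates shared state without any synchronisation of its own.
static long g_shared_state = 0;
void unsafe_library_call() { ++g_shared_state; }

// One process-wide lock guarding every entry into the library.
static std::mutex g_library_mutex;

void guarded_library_call() {
    std::lock_guard<std::mutex> lock(g_library_mutex);
    unsafe_library_call();
}

// Start n_threads workers, each making calls_per_thread guarded calls,
// and return the final value of the shared state.
long run_workers(int n_threads, int calls_per_thread) {
    g_shared_state = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([calls_per_thread] {
            for (int i = 0; i < calls_per_thread; ++i) guarded_library_call();
        });
    }
    for (auto& w : workers) w.join();
    return g_shared_state;
}
```

Note that the lock serialises the protected section, so this only pays off if a meaningful part of each task can run outside it.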
Hi @leoisl, @mbhall88, sorry for dropping the ball on this.
Rayan
Dear GATB team,
In pandora, we use GATB to perform local assembly of de novo variants. We are facing a performance issue because we have to perform several thousand local assemblies with GATB. Leaving aside some outliers that we are currently dealing with, these local assemblies are usually made on small graphs, and each one is processed very quickly (usually in less than 1 second; 5 seconds is the upper limit once outliers are removed). This is all using 1 thread (giving `-nb-cores 1` when building the graph). Our performance problem arises from having to perform this small assembly several thousand times, which adds up to a considerable runtime.

A natural way to speed this up is to process these local assemblies in multiple threads, but we are facing issues with GATB in this case. I think it is reasonable that GATB was not designed to have several graphs built by different threads simultaneously in memory, as I guess the general use case is one huge graph built from NGS reads rather than thousands of small graphs, as in our particular use case. Anyway, a minimal working example that reproduces the issues can be found here: we simply start 8 threads, each of which tries to build the same graph, and we get several types of runtime errors. If we comment out the `#pragma omp` line, everything runs single-threaded and works well.

I think we have already identified and solved one issue with this type of multithreading. Running `strace -y -t -e trace=open,close`, we could see that GATB created several temporary (`trashme*`) files using the process name (see gatb-core/gatb-core/src/gatb/system/impl/FileSystemCommon.cpp, line 185 in 7cb8a48). After our fix, `strace` shows that temporary files are now created with different names.

However, even with these changes, we still get several errors. We looked at these errors by running our multithreading example in debug mode and inspecting the stack frames when a segmentation fault happened. In several cases, the errors were within the HDF5 library, which I think was not compiled with the `threadsafe` parameter turned on, but this answer on SO seems to show that it is hard to have multiple threads accessing the HDF5 library in C++. Other errors were memory corruption errors (e.g. double frees, destructors called twice), but it seems that all of these are related to some GATB singleton objects, like gatb-core/gatb-core/src/gatb/tools/misc/impl/HostInfo.hpp (line 54 in 7cb8a48), which makes sense.
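To illustrate the temporary-file collision described above: if a name is derived only from the process, every thread in that process gets the same name. A minimal sketch of one way to avoid this, assuming a POSIX system, is shown below. The helper `unique_temp_name` is hypothetical (it is not part of GATB), and it uses a process-wide atomic counter, which is not necessarily the same change as the fix mentioned above.

```cpp
#include <atomic>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <vector>
#include <unistd.h>  // getpid (POSIX)

// Hypothetical helper, not GATB's API: returns a name that is unique per call
// within the process by appending the pid and a process-wide counter.
std::string unique_temp_name(const std::string& prefix) {
    static std::atomic<unsigned long> counter{0};
    unsigned long id = counter.fetch_add(1);
    return prefix + "_" + std::to_string(getpid()) + "_" + std::to_string(id);
}

// Ask for one name from each of n_threads threads and collect the results.
std::set<std::string> names_from_threads(int n_threads) {
    std::set<std::string> names;
    std::mutex names_mutex;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&names, &names_mutex] {
            std::string name = unique_temp_name("trashme");
            std::lock_guard<std::mutex> lock(names_mutex);
            names.insert(name);
        });
    }
    for (auto& w : workers) w.join();
    return names;
}
```

With 8 threads, `names_from_threads(8)` should contain 8 distinct names, since the counter hands out a different value to each call.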
This short investigation made us realise that enabling this type of parallelisation in GATB might be more complicated than we thought and might need a considerable time investment, which we can't afford. We would like to ask whether you think this is indeed a complicated issue that needs considerable development, or whether it would only take a few changes here and there.
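As an illustration of what one small, local change could look like for the singleton-related errors: since C++11, a function-local static is initialised exactly once even when several threads call the accessor concurrently. The sketch below uses a hypothetical class `HostInfoLike`; it is not GATB's `HostInfo`, and whether GATB's singletons can be changed this way is an assumption that would need checking against the actual code.

```cpp
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical singleton, for illustration only (not GATB's HostInfo).
class HostInfoLike {
public:
    static HostInfoLike& instance() {
        // C++11 guarantees this initialisation runs exactly once,
        // even if several threads reach this line at the same time.
        static HostInfoLike single;
        return single;
    }
    HostInfoLike(const HostInfoLike&) = delete;
    HostInfoLike& operator=(const HostInfoLike&) = delete;

private:
    HostInfoLike() = default;
};

// Call instance() from n_threads threads and count the distinct addresses seen.
std::size_t distinct_instances(int n_threads) {
    std::set<const HostInfoLike*> seen;
    std::mutex seen_mutex;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&seen, &seen_mutex] {
            const HostInfoLike* p = &HostInfoLike::instance();
            std::lock_guard<std::mutex> lock(seen_mutex);
            seen.insert(p);
        });
    }
    for (auto& w : workers) w.join();
    return seen.size();
}
```

This only makes construction safe; any mutable state inside such an object would still need its own synchronisation.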
Thanks for reading this far! I owe you a cookie when we meet again :p !