Discussion and links for investigating a possible move to another library for the de Bruijn graph (DBG) implementation used by the de novo discovery routine.
Previous discussion relating to the initial choice and integration of GATB for this role can be found at #16.
Reasons for initiating this discussion:
Integration of GATB caused some compatibility issues with boost. Part of the problem seems to be that GATB expects a system-wide boost. In addition, GATB bundles boost files inside its own repository. Together, these mean that rather than building the boost dependencies with pandora, we require them to be installed system-wide.
GATB takes a very long time to compile - significantly longer than the rest of pandora.
GATB does not seem to have broad Clang compiler support. This will affect Mac users.
A solution that has been proposed is moving to McCortex. One added benefit here is that @iqbal-lab is very familiar with this code. The pros and cons of McCortex vs GATB are probably a good place to start this discussion.
mbhall88 changed the title from "Investigating change of library used for de Bruijn graph used by de novo discovery" to "Investigating change of de Bruijn graph library used for de novo discovery" on Jan 15, 2020
UPDATE: I had a go at switching from GATB to bifrost. It was super easy to integrate, but I realised later, when I was changing the interface class, that bifrost does not store coverage information in the graph. I had a chat to Guillaume and he said it is possible to add the info manually, but it requires a fairly convoluted process.
You add a string to the graph (bifrost handles breaking it into k-mers). Then you have to look up each k-mer from that string in the graph yourself and increment a custom integer data attribute for coverage. This is repeated for every sequence you add, so in total we end up doing a tonne of work for each sequence.
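The two-pass workflow described above can be sketched roughly as follows. This is a toy illustration only: `ToyGraph`, `add_sequence`, `increment_coverage` and `add_with_coverage` are hypothetical names, not bifrost's API, and the sketch ignores canonical k-mers and unitig compaction. It is only meant to show why every added sequence ends up being traversed twice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Toy stand-in for a de Bruijn graph: maps each k-mer to a coverage counter.
// NOTE: hypothetical type for illustration; this is NOT bifrost's API.
struct ToyGraph {
    std::size_t k;
    std::unordered_map<std::string, unsigned> coverage;

    // Pass 1: add the sequence. The graph splits it into k-mers,
    // but records no coverage (new k-mers start at 0).
    void add_sequence(const std::string& seq) {
        for (std::size_t i = 0; i + k <= seq.size(); ++i) {
            coverage.emplace(seq.substr(i, k), 0u);
        }
    }

    // Pass 2: walk the same sequence again, look up each k-mer
    // and increment its coverage counter by hand.
    void increment_coverage(const std::string& seq) {
        for (std::size_t i = 0; i + k <= seq.size(); ++i) {
            auto it = coverage.find(seq.substr(i, k));
            assert(it != coverage.end());  // sequence must have been added first
            ++it->second;
        }
    }
};

// Every sequence costs one insertion pass plus one lookup pass.
void add_with_coverage(ToyGraph& graph, const std::string& seq) {
    graph.add_sequence(seq);
    graph.increment_coverage(seq);
}
```

With k = 3, adding "ACGTA" and then "CGTAC" through `add_with_coverage` leaves the shared k-mers CGT and GTA at coverage 2 and the others at 1, at the cost of splitting each sequence into k-mers twice.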
Seems like cortex is probably going to be the best bet. Yay C 🎉
A further reason listed in the issue description:
GATB's .h5 files are limiting our ability to multi-thread the de novo routine (Multiprocess/multithread pandora map --discover issue with GATB graph creation #195). @leoisl has found a way of implementing this fix, but it will rely on us having a fork of GATB that we maintain. This is obviously not ideal.