-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python bindings ? #12
Comments
very nice!! |
I did this for the needs of my internship, mentored by Pierre Peterlongo. The wrapper covers a minimal part of Whith time, the interface gets more dogfooding and I'll probably release it next week. @pgdurand wants to talk about it before. I've not yet decided if I should release in a standalone repository or within gatb-core. I use this cmake module for compiling the Cython module. Some remarks about integration in gatb-core :
Anyway, the scope of the python module should be redefined for other uses. Here a short list of what might be wanted :
I won't have the time to implement those, but the scope of the interface might define how the wrapper is integrated in the project and it's build process. |
I've just published the wrapper : here is the repository. |
do you mean header-only library? |
Yes. I must say, I don't quite like the fact that it's not possible to opt-out the visitor based interface in GraphTemplate. I would like to visit the GraphDataVariant once and run my (specialized template) algorithms from there. Instead the graph API "pattern match" the variant for each nucleotide traversed. Callgrind show that's about 1/3 of the cycles are spent visiting variants and copying GraphVectors. To my understanding Maybe I'm missing something ? The The Graph.cpp contains lots of duplicated code. It could leverage more of kmer::model iterators such that the code would be small enough to be header-only. This would enable RVO/Copy-elision for most cases. GraphVector could be replaced with an iterator interface. Specialized iterators/cursors could be made for some applications : for example, when drawing simple paths, the iterator could remember the previous node and spare one contains() call when counting predecessors. Sorry I'm a bit demanding on zero-cost abstractions since I learnt Rust :) |
zero-cost abstractions would be great :) |
thanks for the extensive response. Yes, you were missing something regarding the GraphTemplate instantiated with variants. The alternative is instantiation without the variant type, but instead with a single kmer size, i.e. directly passing a single templated integer to the GraphUnitigsTemplate class. The two projects that implement this alternative are |
We'd be happy to have a header-only library, and in fact, refactor gatb-core. Would you be interested in being involved? |
Thanks for the answer. I didn't know that GraphUnitigs had a better interface. The trio NodeFast/EdgeFast/GraphDataVariantFast still uses variants, right ? Maybe single alternative variants get optimized away ?
I'm thinking about a cursor-like API. For example a base abstract class can be derived for each span-specialization. The key point is avoiding handing kmer back and forth to user code. Since kmers are of variable size, it complexifies the API. Kmers are often treated as black boxes in client code. Instead, users should get nucleotides, simple-paths, unitigs and be able to navigate in the graph. Maybe a more efficient alternative would be specializing user template code for each span with a visitor pattern (but the whole user's algorithm is run inside one visit). About the cursor API, it could allow some optimizations:
Ideally the user should be allowed to compose templates for choosing features (like the STL algorithms). On a smaller scale, I rewrote GraphVector taking advantage of the recent addition of std=c++11. |
GraphUnitigs has the same interface as GraphTemplate, actually. NodeFast/EdgeFast/GraphDataVariantFast don't use boost::variants, they are just a regular data type. The trick is that the program needs to instantiate them based on the k-mer size given at command-line parameter. Thus, instead of checking that k-mer size each time a graph function is called, it is only checked once, in the main(). A typical program doesn't use variable k-mer sizes that differ by more than +-1. Thus it is okay to instantiate once and for all a graph template that supports one desired maximal k-mer size. This is a helpful constraint that would simplify the design of the library. I don't know what cursor-like APIs are, would you have any good pointer to some documentation? By the way nice job with the GraphVector rewrite! I'd be happy to take a pull request of all your changes at some point. |
You're right NodeFast and EdgeFast wrap plain LargeInt. But GraphTemplate<NodeFast,EdgeFast,GraphDataVariantFast> still use the visitor pattern and in src/gatb/debruijn/impl/Graph.hpp we have:
Say I have a algorithm : template<size_t span> class MyAlgo<span> {
public: void operator() (GraphUnitig<span>&);
}; How can i run it on an h5 input file that can be configured for any supported kmer size ? I need some sort of visitor pattern to specialize my code for each possible span, right ? What I mean by cursor-like is an iterator really. Only there might be multiples choices of successors when calling |
Ah right yes, I guess in order to keep compatibility with the rest of the code in Graph.cpp, GraphDataVariantFast had to be kept as a boost::variant. Regarding your toy example,yes it seems that the only way to run this code is to have it compiled for as many span's values as needed, and the relevant one will be selectd at runtime given a user-specified k-mer size. I.e. as it is right now in GATB. Sorry, I still didn't understand the iterator/cursor-like design. What successors are you talking about? |
Is there someone working on this ?
The only reference I found is on the GATB Global Architecture page :
I needed some basic scripting capabilities for exploring GATB's graphs so I wrote a simple Cython wrapper around
gatb::core::debruijn::impl::Graph
. Here is a demo. Code will be available soon.Suggestions or questions are welcomed.
The text was updated successfully, but these errors were encountered: