-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to cortex for de Bruijn graph implementation #235
Conversation
Does anyone foresee an issue with me using |
Depends how fast unordered map is. Cortexs hash is pretty fast. |
We can provide a different hash function for I suspect @leoisl will have a better sense than myself as to the speed of an |
I usually prefer to use data structures in the standard library for some reasons:
In principle, I think that cortex On the other hand, I would usually prefer to use I see also that we are directly changing cortex code. I would have a preference to put cortex in our codebase as a git submodule, and make a local C++ wrapper around it. Like this, we would know exactly which commit/version we are working on, and if there are any updates to cortex, we can easily update cortex in our code base. This could happen if pandora needs a specific thing in the future that cortex don't actually implement now, or some kind of optimisation. If we put a cortex copy directly in our codebase, it is hard to know at which commit we started working on, and it will be hard to incorporate new changes. Also, I think the cortex version we are working on is a well-tested and stable version, so it is something we can trust. But changing the code directly means that we will need to add test code to cortex to be sure we did not introduce any bugs. I would vote for a C++ wrapper, as it is more modular, and we would keep this stable cortex code intact. Then we just need to test the wrapper, which is easy by mocking the cortex library. |
Alright. Fair points. I will delete all of this work and start again from a git submodule tomorrow. |
Oh, if it is needed to restart all from scratch, I am not sure if it is worth it... do you think you can move most of what you've done to a wrapper? If not, if it is indeed needed to restart all from scratch, I'd rather leave this decision to you and @iqbal-lab |
No, I can't really move what I've done into a wrapper as I was manually copying over the cortex code and "modernising/C++-ising" it at the same time. Better to restart now, rather than 2 weeks down the track. I'll include it as a submodule and write a wrapper that calls the relevant C code for various operations. |
Ok, just to be sure, my opinions on that long post were just from engineering POV. There are other factors to consider. If updates to cortex are a very rare event, then I think we can simply change it directly, and add test code on our additions (on the hope that we almost never will need to add new commits from the cortex repo to our modified local copy). This engineering choice will also just improve some code modularity/testability, but the end result will probably be the same. I am not sure is worth you to invest your time more on the engineering aspect, or continue the work you've done to explore your research question earlier. I did not enter these questions in my previous post, as I think it this is more of a management aspect, which you and @iqbal-lab should decide |
cortex updates essentially never and particularly not that code. fixes are sometimes need to the perl scripts. |
I'm going to close this and start fresh. |
The ultimate aim of this PR is to close #204 and make the implementation of #163 / #195 easier.
This is in an extremely draft phase. I have opened this PR to allow for gradual code review as I still feel quite awkward in C/C++ - especially trying to port C code to C++. It's kind of like I'm vision-impaired but am acting as my own guide dog 😅. But I don't want to use that as an excuse so am hoping to learn a lot about both languages in the process of this (while wishing pandora was in Rust the whole time).
So please give me lots of feedback/tips as I go @leoisl and @iqbal-lab