RIIR (rewrite it in Rust?!) possibility #35
Comments
When I originally wrote ssdeep, the goals were, in order of priority:
I understand that performance becomes a consideration in production, but as a maintainer I believe it's less important than the other goals. That said, I have no specific attachment to the code being in C/C++. If the time has come for us to move to a new technology to achieve these goals, so be it. I look forward to seeing the new version!
Hi @jessek, yes, my first motivation for rewriting ssdeep in Rust was 2. (easy to maintain). I consider 1. already achieved (before 2. is satisfied). For instance, when the binary
My second consideration is 3. ... it works on "reasonably" various platforms. But I'm not sure whether what I call "reasonably various" platforms is enough, and that's why I'm requesting comments. In the first post, I emphasized performance (4.), but that's just because it is what surprised me the most. Yes, the safe Rust port is fast enough, but the unsafe Rust port (with LTO) was faster than my C++ port on my Zen 3 machine. I sometimes need to do large-scale clustering involving 20-40M ssdeep hashes, and reducing the computing time from a few weeks to a few days matters. That's why I made ffuzzy++ and fast-ssdeep-clus: a C++ port of libfuzzy with clustering-friendly APIs, plus previously in-house parallel clustering tools, with performance as the first priority. I didn't expect that the unsafe Rust port could catch up with this.
Hello everyone, I may not be the most important person to express my opinion, but as a long-time user of this program, I would like to express my gratitude to everyone who has contributed. This has made my job much easier, particularly in finding similar firmware binaries without any documentation or identification. I currently use a Python binding as I am not proficient in C and do not require ultra-performance. Although I cannot measure Python's performance impact, the Python binding does the job for me. However, I am interested in experimenting with Rust, and having a Rust version of the program would provide an excellent opportunity to delve into it further. I am looking forward to trying the Rust version of the program. Lastly, I am delighted to hear that this repository has not been forgotten, and I hope that everyone is doing well. Once again, thank you.
To whom it may concern,
This is Tsukasa OI, a maintainer of ssdeep.
Sorry for not maintaining the project for a long time; I was busy with my job. I'm now reviewing the original C source code again and looking for improvements. However, there is one major issue: preserving portability in C is hard. Per-OS code is spread everywhere. Some tools and fragments are old, and we don't even know which platforms and tools to support.
(even if we don't rewrite it in Rust, we definitely need some cleaning)
Then, a Rust guy recommended that I try rewriting it in Rust. Well... (about 2 weeks later) the result looks... promising.
I ported libfuzzy and a part of ssdeep (CLI) to Rust and... it performs faster than libfuzzy when comparing fuzzy hashes, even if we don't use any unsafe blocks (on fuzzy hash generation, the safe Rust version was about 15% slower). With unsafe Rust, it's definitely faster than libfuzzy (both in comparison and hash generation) and, surprisingly, it got faster than ffuzzy++, my C++ port of libfuzzy (generally faster than libfuzzy, with a specialized API for large-scale clustering), when I enabled LTO builds. I haven't implemented all the features of ssdeep (CLI), but the code seems more readable.
In the process of doing this, I found a bug in fuzzy.c (I am struggling to find a failing test case because it seems very hard to reproduce) and will fix it later (probably next week).
Anyway, back to Rust. It looks promising, but I'm not sure whether this is the direction we (as a project) should take. At the very least, we should discuss it.
In a few weeks, I will release a Rust port of the original ssdeep (at least, most features) and libfuzzy on my GitHub (not in ssdeep-project), and I would like to hear your thoughts.
Request for Comments