An insane application for deduplicating huge text files (preserving the original line order), batteries included.
Take a look at my GitHub profile.
- Multi-threaded for faster results.
- Reads from STDIN or from multiple input files.
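As a rough illustration of what order-preserving deduplication means, here is a minimal Python sketch. It is not fastdeduper's actual implementation, and the function name `dedupe_lines` is hypothetical: each line is kept the first time it appears and skipped on every later occurrence.

```python
from typing import Iterable, Iterator


def dedupe_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in order of first appearance.

    Illustrative only: a single-threaded sketch that keeps every
    distinct line in memory, which may not suit very large inputs.
    """
    seen = set()  # lines that have already been emitted
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line
```

Because the set only grows with the number of distinct lines, memory use depends on how many unique lines the input has rather than on its total size.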
# using 8 CPU cores and just one file as input (result is output to STDOUT)
fastdeduper -t 8 -f file1.txt
# reading content from STDIN
cat file1.txt | fastdeduper -t 4 -f -
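The `-f -` form above follows the common convention of treating `-` as "read from STDIN". How fastdeduper handles this internally is not documented here; the following Python sketch only shows the general idea, and the helper names `open_input` and `read_lines` are hypothetical.

```python
import sys
from typing import Iterable, Iterator, TextIO


def open_input(path: str) -> TextIO:
    """Return STDIN when path is "-", otherwise open the named file."""
    if path == "-":
        return sys.stdin
    return open(path, "r", encoding="utf-8", errors="replace")


def read_lines(paths: Iterable[str]) -> Iterator[str]:
    """Yield lines from every input in the order the paths are given."""
    for path in paths:
        stream = open_input(path)
        try:
            yield from stream
        finally:
            # Close regular files, but leave STDIN open for the caller.
            if stream is not sys.stdin:
                stream.close()
```

Under these assumptions, `read_lines(["file1.txt", "-"])` would yield the lines of `file1.txt` followed by whatever arrives on STDIN.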
- cmake: Used for building the project.
- libboost: Required for some of the project's functionality.
sudo apt-get update
sudo apt-get install -y libboost-all-dev cmake
Clone the repository, then build and install:
git clone https://github.com/havocesp/fastdedupe
cd fastdedupe
make -j8 # 8 is the number of CPU cores make will use
make install
- 07-10-2023: Initial release.
- Added multi-threading support for faster deduplication.
- Improved STDIN handling and support for multiple input files.
- Show some kind of progress indicator during deduplication.
- Implement regular expression filtering for advanced deduplication.