Deduplication / Reduplication

Deduplication / Reduplication of file in Java.

General

Data deduplication refers to a technique for eliminating redundant data in a data set. In the process of deduplication, extra copies of the same data are deleted, leaving only one copy to be stored. Data is analyzed to identify duplicate byte patterns to ensure the single instance is indeed the single file. Then, duplicates are replaced with a reference that points to the stored chunk.

Pick the size of a chunk, which is the granularity (e.g., 1KB) at which you want to deduplicate a file. Inspect every non-overlapping chunk in the file (e.g., 0th KB, 1st KB, ..., 99th KB), and identify the unique ones. For each unique chunk, take note of where they are found in the file (e.g., 0, 1, 2, ..., 99).

Program

In this program, there are two prime methods dedup() and redup() defined. Both method accepts input and output absolute file locations. The size of the chunk by default is 1Kb. Here checksum is used to verify if the redup and dedup files are exactly the same as expected.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
DeReDup.java		DeReDup.java
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Deduplication / Reduplication

General

Program

About

Releases

Packages

Languages

License

kushubham9/dedup_redup

Folders and files

Latest commit

History

Repository files navigation

Deduplication / Reduplication

General

Program

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages