From 77baabf6d1403ceab255e73502af22a1ad5cef55 Mon Sep 17 00:00:00 2001 From: jooleer <60648388+jooleer@users.noreply.github.com> Date: Wed, 10 May 2023 18:49:27 +0700 Subject: [PATCH 1/5] Update README.md --- README.md | 42 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 0427a22..8f326cf 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,42 @@ -# folder-hash-compare +# Folder Hash Compare Compares hash values for 2 directories + +#### Video Demo: tbd + +## Description: + +Folder Hash Compare generates hashes for all files in 2 directories and then compares those hashes against eachother. + +# Installation/Requirements: + +Can be run as-is with python 3.x + +# Usage: + +FHC can be run with several parameters: +`folder_hash_compare.py [-h] [-p PRIMARY] [-s SECONDARY] [-d] [-m] [-n] [-v] [-c]` +``` +-h, --help shows help message and exits +-p PRIMARY, --primary PRIMARY path of primary directory f.e. -p "/home/user/dir1" or -p "C:\folder1" +-s SECONDARY, --secondary SECONDARY path of secondary directory f.e. -p "/home/user/dir2" or -p "D:\folder2" +-d, --disable disabled multithreading, when disabled the hashing will be done sequentially, by default they will be done simultaniously +-m, --missing searches for missing files in secondary directory (i.e. present in PRIMARY but not present in SECONDARY) +-n, --nmissing searches for missing files in primary directory (i.e. present in SECONDARY but not present in PRIMARY) +-v, --verbose enables verbose logging, outputs all steps in terminal +-l, --logging enables logging to txt file in logs/ folder +-c, --custom disables use of -p and -s parameters and allows to set hardcoded directory paths (for jobs that have to be done frequently with the same paths) +``` + + + +# Sources: + + + + +# Final notes: +I made Folder Hash Compare because there wasn't a program that suited my needs and worked cross-platform. 
After backing up a large amount of data to an external source I had some trouble finding a solution to make sure that all files were copied correctly. FHC started as a small script to quickly check folders but I added several functions and options (multithreading, enabling and disabling features) that other solutions didn't provide. + +
+_This software was created for educational purposes for my final project for CS50P and is licensed under the [MIT License](https://github.com/jooleer/folder-hash-compare/blob/main/LICENSE)._ + From c57552e566a622b37c32d9ba1756d2ca3a8ef071 Mon Sep 17 00:00:00 2001 From: jooleer <60648388+jooleer@users.noreply.github.com> Date: Wed, 10 May 2023 18:50:01 +0700 Subject: [PATCH 2/5] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 8f326cf..8579f3a 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,7 @@ Can be run as-is with python 3.x # Usage: FHC can be run with several parameters: -`folder_hash_compare.py [-h] [-p PRIMARY] [-s SECONDARY] [-d] [-m] [-n] [-v] [-c]` +`folder_hash_compare.py [-h] [-p PRIMARY] [-s SECONDARY] [-d] [-m] [-n] [-v] [-l] [-c]` ``` -h, --help shows help message and exits -p PRIMARY, --primary PRIMARY path of primary directory f.e. -p "/home/user/dir1" or -p "C:\folder1" From 4385eb0567b1688303f45a27e3f79ca1fc1b29f0 Mon Sep 17 00:00:00 2001 From: jooleer <60648388+jooleer@users.noreply.github.com> Date: Wed, 10 May 2023 19:54:46 +0700 Subject: [PATCH 3/5] Update README.md --- README.md | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 8579f3a..59fa106 100644 --- a/README.md +++ b/README.md @@ -19,21 +19,17 @@ FHC can be run with several parameters: -h, --help shows help message and exits -p PRIMARY, --primary PRIMARY path of primary directory f.e. -p "/home/user/dir1" or -p "C:\folder1" -s SECONDARY, --secondary SECONDARY path of secondary directory f.e. -p "/home/user/dir2" or -p "D:\folder2" +-a, --algorithm set algorithm to CRC32, MD5 or SHA256 (CRC32 by default) -d, --disable disabled multithreading, when disabled the hashing will be done sequentially, by default they will be done simultaniously -m, --missing searches for missing files in secondary directory (i.e. 
present in PRIMARY but not present in SECONDARY) -n, --nmissing searches for missing files in primary directory (i.e. present in SECONDARY but not present in PRIMARY) -v, --verbose enables verbose logging, outputs all steps in terminal --l, --logging enables logging to txt file in logs/ folder +-l, --logging disables logging to txt file in logs/ folder -c, --custom disables use of -p and -s parameters and allows to set hardcoded directory paths (for jobs that have to be done frequently with the same paths) ``` -# Sources: - - - - # Final notes: I made Folder Hash Compare because there wasn't a program that suited my needs and worked cross-platform. After backing up a large amount of data to an external source I had some trouble finding a solution to make sure that all files were copied correctly. FHC started as a small script to quickly check folders but I added several functions and options (multithreading, enabling and disabling features) that other solutions didn't provide. From a2685f03c7331f126bdbc5936700ac742a708ead Mon Sep 17 00:00:00 2001 From: jooleer <60648388+jooleer@users.noreply.github.com> Date: Wed, 10 May 2023 20:14:10 +0700 Subject: [PATCH 4/5] Update README.md --- README.md | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 59fa106..cb40302 100644 --- a/README.md +++ b/README.md @@ -20,7 +20,7 @@ FHC can be run with several parameters: -p PRIMARY, --primary PRIMARY path of primary directory f.e. -p "/home/user/dir1" or -p "C:\folder1" -s SECONDARY, --secondary SECONDARY path of secondary directory f.e. 
-p "/home/user/dir2" or -p "D:\folder2" -a, --algorithm set algorithm to CRC32, MD5 or SHA256 (CRC32 by default) --d, --disable disabled multithreading, when disabled the hashing will be done sequentially, by default they will be done simultaniously +-d, --disable disabled multithreading, when disabled the hashing will be done sequentially, by default they will be done simultaneously -m, --missing searches for missing files in secondary directory (i.e. present in PRIMARY but not present in SECONDARY) -n, --nmissing searches for missing files in primary directory (i.e. present in SECONDARY but not present in PRIMARY) -v, --verbose enables verbose logging, outputs all steps in terminal @@ -28,6 +28,20 @@ FHC can be run with several parameters: -c, --custom disables use of -p and -s parameters and allows to set hardcoded directory paths (for jobs that have to be done frequently with the same paths) ``` +`-p PRIMARY` and `-s SECONDARY` are not required when using `-c`, when using the `-c, --custom` parameter, make sure to fill in the `primary_directory` and `secondary_directory` variables in `folder_hash_compare.py`. + +`-p PRIMARY` and `-s SECONDARY` can be any path starting from the root to deeper directories. Directories can only be scanned and processed granted the user has access to them. + +`-a, --algorithm` allows the user to change the default algorithm to any of the 3 available ones: CRC32, MD5 or SHA256. CRC32 is faster but not secure, MD5 is slower than CRC32 but faster than SHA256 but is nowadays considered insecure. SHA256 is slower than both CRC32 and MD5 but is also more secure than either of them. For the purposes of this program I didn't feel the need to have a higher than 256-bit algorithm as it's generally just to compare if a directories' contents copied without errors to another one. + +`-d, --disable` disables multithreading. 
By default multithreading is enabled but if comparing 2 directories that are on the same drive it might be faster to have multithreading disabled. When disabled files will be hashed sequentially, starting from the primary directory and then processing the secondary directory. When multithreading is enabled file hashes are generated simultaneously. + +`-l, --logging` is recommended to turn on, especially when you expect to encounter missing files (also see `-m, --missing` & `-n, --nmissing` below) + +`-m, --missing` and `-n, --nmissing` search for missing files. `-m` will report missing files in the secondary directory, i.e. files that are present in the PRIMARY directory but missing in the SECONDARY directory. `-n` will search for missing files the other way around, i.e. files that are present in the SECONDARY directory but missing in the PRIMARY directory. Recommended to use `-l, --logging` when using either of these settings. + +`-v, --verbose` displays a verbose logging output as the program runs, notifying the user of each step. + # Final notes: From 8f7ca78c67b4f1c64e4fa1f60b141983e7b9080b Mon Sep 17 00:00:00 2001 From: jooleer <60648388+jooleer@users.noreply.github.com> Date: Wed, 10 May 2023 20:15:57 +0700 Subject: [PATCH 5/5] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index cb40302..8419010 100644 --- a/README.md +++ b/README.md @@ -32,7 +32,7 @@ FHC can be run with several parameters: `-p PRIMARY` and `-s SECONDARY` can be any path starting from the root to deeper directories. Directories can only be scanned and processed granted the user has access to them. -`-a, --algorithm` allows the user to change the default algorithm to any of the 3 available ones: CRC32, MD5 or SHA256. CRC32 is faster but not secure, MD5 is slower than CRC32 but faster than SHA256 but is nowadays considered insecure. SHA256 is slower than both CRC32 and MD5 but is also more secure than either of them. 
For the purposes of this program I didn't feel the need to have a higher than 256-bit algorithm as it's generally just to compare if a directories' contents copied without errors to another one. +`-a, --algorithm` allows the user to change the default algorithm to any of the 3 available ones: __CRC32__, __MD5__ or __SHA256__. __CRC32__ is the fastest but not secure; __MD5__ is slower than __CRC32__ but faster than __SHA256__, and is nowadays considered insecure. __SHA256__ is slower than both __CRC32__ and __MD5__ but more secure than either of them. For the purposes of this program I didn't feel the need for an algorithm stronger than 256 bits, as it's generally just used to verify that a directory's contents were copied to another one without errors. `-d, --disable` disables multithreading. By default multithreading is enabled but if comparing 2 directories that are on the same drive it might be faster to have multithreading disabled. When disabled files will be hashed sequentially, starting from the primary directory and then processing the secondary directory. When multithreading is enabled file hashes are generated simultaneously.
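
The behaviour the README describes for `-a`, `-d`, `-m` and `-n` can be illustrated with a minimal sketch. It is not taken from `folder_hash_compare.py`: the function names `hash_file`, `hash_directory` and `compare_directories`, the chunk size, and the one-thread-per-directory layout are all assumptions made for illustration, and the real script may be organised differently.

```python
import hashlib
import zlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def hash_file(path, algorithm="CRC32", chunk_size=1 << 20):
    """Return a hex digest of one file, read in chunks to limit memory use."""
    algorithm = algorithm.upper()
    if algorithm == "CRC32":
        crc = 0
        with open(path, "rb") as handle:
            while chunk := handle.read(chunk_size):
                crc = zlib.crc32(chunk, crc)  # running checksum over all chunks
        return f"{crc:08x}"
    if algorithm not in ("MD5", "SHA256"):
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm.lower())
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def hash_directory(root, algorithm="CRC32"):
    """Map each file's path relative to root onto its hash."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hash_file(path, algorithm)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_directories(primary, secondary, algorithm="CRC32", multithreading=True):
    """Hash both directories, then report mismatched and missing files."""
    if multithreading:
        # Assumed layout: one worker per directory, so both are hashed at once.
        with ThreadPoolExecutor(max_workers=2) as pool:
            first_job = pool.submit(hash_directory, primary, algorithm)
            second_job = pool.submit(hash_directory, secondary, algorithm)
            first, second = first_job.result(), second_job.result()
    else:
        # Sequential: primary directory first, then the secondary directory.
        first = hash_directory(primary, algorithm)
        second = hash_directory(secondary, algorithm)
    shared = first.keys() & second.keys()
    return {
        "mismatched": sorted(name for name in shared if first[name] != second[name]),
        "missing_in_secondary": sorted(first.keys() - second.keys()),  # like -m
        "missing_in_primary": sorted(second.keys() - first.keys()),  # like -n
    }
```

Calling `compare_directories("/home/user/dir1", "/home/user/dir2", algorithm="SHA256", multithreading=False)` would hash the two example paths from the usage text one after the other. Keying the results by relative path is what lets the same dictionaries answer both the "do the hashes match" and the "which files are missing" questions.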