diff --git a/README.md b/README.md
index 99f9a48..62dd505 100644
--- a/README.md
+++ b/README.md
@@ -16,22 +16,41 @@ _This readme has some rough edges which will be smoothened over time._
 # Highlights
 ## How it works
-- After some preliminary checks, the script will execute `snapraid diff` to figure out if parity info is out of date, which means checking for changes since the last execution.
+- After some preliminary checks, the script will execute `snapraid diff` to figure out if parity info is out of date, which means checking for changes since the last execution. During this step, the script will ensure drives are fine by reading parity and content files.
 - One of the following will happen:
     - If parity info is out of sync **and** the number of deleted or changed files exceed the threshold you have configured it **stops**. You may want to take a look to the output log.
-    - If parity info is out of sync **and** the number of deleted or changed files exceed the threshold, you can still **force a sync** after a number of warnings. It's useful If you often get a false alarm but you're confident enough.
-    - If parity info is out of sync **but** the number of deleted or changed files did not exceed the treshold, it **executes a sync** to update the parity info.
-- When the parity info is in sync, either because nothing has changed or after a successfully sync, it runs the `snapraid scrub` command to validate the integrity of the data, both the files and the parity info. _Note that each run of the scrub command will validate only a configurable portion of parity info to avoid having a long running job and affecting the performance of the server._
+    - If parity info is out of sync **and** the number of deleted or changed files exceed the threshold, you can still **force a sync** after a number of warnings. It's useful If you often get a false alarm but you're confident enough. 
This is called "Sync with threshold warnings"
+    - If parity info is out of sync **but** the number of deleted or changed files did not exceed the threshold, it **executes a sync** to update the parity info.
+- When the parity info is in sync, either because nothing has changed or after a successfully sync, it runs the `snapraid scrub` command to validate the integrity of the data, both the files and the parity info. If sync was cancelled or other issues were found, scrub will not be run. _Note that each run of the scrub command will validate only a configurable portion of parity info to avoid having a long running job and affecting the performance of the server._
+- Extra information is be added, like SnapRAID's disk health report.
 - When the script is done sends an email with the results, both in case of error or success.
-Pre-hashing is enabled by default to avoid silent read errors. It mitigates the lack of ECC memory.
+## Customization
+Many options can be changed to your taste, their behavior is documented in the script config file.
+If you don't know what to do, I recommend using the default values and see how it performs.
+
+### Customizable features
+- Sync options
+    - Sync always (forced sync)
+    - Sync after a number of breached threshold warnings
+    - Sync only if thresholds warnings are not breached (enabled by default)
+    - Thresholds for deleted and updated files
+- Scrub options
+    - Enable or disable scrub
+    - Data to be scrubbed - by default 5% older than 10 days
+- Pre-hashing - enabled by default to avoid silent read errors. It mitigates the lack of ECC memory. 
+- SMART Log - enabled by default, a SnapRAID report for disks health status
+- Verbosity - disabled by default, does not include the TOUCH and DIFF output to have a better email
+- Spindown - to spindown drives after the script, disabled because is currently not working
+- Snapraid Status - show the status of the array, disabled because the report output is not rendered correctly
+
+
+You can also change more advanced options such as mail binary (by default uses `mailx`), SnapRAID binary location, log file location.
 ## A nice email report
 This report produces emails that don't contain a list of changed files to improve clarity.
-You can re-enable full output in the email by switching the option `VERBOSITY` but the full report will always be available in `/tmp/snapRAID.out` and will be replaced after each run or deleted when the system is shut down if kept there.
-
-SMART drive report from SnapRAID is also included by default.
+You can re-enable full output in the email by switching the option `VERBOSITY` but the full report will always be available in `/tmp/snapRAID.out` but will be replaced after each run, or deleted when the system is shut down. You can change the location of the file, if needed.
 Here's a sneak peek of the email report.
@@ -66,8 +85,9 @@ DIFF finished [Sat Jan 9 02:07:46 CET 2021]
 **SUMMARY of changes - Added [2] - Deleted [0] - Moved [0] - Copied [0] - Updated [0]**
-There are deleted files. The number of deleted files, (0), is below the threshold of (2). SYNC Authorized.
-There are updated files. The number of updated files, (0), is below the threshold of (2). SYNC Authorized.
+There are no deleted files, that's fine.
+There are no updated files, that's fine.
+SYNC is authorized.
 ### SnapRAID SYNC [Sat Jan 9 02:07:46 CET 2021]
@@ -157,16 +177,8 @@ All jobs ended. [Sat Jan 9 02:07:49 CET 2021]
 Email address is set. 
Sending email report to example@example.com [Sat Jan 9 02:07:49 CET 2021]
 ```
-## Customization
-Many options can be changed to your taste, their behaviour is documented in the script config file.
-
-If you don't know what to do, I recommend using the default values and see how it performs.
-
-You can also change more advanced options such as mail binary (by default uses `mailx`), SnapRAID binary location, log file location.
-
-
 # Requirements
-- Markdown to have nice emails
+- Markdown to have nice emails - will be installed if not found
 - ~~Hd-idle to spin down disks - [Link TBD] - currently not required since spin down does not work properly.~~
 # Installation
@@ -179,6 +191,10 @@ If you want to use this script on OMV, don't worry about the section _Diff Scrip
 5. Tweak the config file if needed
 6. Schedule the script execution time
+It is tested on OMV5, but will work on other distros. In such case you may have to change the mail binary or SnapRAID location.
+
+If you want to use this script on OMV, don't worry about the section _Diff Script Settings_ in the main page of the SnapRAID plugin, since it only applies to the built-in plugin script. Also don't forget to remove from scheduling the built-in script.
+
 # Known Issues
 - Hard disk spin down does not work: they are immediately woken up. The script probably does not handle this correctly while running.
 - The report is not perfect, we can't be solve this because SnapRAID does not natively support Markdown. 
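Reviewer note: the decision flow that the README hunks above describe (sync, stop, warn, or force a sync) can be summarised in a few lines of shell. This is only a minimal sketch for orientation, not the script's actual code: `decide_action` and its positional parameters are hypothetical names introduced here, and the convention that a warning threshold of -1 disables forced sync is inferred from the `-gt -1` check in `chk_sync_warn`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of snapraid-aio-script.sh).
# Arguments: deleted count, updated count, delete threshold, update threshold,
#            warnings recorded so far, warning threshold (-1 = forced sync disabled).
decide_action() {
  local del=$1 upd=$2 del_thr=$3 upd_thr=$4 warns=$5 warn_thr=$6

  # Both counts are below their thresholds: a normal sync is authorized.
  if [ "$del" -lt "$del_thr" ] && [ "$upd" -lt "$upd_thr" ]; then
    echo "sync"
    return
  fi

  # A threshold was reached and forced sync is disabled: stop.
  if [ "$warn_thr" -lt 0 ]; then
    echo "stop"
    return
  fi

  # Enough warnings have accumulated (always true when the threshold is 0):
  # sync anyway. Otherwise record one more warning and skip the sync.
  if [ "$warns" -ge "$warn_thr" ]; then
    echo "forced-sync"
  else
    echo "warn"
  fi
}
```

For example, `decide_action 5 0 2 2 0 3` prints `warn`, because the deleted count has reached its threshold and no warnings have been recorded yet.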
diff --git a/snapraid-aio-script.sh b/snapraid-aio-script.sh
index 4c8834a..ca60dc3 100644
--- a/snapraid-aio-script.sh
+++ b/snapraid-aio-script.sh
@@ -3,12 +3,12 @@
 #
 # Project page: https://github.com/auanasgheps/snapraid-aio-script
 #
-SNAPSCRIPTVERSION="2.7"
 ########################################################################
 ######################
-# USER VARIABLES #
+# CONFIG VARIABLES #
 ######################
+SNAPSCRIPTVERSION="2.8"
 # find the current path
 CURRENT_DIR="$(dirname "${0}")"
@@ -129,6 +129,7 @@ function main(){
   # Now run sync if conditions are met
   if [ $DO_SYNC -eq 1 ]; then
+    echo "SYNC is authorized. [`date`]"
     echo "###SnapRAID SYNC [`date`]"
     mklog "INFO: SnapRAID SYNC Job started"
     if [ $PREHASH -eq 1 ]; then
@@ -143,7 +144,7 @@ function main(){
     mklog "INFO: SnapRAID SYNC Job finished"
     JOBS_DONE="$JOBS_DONE + SYNC"
     # insert SYNC marker to 'Everything OK' or 'Nothing to do' string to differentiate it from SCRUB job later
-    sed_me "s/^Everything OK/SYNC_JOB--Everything OK/g;s/^Nothing to do/SYNC_JOB--Nothing to do/g" "$TMP_OUTPUT"
+    sed_me "s/^Everything OK/**SYNC JOB - Everything OK**/g;s/^Nothing to do/**SYNC JOB - Nothing to do**/g" "$TMP_OUTPUT"
     # Remove any warning flags if set previously. This is done in this step to take care of scenarios when user
     # has manually synced or restored deleted files and we will have missed it in the checks above.
     if [ -e $SYNC_WARN_FILE ]; then
@@ -157,12 +158,13 @@ function main(){
     # YES, first let's check if delete threshold has been breached and we have not forced a sync.
     if [ $CHK_FAIL -eq 1 -a $DO_SYNC -eq 0 ]; then
       # YES, parity is out of sync so let's not run scrub job
+      echo
       echo "Scrub job is cancelled as parity info is out of sync (deleted or changed files threshold has been breached). [`date`]"
       mklog "INFO: Scrub job is cancelled as parity info is out of sync (deleted or changed files threshold has been breached)." 
     else
       # NO, delete threshold has not been breached OR we forced a sync, but we have one last test -
-      # let's make sure if sync ran, it completed successfully (by checking for our marker text "SYNC_JOB--" in the output).
-      if [ $DO_SYNC -eq 1 -a -z "$(grep -w "SYNC_JOB-" $TMP_OUTPUT)" ]; then
+      # let's make sure if sync ran, it completed successfully (by checking for our marker text "SYNC JOB -" in the output).
+      if [ $DO_SYNC -eq 1 -a -z "$(grep -w "SYNC JOB -" $TMP_OUTPUT)" ]; then
         # Sync ran but did not complete successfully so lets not run scrub to be safe
         echo "**WARNING** - check output of SYNC job. Could not detect marker. Not proceeding with SCRUB job. [`date`]"
         mklog "WARN: Check output of SYNC job. Could not detect marker. Not proceeding with SCRUB job."
@@ -179,7 +181,7 @@ function main(){
         echo
         JOBS_DONE="$JOBS_DONE + SCRUB"
         # insert SCRUB marker to 'Everything OK' or 'Nothing to do' string to differentiate it from SYNC job above
-        sed_me "s/^Everything OK/SCRUB_JOB--Everything OK/g;s/^Nothing to do/SCRUB_JOB--Nothing to do/g" "$TMP_OUTPUT"
+        sed_me "s/^Everything OK/**SCRUB JOB - Everything OK**/g;s/^Nothing to do/**SCRUB JOB - Nothing to do**/g" "$TMP_OUTPUT"
       fi
     fi
   else
@@ -203,6 +205,7 @@ function main(){
   # Show SnapRAID Status information if enabled
   if [ $SNAP_STATUS -eq 1 ]; then
     echo
+    echo "###SnapRAID Status"
     $SNAPRAID_BIN status
     close_output_and_wait
     output_to_file_screen
@@ -231,7 +234,7 @@ function main(){
   # do
   # if [[ `smartctl -a /dev/$DRIVE | grep 'Rotation Rate' | grep rpm` ]]; then
   # echo "spinning down /dev/$DRIVE"
-  # hd-idle -t $DRIVE
+  # hd-idle -t /dev/$DRIVE
   # fi
   # done
   # fi
@@ -260,8 +263,6 @@ function main(){
     fi
   fi
-  #clean_desc
-
   exit 0;
 }
@@ -322,20 +323,29 @@ function sed_me(){
 function chk_del(){
   if [ $DEL_COUNT -lt $DEL_THRESHOLD ]; then
-    # NO, delete threshold not reached, lets run the sync job
-    echo "There are deleted files. The number of deleted files, ($DEL_COUNT), is below the threshold of ($DEL_THRESHOLD). 
SYNC Authorized."
+    if [ $DEL_COUNT -eq 0 ]; then
+      echo "There are no deleted files, that's fine."
+      DO_SYNC=1
+    else
+      echo "There are deleted files. The number of deleted files ($DEL_COUNT) is below the threshold of ($DEL_THRESHOLD)."
     DO_SYNC=1
+    fi
   else
     echo "**WARNING** Deleted files ($DEL_COUNT) reached/exceeded threshold ($DEL_THRESHOLD)."
     mklog "WARN: Deleted files ($DEL_COUNT) reached/exceeded threshold ($DEL_THRESHOLD)."
     CHK_FAIL=1
   fi
-}
+}
 function chk_updated(){
   if [ $UPDATE_COUNT -lt $UP_THRESHOLD ]; then
-    echo "There are updated files. The number of updated files, ($UPDATE_COUNT), is below the threshold of ($UP_THRESHOLD). SYNC Authorized."
+    if [ $UPDATE_COUNT -eq 0 ]; then
+      echo "There are no updated files, that's fine."
+      DO_SYNC=1
+    else
+      echo "There are updated files. The number of updated files ($UPDATE_COUNT) is below the threshold of ($UP_THRESHOLD)."
     DO_SYNC=1
+    fi
   else
     echo "**WARNING** Updated files ($UPDATE_COUNT) reached/exceeded threshold ($UP_THRESHOLD)."
     mklog "WARN: Updated files ($UPDATE_COUNT) reached/exceeded threshold ($UP_THRESHOLD)."
@@ -345,29 +355,45 @@ function chk_updated(){
 function chk_sync_warn(){
   if [ $SYNC_WARN_THRESHOLD -gt -1 ]; then
-    echo "Forced sync is enabled. [`date`]"
+    if [ $SYNC_WARN_THRESHOLD -eq 0 ]; then
+      echo "Forced sync is enabled."
     mklog "INFO: Forced sync is enabled."
-
+    else
+      echo "Sync after threshold warning(s) is enabled."
+      mklog "INFO: Sync after threshold warning(s) is enabled."
+    fi
     SYNC_WARN_COUNT=$(sed 'q;/^[0-9][0-9]*$/!d' $SYNC_WARN_FILE 2>/dev/null)
     SYNC_WARN_COUNT=${SYNC_WARN_COUNT:-0} #value is zero if file does not exist or does not contain what we are expecting
-
-    if [ $SYNC_WARN_COUNT -ge $SYNC_WARN_THRESHOLD ]; then
-      # YES, lets force a sync job. Do not need to remove warning marker here as it is automatically removed when the sync job is run by this script
-      echo "Number of threshold warning(s) ($SYNC_WARN_COUNT) has reached/exceeded threshold ($SYNC_WARN_THRESHOLD). 
Forcing a SYNC job to run. [`date`]"
+    if [ $SYNC_WARN_COUNT -ge $SYNC_WARN_THRESHOLD ]; then
+      # force a sync
+      # if the warn count is zero it means the sync was already forced, do not output a dumb message and continue with the sync job.
+      if [ $SYNC_WARN_COUNT -eq 0 ]; then
+        echo
+        DO_SYNC=1
+      else
+        # if there is at least one warn count, output a message and force a sync job. Do not need to remove warning marker here as it is automatically removed when the sync job is run by this script
+        echo "Number of threshold warning(s) ($SYNC_WARN_COUNT) has reached/exceeded threshold ($SYNC_WARN_THRESHOLD). Forcing a SYNC job to run."
       mklog "INFO: Number of threshold warning(s) ($SYNC_WARN_COUNT) has reached/exceeded threshold ($SYNC_WARN_THRESHOLD). Forcing a SYNC job to run."
       DO_SYNC=1
+      fi
     else
       # NO, so let's increment the warning count and skip the sync job
       ((SYNC_WARN_COUNT += 1))
       echo $SYNC_WARN_COUNT > $SYNC_WARN_FILE
-      echo "$((SYNC_WARN_THRESHOLD - SYNC_WARN_COUNT)) threshold warning(s) until the next forced sync. NOT proceeding with SYNC job. [`date`]"
-      mklog "INFO: $((SYNC_WARN_THRESHOLD - SYNC_WARN_COUNT)) threshold warning(s) until the next forced sync. NOT proceeding with SYNC job."
-      DO_SYNC=0
+      if [ $SYNC_WARN_COUNT == $SYNC_WARN_THRESHOLD ]; then
+        echo "This is the **last** warning left. **NOT** proceeding with SYNC job. [`date`]"
+        mklog "This is the **last** warning left. **NOT** proceeding with SYNC job. [`date`]"
+        DO_SYNC=0
+      else
+        echo "$((SYNC_WARN_THRESHOLD - SYNC_WARN_COUNT)) threshold warning(s) until the next forced sync. **NOT** proceeding with SYNC job. [`date`]"
+        mklog "INFO: $((SYNC_WARN_THRESHOLD - SYNC_WARN_COUNT)) threshold warning(s) until the next forced sync. **NOT** proceeding with SYNC job."
+        DO_SYNC=0
     fi
+    fi
   else
     # NO, so let's skip SYNC
-    echo "Forced sync is not enabled. Check $TMP_OUTPUT for details. NOT proceeding with SYNC job. [`date`]"
-    mklog "INFO: Forced sync is not enabled. Check $TMP_OUTPUT for details. 
NOT proceeding with SYNC job."
+    echo "Forced sync is not enabled. Check $TMP_OUTPUT for details. **NOT** proceeding with SYNC job. [`date`]"
+    mklog "INFO: Forced sync is not enabled. Check $TMP_OUTPUT for details. **NOT** proceeding with SYNC job."
     DO_SYNC=0
   fi
 }
@@ -389,23 +415,6 @@ function chk_zero(){
   fi
 }
-function service_array_setup() {
-  if [ -z "$SERVICES" ]; then
-    echo "Please configure services"
-  else
-    echo "Setting up service array"
-    read -a service_array <<<$SERVICES
-  fi
-}
-
-function clean_desc(){
-  # Cleanup file descriptors
-  exec >&{out} 2>&{err}
-
-  # If interactive shell restore output
-  [[ $- == *i* ]] && exec &>/dev/tty
-}
-
 function prepare_mail() {
   if [ $CHK_FAIL -eq 1 ]; then
     if [ $DEL_COUNT -ge $DEL_THRESHOLD -a $DO_SYNC -eq 0 ]; then
@@ -432,10 +441,10 @@ function prepare_mail() {
       MSG="Sync forced with multiple violations - Deleted files ($DEL_COUNT) / ($DEL_THRESHOLD) and changed files ($UPDATE_COUNT) / ($UP_THRESHOLD)"
     fi
     SUBJECT="[WARNING] $MSG $EMAIL_SUBJECT_PREFIX"
-  elif [ -z "${JOBS_DONE##*"SYNC"*}" -a -z "$(grep -w "SYNC_JOB-" $TMP_OUTPUT)" ]; then
+  elif [ -z "${JOBS_DONE##*"SYNC"*}" -a -z "$(grep -w "SYNC JOB -" $TMP_OUTPUT)" ]; then
     # Sync ran but did not complete successfully so lets warn the user
     SUBJECT="[WARNING] SYNC job ran but did not complete successfully $EMAIL_SUBJECT_PREFIX"
-  elif [ -z "${JOBS_DONE##*"SCRUB"*}" -a -z "$(grep -w "SCRUB_JOB-" $TMP_OUTPUT)" ]; then
+  elif [ -z "${JOBS_DONE##*"SCRUB"*}" -a -z "$(grep -w "SCRUB JOB -" $TMP_OUTPUT)" ]; then
     # Scrub ran but did not complete successfully so lets warn the user
     SUBJECT="[WARNING] SCRUB job ran but did not complete successfully $EMAIL_SUBJECT_PREFIX"
   else
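Reviewer note: several hunks in this patch depend on the renamed marker, since the job output is rewritten with `sed` and `prepare_mail` later searches for the new text. The idea can be tried in isolation with the sketch below. It rests on assumptions: `out_file`, `tag_sync_output` and `sync_marker_found` are hypothetical names, the body of the script's `sed_me` helper is not shown in this diff so plain `sed` with a temporary file stands in for it, and `grep -F` is used here for a literal match where the script uses `grep -w`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the script's temporary output file.
out_file=$(mktemp)
printf '%s\n' "Scanning disk d1..." "Everything OK" > "$out_file"

# Rewrite SnapRAID's success lines so they can later be attributed to SYNC.
# The asterisks are literal in the replacement part of the s command.
tag_sync_output() {
  local file=$1 tmp
  tmp=$(mktemp)
  sed 's/^Everything OK/**SYNC JOB - Everything OK**/;s/^Nothing to do/**SYNC JOB - Nothing to do**/' \
    "$file" > "$tmp" && mv "$tmp" "$file"
}

# Succeed only if the SYNC marker text is present in the given file.
sync_marker_found() {
  grep -qF "SYNC JOB -" "$1"
}

tag_sync_output "$out_file"

if sync_marker_found "$out_file"; then
  echo "SYNC marker found, SCRUB may proceed"
else
  echo "SYNC marker missing, SCRUB should be skipped"
fi
```

Because the same marker string appears in both the `sed_me` call and the `grep` checks, the three places need to change together, which is what this patch does.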