Merge pull request #3 from auanasgheps/dev
Merge code for 2.8 release
Oliver Cervera authored Feb 8, 2021
2 parents 77f7be8 + 1b99f87 commit aed5848
Showing 2 changed files with 87 additions and 62 deletions.
54 changes: 35 additions & 19 deletions README.md
Expand Up @@ -16,22 +16,41 @@ _This readme has some rough edges which will be smoothened over time._
# Highlights

## How it works
- After some preliminary checks, the script will execute `snapraid diff` to figure out if parity info is out of date, which means checking for changes since the last execution.
- After some preliminary checks, the script will execute `snapraid diff` to figure out if parity info is out of date, which means checking for changes since the last execution. During this step, the script will ensure drives are fine by reading parity and content files.
- One of the following will happen:
- If parity info is out of sync **and** the number of deleted or changed files exceeds the threshold you have configured, it **stops**. You may want to take a look at the output log.
- If parity info is out of sync **and** the number of deleted or changed files exceed the threshold, you can still **force a sync** after a number of warnings. It's useful If you often get a false alarm but you're confident enough.
- If parity info is out of sync **but** the number of deleted or changed files did not exceed the treshold, it **executes a sync** to update the parity info.
- When the parity info is in sync, either because nothing has changed or after a successfully sync, it runs the `snapraid scrub` command to validate the integrity of the data, both the files and the parity info. _Note that each run of the scrub command will validate only a configurable portion of parity info to avoid having a long running job and affecting the performance of the server._
- If parity info is out of sync **and** the number of deleted or changed files exceeds the threshold, you can still **force a sync** after a number of warnings. This is useful if you often get false alarms but are confident enough in your changes. This is called "Sync with threshold warnings".
- If parity info is out of sync **but** the number of deleted or changed files did not exceed the threshold, it **executes a sync** to update the parity info.
- When the parity info is in sync, either because nothing has changed or after a successful sync, it runs the `snapraid scrub` command to validate the integrity of the data, both the files and the parity info. If sync was cancelled or other issues were found, scrub will not be run. _Note that each run of the scrub command will validate only a configurable portion of parity info, to avoid a long-running job that affects the performance of the server._
- Extra information is added, like SnapRAID's disk health report.
- When the script is done, it sends an email with the results, whether it ended in error or success.
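
The decision described in the list above can be sketched roughly as follows. This is a simplified illustration, not the script's actual code: the function name `decide_sync` and its argument order are made up for this example, and it only covers the threshold decision (it ignores the case where `snapraid diff` reports no changes at all).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the script): decide what to do
# from the diff counts and the configured thresholds.
# Arguments: deleted updated del_threshold up_threshold warn_threshold warn_count
# warn_threshold: -1 disables forced sync, 0 always forces a sync,
#                 N forces a sync after N threshold warnings.
decide_sync() {
    local deleted=$1 updated=$2 del_threshold=$3 up_threshold=$4
    local warn_threshold=$5 warn_count=$6

    # Below both thresholds: a normal sync is authorized.
    if [ "$deleted" -lt "$del_threshold" ] && [ "$updated" -lt "$up_threshold" ]; then
        echo "sync"
        return
    fi

    # A threshold was breached and forced sync is disabled: stop.
    if [ "$warn_threshold" -lt 0 ]; then
        echo "stop"
        return
    fi

    # Forced sync is enabled: sync once enough warnings have accumulated,
    # otherwise record another warning and skip the sync.
    if [ "$warn_count" -ge "$warn_threshold" ]; then
        echo "forced-sync"
    else
        echo "warn"
    fi
}

decide_sync 0 0 2 2 -1 0   # prints "sync"
decide_sync 5 0 2 2 -1 0   # prints "stop"
decide_sync 5 0 2 2 3 1    # prints "warn"
decide_sync 5 0 2 2 3 3    # prints "forced-sync"
```

In this sketch, scrub would only follow the `sync` and `forced-sync` outcomes, matching the rule that scrub is skipped when sync was cancelled.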

Pre-hashing is enabled by default to avoid silent read errors. It mitigates the lack of ECC memory.
## Customization
Many options can be changed to your taste; their behavior is documented in the script config file.
If you don't know what to do, I recommend using the default values and seeing how it performs.

### Customizable features
- Sync options
- Sync always (forced sync)
- Sync after a number of breached threshold warnings
- Sync only if thresholds warnings are not breached (enabled by default)
- Thresholds for deleted and updated files
- Scrub options
- Enable or disable scrub
- Data to be scrubbed - by default 5% older than 10 days
- Pre-hashing - enabled by default to avoid silent read errors. It mitigates the lack of ECC memory.
- SMART Log - enabled by default, a SnapRAID report for disks health status
- Verbosity - disabled by default, does not include the TOUCH and DIFF output to have a better email
- Spindown - spins down drives after the script has run, disabled because it is currently not working
- Snapraid Status - show the status of the array, disabled because the report output is not rendered correctly
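
As a rough illustration of the scrub defaults listed above, the sketch below builds the matching `snapraid scrub` command line. The variable names (`SCRUB_ENABLED`, `SCRUB_PERCENT`, `SCRUB_AGE`) and the `build_scrub_cmd` function are assumptions made for this example; check the script config file for the real option names. `-p` (plan, as a percentage) and `-o` (older than, in days) are SnapRAID's own scrub options.

```shell
#!/usr/bin/env bash
# Hypothetical variable names; the real config file may name them differently.
SCRUB_ENABLED=1     # 1 = run scrub, 0 = skip it
SCRUB_PERCENT=5     # portion of the array to scrub on each run
SCRUB_AGE=10        # only scrub blocks not checked in the last N days

# Print the command that would be run, without executing it.
build_scrub_cmd() {
    if [ "$SCRUB_ENABLED" -eq 1 ]; then
        echo "snapraid scrub -p $SCRUB_PERCENT -o $SCRUB_AGE"
    else
        echo "scrub disabled"
    fi
}

build_scrub_cmd   # prints "snapraid scrub -p 5 -o 10"
```

Printing the command instead of running it makes it easy to check the values before scheduling the script.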


You can also change more advanced options such as mail binary (by default uses `mailx`), SnapRAID binary location, log file location.

## A nice email report
This report produces emails that don't contain a list of changed files to improve clarity.

You can re-enable full output in the email by switching the option `VERBOSITY` but the full report will always be available in `/tmp/snapRAID.out` and will be replaced after each run or deleted when the system is shut down if kept there.

SMART drive report from SnapRAID is also included by default.
You can re-enable full output in the email by switching the option `VERBOSITY`. The full report is always available in `/tmp/snapRAID.out`, but it is replaced after each run and deleted when the system is shut down. You can change the location of the file, if needed.

Here's a sneak peek of the email report.

Expand Down Expand Up @@ -66,8 +85,9 @@ DIFF finished [Sat Jan 9 02:07:46 CET 2021]

**SUMMARY of changes - Added [2] - Deleted [0] - Moved [0] - Copied [0] - Updated [0]**

There are deleted files. The number of deleted files, (0), is below the threshold of (2). SYNC Authorized.
There are updated files. The number of updated files, (0), is below the threshold of (2). SYNC Authorized.
There are no deleted files, that's fine.
There are no updated files, that's fine.
SYNC is authorized.

### SnapRAID SYNC [Sat Jan 9 02:07:46 CET 2021]

Expand Down Expand Up @@ -157,16 +177,8 @@ All jobs ended. [Sat Jan 9 02:07:49 CET 2021]
Email address is set. Sending email report to [email protected] [Sat Jan 9 02:07:49 CET 2021]
```

## Customization
Many options can be changed to your taste, their behaviour is documented in the script config file.

If you don't know what to do, I recommend using the default values and see how it performs.

You can also change more advanced options such as mail binary (by default uses `mailx`), SnapRAID binary location, log file location.


# Requirements
- Markdown to have nice emails
- Markdown to have nice emails - will be installed if not found
- ~~Hd-idle to spin down disks - [Link TBD] - currently not required since spin down does not work properly.~~

# Installation
Expand All @@ -179,6 +191,10 @@ If you want to use this script on OMV, don't worry about the section _Diff Scrip
5. Tweak the config file if needed
6. Schedule the script execution time

It is tested on OMV5, but will work on other distros. In such case you may have to change the mail binary or SnapRAID location.

If you want to use this script on OMV, don't worry about the section _Diff Script Settings_ in the main page of the SnapRAID plugin, since it only applies to the built-in plugin script. Also, don't forget to remove the built-in script from scheduling.

# Known Issues
- Hard disk spin down does not work: they are immediately woken up. The script probably does not handle this correctly while running.
- The report is not perfect; we can't solve this because SnapRAID does not natively support Markdown.
Expand Down
95 changes: 52 additions & 43 deletions snapraid-aio-script.sh
Expand Up @@ -3,12 +3,12 @@
#
# Project page: https://github.com/auanasgheps/snapraid-aio-script
#
SNAPSCRIPTVERSION="2.7"
########################################################################

######################
# USER VARIABLES #
# CONFIG VARIABLES #
######################
SNAPSCRIPTVERSION="2.8"

# find the current path
CURRENT_DIR="$(dirname "${0}")"
Expand Down Expand Up @@ -129,6 +129,7 @@ function main(){

# Now run sync if conditions are met
if [ $DO_SYNC -eq 1 ]; then
echo "SYNC is authorized. [`date`]"
echo "###SnapRAID SYNC [`date`]"
mklog "INFO: SnapRAID SYNC Job started"
if [ $PREHASH -eq 1 ]; then
Expand All @@ -143,7 +144,7 @@ function main(){
mklog "INFO: SnapRAID SYNC Job finished"
JOBS_DONE="$JOBS_DONE + SYNC"
# insert SYNC marker to 'Everything OK' or 'Nothing to do' string to differentiate it from SCRUB job later
sed_me "s/^Everything OK/SYNC_JOB--Everything OK/g;s/^Nothing to do/SYNC_JOB--Nothing to do/g" "$TMP_OUTPUT"
sed_me "s/^Everything OK/**SYNC JOB - Everything OK**/g;s/^Nothing to do/**SYNC JOB - Nothing to do**/g" "$TMP_OUTPUT"
# Remove any warning flags if set previously. This is done in this step to take care of scenarios when user
# has manually synced or restored deleted files and we will have missed it in the checks above.
if [ -e $SYNC_WARN_FILE ]; then
Expand All @@ -157,12 +158,13 @@ function main(){
# YES, first let's check if delete threshold has been breached and we have not forced a sync.
if [ $CHK_FAIL -eq 1 -a $DO_SYNC -eq 0 ]; then
# YES, parity is out of sync so let's not run scrub job
echo
echo "Scrub job is cancelled as parity info is out of sync (deleted or changed files threshold has been breached). [`date`]"
mklog "INFO: Scrub job is cancelled as parity info is out of sync (deleted or changed files threshold has been breached)."
else
# NO, delete threshold has not been breached OR we forced a sync, but we have one last test -
# let's make sure if sync ran, it completed successfully (by checking for our marker text "SYNC_JOB--" in the output).
if [ $DO_SYNC -eq 1 -a -z "$(grep -w "SYNC_JOB-" $TMP_OUTPUT)" ]; then
# let's make sure if sync ran, it completed successfully (by checking for our marker text "SYNC JOB -" in the output).
if [ $DO_SYNC -eq 1 -a -z "$(grep -w "SYNC JOB -" $TMP_OUTPUT)" ]; then
# Sync ran but did not complete successfully so lets not run scrub to be safe
echo "**WARNING** - check output of SYNC job. Could not detect marker. Not proceeding with SCRUB job. [`date`]"
mklog "WARN: Check output of SYNC job. Could not detect marker. Not proceeding with SCRUB job."
Expand All @@ -179,7 +181,7 @@ function main(){
echo
JOBS_DONE="$JOBS_DONE + SCRUB"
# insert SCRUB marker to 'Everything OK' or 'Nothing to do' string to differentiate it from SYNC job above
sed_me "s/^Everything OK/SCRUB_JOB--Everything OK/g;s/^Nothing to do/SCRUB_JOB--Nothing to do/g" "$TMP_OUTPUT"
sed_me "s/^Everything OK/**SCRUB JOB - Everything OK**/g;s/^Nothing to do/**SCRUB JOB - Nothing to do**/g" "$TMP_OUTPUT"
fi
fi
else
Expand All @@ -203,6 +205,7 @@ function main(){
# Show SnapRAID Status information if enabled
if [ $SNAP_STATUS -eq 1 ]; then
echo
echo "###SnapRAID Status"
$SNAPRAID_BIN status
close_output_and_wait
output_to_file_screen
Expand Down Expand Up @@ -231,7 +234,7 @@ function main(){
# do
# if [[ `smartctl -a /dev/$DRIVE | grep 'Rotation Rate' | grep rpm` ]]; then
# echo "spinning down /dev/$DRIVE"
# hd-idle -t $DRIVE
# hd-idle -t /dev/$DRIVE
# fi
# done
# fi
Expand Down Expand Up @@ -260,8 +263,6 @@ function main(){
fi
fi

#clean_desc

exit 0;
}

Expand Down Expand Up @@ -322,20 +323,29 @@ function sed_me(){

function chk_del(){
if [ $DEL_COUNT -lt $DEL_THRESHOLD ]; then
# NO, delete threshold not reached, lets run the sync job
echo "There are deleted files. The number of deleted files, ($DEL_COUNT), is below the threshold of ($DEL_THRESHOLD). SYNC Authorized."
if [ $DEL_COUNT -eq 0 ]; then
echo "There are no deleted files, that's fine."
DO_SYNC=1
else
echo "There are deleted files. The number of deleted files ($DEL_COUNT) is below the threshold of ($DEL_THRESHOLD)."
DO_SYNC=1
fi
else
echo "**WARNING** Deleted files ($DEL_COUNT) reached/exceeded threshold ($DEL_THRESHOLD)."
mklog "WARN: Deleted files ($DEL_COUNT) reached/exceeded threshold ($DEL_THRESHOLD)."
CHK_FAIL=1
fi
}
}

function chk_updated(){
if [ $UPDATE_COUNT -lt $UP_THRESHOLD ]; then
echo "There are updated files. The number of updated files, ($UPDATE_COUNT), is below the threshold of ($UP_THRESHOLD). SYNC Authorized."
if [ $UPDATE_COUNT -eq 0 ]; then
echo "There are no updated files, that's fine."
DO_SYNC=1
else
echo "There are updated files. The number of updated files ($UPDATE_COUNT) is below the threshold of ($UP_THRESHOLD)."
DO_SYNC=1
fi
else
echo "**WARNING** Updated files ($UPDATE_COUNT) reached/exceeded threshold ($UP_THRESHOLD)."
mklog "WARN: Updated files ($UPDATE_COUNT) reached/exceeded threshold ($UP_THRESHOLD)."
Expand All @@ -345,29 +355,45 @@ function chk_updated(){

function chk_sync_warn(){
if [ $SYNC_WARN_THRESHOLD -gt -1 ]; then
echo "Forced sync is enabled. [`date`]"
if [ $SYNC_WARN_THRESHOLD -eq 0 ]; then
echo "Forced sync is enabled."
mklog "INFO: Forced sync is enabled."

else
echo "Sync after threshold warning(s) is enabled."
mklog "INFO: Sync after threshold warning(s) is enabled."
fi
SYNC_WARN_COUNT=$(sed 'q;/^[0-9][0-9]*$/!d' $SYNC_WARN_FILE 2>/dev/null)
SYNC_WARN_COUNT=${SYNC_WARN_COUNT:-0} #value is zero if file does not exist or does not contain what we are expecting

if [ $SYNC_WARN_COUNT -ge $SYNC_WARN_THRESHOLD ]; then
# YES, lets force a sync job. Do not need to remove warning marker here as it is automatically removed when the sync job is run by this script
echo "Number of threshold warning(s) ($SYNC_WARN_COUNT) has reached/exceeded threshold ($SYNC_WARN_THRESHOLD). Forcing a SYNC job to run. [`date`]"
if [ $SYNC_WARN_COUNT -ge $SYNC_WARN_THRESHOLD ]; then
# force a sync
# if the warn count is zero it means the sync was already forced, do not output a dumb message and continue with the sync job.
if [ $SYNC_WARN_COUNT -eq 0 ]; then
echo
DO_SYNC=1
else
# if there is at least one warn count, output a message and force a sync job. Do not need to remove warning marker here as it is automatically removed when the sync job is run by this script
echo "Number of threshold warning(s) ($SYNC_WARN_COUNT) has reached/exceeded threshold ($SYNC_WARN_THRESHOLD). Forcing a SYNC job to run."
mklog "INFO: Number of threshold warning(s) ($SYNC_WARN_COUNT) has reached/exceeded threshold ($SYNC_WARN_THRESHOLD). Forcing a SYNC job to run."
DO_SYNC=1
fi
else
# NO, so let's increment the warning count and skip the sync job
((SYNC_WARN_COUNT += 1))
echo $SYNC_WARN_COUNT > $SYNC_WARN_FILE
echo "$((SYNC_WARN_THRESHOLD - SYNC_WARN_COUNT)) threshold warning(s) until the next forced sync. NOT proceeding with SYNC job. [`date`]"
mklog "INFO: $((SYNC_WARN_THRESHOLD - SYNC_WARN_COUNT)) threshold warning(s) until the next forced sync. NOT proceeding with SYNC job."
DO_SYNC=0
if [ $SYNC_WARN_COUNT == $SYNC_WARN_THRESHOLD ]; then
echo "This is the **last** warning left. **NOT** proceeding with SYNC job. [`date`]"
mklog "INFO: This is the **last** warning left. **NOT** proceeding with SYNC job."
DO_SYNC=0
else
echo "$((SYNC_WARN_THRESHOLD - SYNC_WARN_COUNT)) threshold warning(s) until the next forced sync. **NOT** proceeding with SYNC job. [`date`]"
mklog "INFO: $((SYNC_WARN_THRESHOLD - SYNC_WARN_COUNT)) threshold warning(s) until the next forced sync. **NOT** proceeding with SYNC job."
DO_SYNC=0
fi
fi
else
# NO, so let's skip SYNC
echo "Forced sync is not enabled. Check $TMP_OUTPUT for details. NOT proceeding with SYNC job. [`date`]"
mklog "INFO: Forced sync is not enabled. Check $TMP_OUTPUT for details. NOT proceeding with SYNC job."
echo "Forced sync is not enabled. Check $TMP_OUTPUT for details. **NOT** proceeding with SYNC job. [`date`]"
mklog "INFO: Forced sync is not enabled. Check $TMP_OUTPUT for details. **NOT** proceeding with SYNC job."
DO_SYNC=0
fi
}
Expand All @@ -389,23 +415,6 @@ function chk_zero(){
fi
}

function service_array_setup() {
if [ -z "$SERVICES" ]; then
echo "Please configure services"
else
echo "Setting up service array"
read -a service_array <<<$SERVICES
fi
}

function clean_desc(){
# Cleanup file descriptors
exec >&{out} 2>&{err}

# If interactive shell restore output
[[ $- == *i* ]] && exec &>/dev/tty
}

function prepare_mail() {
if [ $CHK_FAIL -eq 1 ]; then
if [ $DEL_COUNT -ge $DEL_THRESHOLD -a $DO_SYNC -eq 0 ]; then
Expand All @@ -432,10 +441,10 @@ function prepare_mail() {
MSG="Sync forced with multiple violations - Deleted files ($DEL_COUNT) / ($DEL_THRESHOLD) and changed files ($UPDATE_COUNT) / ($UP_THRESHOLD)"
fi
SUBJECT="[WARNING] $MSG $EMAIL_SUBJECT_PREFIX"
elif [ -z "${JOBS_DONE##*"SYNC"*}" -a -z "$(grep -w "SYNC_JOB-" $TMP_OUTPUT)" ]; then
elif [ -z "${JOBS_DONE##*"SYNC"*}" -a -z "$(grep -w "SYNC JOB -" $TMP_OUTPUT)" ]; then
# Sync ran but did not complete successfully so lets warn the user
SUBJECT="[WARNING] SYNC job ran but did not complete successfully $EMAIL_SUBJECT_PREFIX"
elif [ -z "${JOBS_DONE##*"SCRUB"*}" -a -z "$(grep -w "SCRUB_JOB-" $TMP_OUTPUT)" ]; then
elif [ -z "${JOBS_DONE##*"SCRUB"*}" -a -z "$(grep -w "SCRUB JOB -" $TMP_OUTPUT)" ]; then
# Scrub ran but did not complete successfully so lets warn the user
SUBJECT="[WARNING] SCRUB job ran but did not complete successfully $EMAIL_SUBJECT_PREFIX"
else
Expand Down
