User config not working properly with Total_uncorrected_write_errors (and other write metrics potentially) #104

Open
Syphdias opened this issue Apr 14, 2022 · 2 comments

@Syphdias

I am using check_smart_attributes from the current HEAD and noticed that the thresholds for "Total_uncorrected_write_errors" cannot be changed through the user config.
I suspect the same fault affects all write metrics on "Generic SAS" devices, since the +7 logic is absent from the ucfg parser.
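
To make the suspected offset concrete, here is a minimal Python sketch of the ID layout this report appears to assume: seven counters per row of the error counter log, with read counters on IDs 1 to 7 and the matching write counters on IDs 8 to 14. The names `COLUMNS`, `WRITE_OFFSET` and `attribute_id` are hypothetical and are not taken from the plugin; the layout itself is inferred only from the "+7" remark and from the IDs 7 and 14 used in the examples.

```python
# Hypothetical ID layout, inferred from the "+7" remark and the IDs 7 and 14
# used in this report. This is an illustration, not the plugin's actual code.
COLUMNS = [
    "errors_corrected_by_ECC_fast",
    "errors_corrected_by_ECC_delayed",
    "errors_corrected_by_rereads_rewrites",
    "total_errors_corrected",
    "correction_algorithm_invocations",
    "gigabytes_processed",
    "total_uncorrected_errors",
]
WRITE_OFFSET = len(COLUMNS)  # 7 columns per row, hence "+7" for write metrics


def attribute_id(row: str, column: str) -> int:
    """Return the assumed numeric ID for a counter in the given row."""
    base = COLUMNS.index(column) + 1  # read IDs start at 1
    if row == "read":
        return base
    if row == "write":
        return base + WRITE_OFFSET
    raise ValueError(f"unsupported row: {row!r}")
```

Under this assumption, a user config key of "14" should address the uncorrected write errors and "7" the uncorrected read errors, which is what the examples in this report test.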

Example:

Version

# md5sum ./check_smart_attributes
d458a3316b5fe07db8bbaf938a7152c6  ./check_smart_attributes

Currently at 7 Total uncorrected write errors

# smartctl -a /dev/sdb |grep uncorrected -3

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       25         0        83    6949314    1752293,432           0
write:         0      315         0       466    1291036     184008,911           7
# ./check_smart_attributes -db check_smartdb.json -d /dev/sdb
Critical (sdb) [sdb_Total_uncorrected_write_errors = Critical]|'sdb_Read_errors_corrected_by_ECC_fast'=0 'sdb_Write_errors_corrected_by_ECC_fast'=0 'sdb_Read_errors_corrected_by_ECC_delayed'=25 'sdb_Write_errors_corrected_by_ECC_delayed'=315 'sdb_Read_errors_corrected_by_ECC_reread'=0 'sdb_Write_errors_corrected_by_ECC_rewrite'=0 'sdb_Total_read_errors_corrected'=83 'sdb_Total_write_errors_corrected'=466 'sdb_Total_read_correction_algorithm_invocations'=6949314 'sdb_Total_write_correction_algorithm_invocations'=1291037 'sdb_Total_read_gigabytes_processed'=1752293,433 'sdb_Total_write_gigabytes_processed'=184008,916 'sdb_Total_uncorrected_read_errors'=0;0;0 'sdb_Total_uncorrected_write_errors'=7;0;0

Try to "mute" with Total uncorrected write errors (ID 14)

# cat should_work_but_does_not.json 
{
	"Devices" : {
		"/dev/sdb" : {
			"Threshs" : {
				"14" : ["7","7"]
			}
		}
	}
}
# ./check_smart_attributes -db check_smartdb.json -ucfgj should_work_but_does_not.json -d /dev/sdb
Critical (sdb) [sdb_Total_uncorrected_write_errors = Critical]|'sdb_Read_errors_corrected_by_ECC_fast'=0 'sdb_Write_errors_corrected_by_ECC_fast'=0 'sdb_Read_errors_corrected_by_ECC_delayed'=25 'sdb_Write_errors_corrected_by_ECC_delayed'=315 'sdb_Read_errors_corrected_by_ECC_reread'=0 'sdb_Write_errors_corrected_by_ECC_rewrite'=0 'sdb_Total_read_errors_corrected'=83 'sdb_Total_write_errors_corrected'=466 'sdb_Total_read_correction_algorithm_invocations'=6949314 'sdb_Total_write_correction_algorithm_invocations'=1291037 'sdb_Total_read_gigabytes_processed'=1752293,433 'sdb_Total_write_gigabytes_processed'=184008,942 'sdb_Total_uncorrected_read_errors'=0;0;0 'sdb_Total_uncorrected_write_errors'=7;0;0

Try to "mute" with Total uncorrected read errors (ID 7)

# cat should_not_work_but_does.json 
{
	"Devices" : {
		"/dev/sdb" : {
			"Threshs" : {
				"7" : ["7","7"]
			}
		}
	}
}
# ./check_smart_attributes -db check_smartdb.json -ucfgj should_not_work_but_does.json -d /dev/sdb
OK (sdb) |'sdb_Read_errors_corrected_by_ECC_fast'=0 'sdb_Write_errors_corrected_by_ECC_fast'=0 'sdb_Read_errors_corrected_by_ECC_delayed'=25 'sdb_Write_errors_corrected_by_ECC_delayed'=315 'sdb_Read_errors_corrected_by_ECC_reread'=0 'sdb_Write_errors_corrected_by_ECC_rewrite'=0 'sdb_Total_read_errors_corrected'=83 'sdb_Total_write_errors_corrected'=466 'sdb_Total_read_correction_algorithm_invocations'=6949314 'sdb_Total_write_correction_algorithm_invocations'=1291037 'sdb_Total_read_gigabytes_processed'=1752293,434 'sdb_Total_write_gigabytes_processed'=184008,945 'sdb_Total_uncorrected_read_errors'=0;7;7 'sdb_Total_uncorrected_write_errors'=7;7;7
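
The perfdata in the outputs above (`7;0;0` reported as Critical, `7;7;7` and `0;0;0` reported as OK) is consistent with a strict greater-than comparison against the thresholds. The following Python sketch illustrates that reading, assuming the usual `'label'=value;warn;crit` perfdata layout; `parse_perfdata` and `status` are hypothetical helpers, not functions of the plugin.

```python
def parse_perfdata(token: str):
    """Split one "'label'=value;warn;crit" token into its parts.

    Only tokens that carry both thresholds are handled; tokens such as
    'sdb_Total_write_errors_corrected'=466 are rejected.
    """
    label, _, rest = token.partition("=")
    parts = rest.split(";")
    if len(parts) != 3:
        raise ValueError(f"no warn/crit thresholds in {token!r}")
    value, warn, crit = (float(p) for p in parts)
    return label.strip("'"), value, warn, crit


def status(value: float, warn: float, crit: float) -> str:
    """Assumed rule: alert only when the value strictly exceeds a threshold."""
    if value > crit:
        return "Critical"
    if value > warn:
        return "Warning"
    return "OK"


# Tokens copied from the outputs in this report.
for token in (
    "'sdb_Total_uncorrected_write_errors'=7;0;0",
    "'sdb_Total_uncorrected_write_errors'=7;7;7",
    "'sdb_Total_uncorrected_read_errors'=0;0;0",
):
    label, value, warn, crit = parse_perfdata(token)
    print(label, status(value, warn, crit))
```

Read this way, the last run shows that the thresholds given under key "7" ended up on both the read and the write metric, while the thresholds given under key "14" in the previous run were not applied at all.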
@gschoenberger
Member

Can you also post the complete smartctl -a output? Then I can debug the plugin call...

@Syphdias
Author

Syphdias commented Apr 14, 2022

# smartctl -a /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-11-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUH721010AL5200
Revision:             LS17
Compliance:           SPC-4
User Capacity:        9.796.820.402.176 bytes [9,79 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Formatted with type 2 protection
8 bytes of protection information per logical block
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca266ab0874
Serial number:        
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Thu Apr 14 21:57:37 2022 CEST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     32 C
Drive Trip Temperature:        50 C

Accumulated power on time, hours:minutes 39564:06
Manufactured in week 33 of year 2017
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  20
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  1677
Elements in grown defect list: 13

Vendor (Seagate Cache) information
  Blocks sent to initiator = 28747207242940416

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       25         0        83    6949491    1752296,159           0
write:         0      315         0       466    1291129     184012,685           7
verify:        0        0         0         0      48480          0,547           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                  96       3                 - [-   -    -]
# 2  Reserved(7)       Completed                  64       3                 - [-   -    -]

Long (extended) Self-test duration: 65535 seconds [1092,2 minutes]
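
For anyone reproducing this without the drive, the error counter log above can be turned into numbers with a short Python sketch like the one below. It assumes the seven-column layout shown in this output and a locale that prints a decimal comma (as in `184012,685`); `parse_error_counter_log` is a hypothetical helper and does not reflect how the plugin parses smartctl output.

```python
import re

# Matches the "read:", "write:" and "verify:" rows of the error counter log.
ROW = re.compile(r"^(read|write|verify):\s+(.*)$")


def parse_error_counter_log(text: str) -> dict:
    """Return {row_name: [7 floats]} for each counter row found in text."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue
        fields = match.group(2).split()
        if len(fields) != 7:
            continue  # not the seven-column layout assumed here
        # Convert the decimal comma used in this output to a decimal point.
        rows[match.group(1)] = [float(f.replace(",", ".")) for f in fields]
    return rows


sample = """\
read:          0       25         0        83    6949491    1752296,159           0
write:         0      315         0       466    1291129     184012,685           7
verify:        0        0         0         0      48480          0,547           0
"""
counters = parse_error_counter_log(sample)
print(counters["write"][6])  # last column: total uncorrected write errors
```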

@Syphdias changed the title from "Wrong ucfg not working properly with Total_uncorrected_write_errors (and other write metrics)" to "User config not working properly with Total_uncorrected_write_errors (and other write metrics potentially)" on Apr 14, 2022