Replacing: READ, WRITE and CHECKSUM errors #12973
Hello everyone. I hope this is the right place to ask my question. I'm truly sorry if it isn't. I have a problem replacing a disk in my ZFS pool.
Preface
I have a self-built home server running Proxmox 7.1-8 with four 6TB drives in a zpool as storage. My zpool marked one of these drives as FAULTED (too many errors). Smartctl saw errors and a non-destructive badblocks run showed a few READ errors as well. Of course I have a backup :)
The journey begins
As you can see, the faulty one's S/N ends with P9FX. At that time I thought it would be quite easy: swap the devices and run a "zpool replace". So that is what I did:
errors
However, after a few hours of resilvering some WRITE errors appeared (18 to be exact) and zpool sent me an email that the new device was now marked as FAULTED as well (which is a great feature, btw). I let the process finish and investigated these strange WRITE errors. DMESG told me about "Logical block address out of range". So to be sure I used a different SATA cable and a different PSU power plug for the second resilvering.
2nd resilvering
I swapped the devices back and added the K80D device to my server. Now I had five drives connected; the pool, however, still consisted only of the four original drives. I didn't add the K80D device to the pool, but again started a zpool replace, only this time with the FAULTED P9FX drive still attached to the pool instead of missing.
Having waited for that resilvering to complete, now my pool looks like this.
There are READ errors on the P9FX device, which I deem okay, because exactly this caused all the swapping in the first place. Well, but the WRITE errors on the K80D device are a mystery to me. DMESG looks like this:
And here is smartctl P9FX
smartctl K80D
What to do?
Now I'm kind of stuck. My pool functions, but it is degraded and I cannot resilver it because the new device faults as well. I'm considering running a destructive badblocks on the new device. Another idea that crossed my mind: should I try connecting the K80D device via USB to see whether the problem has to do with SATA? Could you give me a hint on how to proceed? Thanks in advance! Your help is really appreciated. Best regards.
EDIT: I just read about the IDNF errors in the SMART report of the K80D device. Those seem to be connected to the device being an SMR drive.
Replies: 3 comments 2 replies
Let's see what others suggest. My guess is to check the PSU. You added more
load than before, and this can cause unexpected problems.
…On Fri, Jan 14, 2022 at 7:27 PM b0wtie ***@***.***> wrote:
Hello everyone. I hope this is the right place to ask my question. I'm
truly sorry if it isn't. I have a problem replacing a disk in my ZFS pool.
Preface
I have a self-built HomeServer running Proxmox 7.1-8 with four 6TB-drives
in a zpool as storage.
The Mainboard is a "Fujitsu D3417-B1" with the latest BIOS and it is
equipped with 32 GB of ECC-RAM.
My zpool marked one of these drives as FAULTED (too many errors). Smartctl
saw errors and a non-destructive badblocks run showed a few READ errors as
well. Of course I have a backup :)
Since the device has roughly 50,000 hours of on-time I thought it wise to
replace it. Unfortunately I did not save the output of every command I
ran to present it here, but this is how the devices in the pool
looked before the swap:
ata-WDC_WD60EFAX-68JH4N1_WD-WX32DB08A24F ONLINE 0 0 0
ata-WDC_WD60EFRX-68L0BN1_WD-WXB1HB4LP9FX ONLINE 0 0 1
ata-WDC_WD60EFAX-68SHWN0_WD-WX21D2905V9H ONLINE 0 0 0
ata-WDC_WD60EFAX-68SHWN0_WD-WX31D399DXY2 ONLINE 0 0 0
The journey begins
As you can see, the faulty one's S/N ends with P9FX. At that time I
thought it would be quite easy: swap the devices and run a "zpool replace".
So that is what I did:
The new device's ID is ata-WDC_WD60EFAX-68JH4N1_WD-WX42D611K80D. Let's
call her "K80D".
I ran zpool replace datacenter 16943683573656154046
ata-WDC_WD60EFAX-68JH4N1_WD-WX42D611K80D on the pool.
NAME STATE READ WRITE CKSUM
datacenter DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
ata-WDC_WD60EFAX-68JH4N1_WD-WX32DB08A24F ONLINE 0 0 0
16943683573656154046 UNAVAIL 0 0 0
was /dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WXB1HB4LP9FX-part1
ata-WDC_WD60EFAX-68SHWN0_WD-WX21D2905V9H ONLINE 0 0 0
ata-WDC_WD60EFAX-68SHWN0_WD-WX31D399DXY2 ONLINE 0 0 0
errors
However, after a few hours of resilvering some WRITE errors appeared (18
to be exact) and zpool sent me an email that the new device was now marked
as FAULTED as well (which is a great feature, btw). I let the process finish
and investigated these strange WRITE errors. DMESG told me about "Logical
block address out of range". So to be sure I used a different SATA cable
and a different PSU power plug for the second resilvering.
2nd resilvering
I swapped the devices back and added the K80D device to my server. Now I
had five drives connected; the pool, however, still consisted only of the
four original drives. I didn't add the K80D device to the pool, but again
started a zpool replace, only this time with the FAULTED P9FX drive still
attached to the pool instead of missing.
Having waited for that resilvering to complete, now my pool looks like
this.
zpool status datacenter gbdpve: Fri Jan 14 05:47:19 2022
pool: datacenter
state: DEGRADED
scan: resilvered 220G in 09:28:07 with 0 errors on Fri Jan 14 05:37:12 2022
config:
NAME STATE READ WRITE CKSUM
datacenter DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
ata-WDC_WD60EFAX-68JH4N1_WD-WX32DB08A24F ONLINE 0 0 0
replacing-1 UNAVAIL 196 0 157 insufficient replicas
ata-WDC_WD60EFRX-68L0BN1_WD-WXB1HB4LP9FX FAULTED 207 0 0 too many errors
ata-WDC_WD60EFAX-68JH4N1_WD-WX42D611K80D FAULTED 0 14 0 too many errors
ata-WDC_WD60EFAX-68SHWN0_WD-WX21D2905V9H ONLINE 0 0 0
ata-WDC_WD60EFAX-68SHWN0_WD-WX31D399DXY2 ONLINE 0 0 0
errors: No known data errors
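To pull out just the vdevs with non-zero counters from a status listing like the one above, something along these lines might help. It is only a sketch in Python: the parsing rule (name, state, then three integer counters) is an assumption based on the output pasted here, and `devices_with_errors` is a made-up helper name, not part of any ZFS tooling.

```python
# Config section of the zpool status output pasted above.
STATUS = """\
NAME STATE READ WRITE CKSUM
datacenter DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
ata-WDC_WD60EFAX-68JH4N1_WD-WX32DB08A24F ONLINE 0 0 0
replacing-1 UNAVAIL 196 0 157 insufficient replicas
ata-WDC_WD60EFRX-68L0BN1_WD-WXB1HB4LP9FX FAULTED 207 0 0 too many errors
ata-WDC_WD60EFAX-68JH4N1_WD-WX42D611K80D FAULTED 0 14 0 too many errors
ata-WDC_WD60EFAX-68SHWN0_WD-WX21D2905V9H ONLINE 0 0 0
ata-WDC_WD60EFAX-68SHWN0_WD-WX31D399DXY2 ONLINE 0 0 0
"""


def devices_with_errors(status_text):
    """Return (name, state, read, write, cksum) for rows with any non-zero counter."""
    rows = []
    for line in status_text.splitlines():
        parts = line.split()
        # Skip the header and anything too short to be a device row.
        if len(parts) < 5 or parts[0] == "NAME":
            continue
        try:
            read, write, cksum = (int(p) for p in parts[2:5])
        except ValueError:
            # Assumption: plain integers only; abbreviated counters are skipped.
            continue
        if read or write or cksum:
            rows.append((parts[0], parts[1], read, write, cksum))
    return rows


ERRORS = devices_with_errors(STATUS)
for row in ERRORS:
    print(row)
```

For this listing it reports the `replacing-1` vdev plus the P9FX and K80D devices, and none of the three healthy drives.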
There are READ errors on the P9FX device, which I deem okay, because
exactly this caused all the swapping in the first place. Well, but the
WRITE errors on the K80D device are a mystery to me.
DMESG looks like this:
/////// WRITE
[ 3207.298803] ata3.00: exception Emask 0x0 SAct 0x7fc08007 SErr 0x0 action 0x0
[ 3207.300977] ata3.00: irq_stat 0x40000008
[ 3207.303138] ata3.00: failed command: WRITE FPDMA QUEUED
[ 3207.305242] ata3.00: cmd 61/08:78:90:27:43/00:00:b9:00:00/40 tag 15 ncq dma 4096 out
res 41/10:00:90:27:43/00:00:b9:00:00/00 Emask 0x481 (invalid argument) <F>
[ 3207.309438] ata3.00: status: { DRDY ERR }
[ 3207.311505] ata3.00: error: { IDNF }
[ 3207.315432] ata3.00: configured for UDMA/133
[ 3207.317537] sd 2:0:0:0: [sdc] tag#15 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=14s
[ 3207.319644] sd 2:0:0:0: [sdc] tag#15 Sense Key : Illegal Request [current]
[ 3207.321737] sd 2:0:0:0: [sdc] tag#15 Add. Sense: Logical block address out of range
[ 3207.323818] sd 2:0:0:0: [sdc] tag#15 CDB: Write(16) 8a 00 00 00 00 00 b9 43 27 90 00 00 00 08 00 00
[ 3207.325928] blk_update_request: I/O error, dev sdc, sector 3108186000 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[ 3207.328065] zio pool=datacenter vdev=/dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N1_WD-WX42D611K80D-part1 error=5 type=2 offset=1591390183424 size=4096 flags=180880
///////////// \\\\\\\\\\\\\\\
//READ errors
[31577.441960] ata6.00: exception Emask 0x0 SAct 0x700000 SErr 0x0 action 0x0
[31577.444206] ata6.00: irq_stat 0x40000008
[31577.446399] ata6.00: failed command: READ FPDMA QUEUED
[31577.448592] ata6.00: cmd 60/58:b0:d0:c5:65/01:00:b3:00:00/40 tag 22 ncq dma 176128 in
res 41/40:00:d0:c5:65/00:00:b3:00:00/00 Emask 0x409 (media error) <F>
[31577.452983] ata6.00: status: { DRDY ERR }
[31577.455166] ata6.00: error: { UNC }
[31577.465882] ata6.00: configured for UDMA/133
[31577.468115] sd 5:0:0:0: [sdf] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=4s
[31577.470319] sd 5:0:0:0: [sdf] tag#22 Sense Key : Medium Error [current]
[31577.472531] sd 5:0:0:0: [sdf] tag#22 Add. Sense: Unrecovered read error - auto reallocate failed
[31577.474725] sd 5:0:0:0: [sdf] tag#22 CDB: Read(16) 88 00 00 00 00 00 b3 65 c5 d0 00 00 01 58 00 00
[31577.476918] blk_update_request: I/O error, dev sdf, sector 3009791440 op 0x0:(READ) flags 0x700 phys_seg 7 prio class 0
[31577.479100] zio pool=datacenter vdev=/dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WXB1HB4LP9FX-part1 error=5 type=1 offset=1541012168704 size=176128 flags=40080ca8
[31577.481312] ata6: EH complete
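As a sanity check on the "Logical block address out of range" message, the numbers in the two excerpts above can be worked through by hand. The Python sketch below decodes the 16-byte CDBs (LBA in bytes 2 to 9, transfer length in bytes 10 to 13, per the standard READ(16)/WRITE(16) layout), compares the LBA with the drive size that smartctl reports, and compares it with the zio offset. The helper names are made up for this illustration, and the reading of the 1 MiB difference as the start of partition 1 is an assumption, since no partition table is shown here.

```python
SECTOR_SIZE = 512                    # logical sector size reported by smartctl
CAPACITY_BYTES = 6_001_175_126_016   # "User Capacity" reported by smartctl
TOTAL_SECTORS = CAPACITY_BYTES // SECTOR_SIZE

# CDB bytes and zio offsets copied from the dmesg excerpts above.
EVENTS = {
    "K80D write": ("8a 00 00 00 00 00 b9 43 27 90 00 00 00 08 00 00", 1591390183424),
    "P9FX read": ("88 00 00 00 00 00 b3 65 c5 d0 00 00 01 58 00 00", 1541012168704),
}


def decode_cdb16(cdb_hex):
    """Return (lba, sector_count) from a READ(16)/WRITE(16) CDB given as hex text."""
    raw = bytes.fromhex(cdb_hex)
    lba = int.from_bytes(raw[2:10], "big")     # bytes 2..9: logical block address
    count = int.from_bytes(raw[10:14], "big")  # bytes 10..13: transfer length
    return lba, count


def analyse(cdb_hex, zio_offset):
    lba, count = decode_cdb16(cdb_hex)
    return {
        "lba": lba,
        "bytes": count * SECTOR_SIZE,
        # True if the whole request lies inside the advertised capacity.
        "in_range": lba + count <= TOTAL_SECTORS,
        # Whole-disk byte address minus the offset ZFS logged for the -part1 vdev.
        "partition_gap": lba * SECTOR_SIZE - zio_offset,
    }


RESULTS = {name: analyse(cdb, off) for name, (cdb, off) in EVENTS.items()}
for name, res in RESULTS.items():
    print(name, res)
```

Both requests decode to the sectors the kernel printed (3108186000 and 3009791440) and both lie well inside the 11721045168 sectors of a 6 TB drive, so the failing write was not literally past the end of the disk. The sense text appears to be simply how the kernel reports the ATA IDNF error. The gap is 1048576 bytes in both cases, which would fit a first partition starting at sector 2048.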
And here is smartctl -x of the two drives:
smartctl P9FX
smartctl -x /dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WXB1HB4LP9FX
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.19-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD60EFRX-68L0BN1
Serial Number: WD-WXB1HB4LP9FX
LU WWN Device Id: 5 0014ee 20d34c1f5
Firmware Version: 82.00A82
User Capacity: 6,001,175,126,016 bytes [6.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5700 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jan 14 08:33:01 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 5024) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 703) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x303d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 199 051 - 60
3 Spin_Up_Time POS--K 236 196 021 - 7166
4 Start_Stop_Count -O--CK 091 091 000 - 9297
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 100 253 000 - 0
9 Power_On_Hours -O--CK 034 034 000 - 48812
10 Spin_Retry_Count -O--CK 100 100 000 - 0
11 Calibration_Retry_Count -O--CK 100 100 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 112
192 Power-Off_Retract_Count -O--CK 200 200 000 - 47
193 Load_Cycle_Count -O--CK 193 193 000 - 23112
194 Temperature_Celsius -O---K 126 110 000 - 26
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 16
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb6 GPL,SL VS 1 Device vendor specific log
0xb7 GPL,SL VS 54 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 48 (device log contains only the most recent 24 errors)
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 48 [23] occurred at disk power-on lifetime: 48808 hours (2033 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 b3 65 c5 d0 40 00 Error: UNC at LBA = 0xb365c5d0 = 3009791440
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 01 58 00 b0 00 00 b3 65 c5 d0 40 08 08:45:52.059 READ FPDMA QUEUED
60 00 10 00 a8 00 02 ba a0 ac 10 40 08 08:45:52.058 READ FPDMA QUEUED
60 00 10 00 a0 00 02 ba a0 ae 10 40 08 08:45:52.058 READ FPDMA QUEUED
ef 00 10 00 02 00 00 00 00 00 00 a0 08 08:45:52.040 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 00 00 00 00 00 e0 08 08:45:52.040 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
Error 47 [22] occurred at disk power-on lifetime: 48808 hours (2033 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 b3 65 c0 08 40 00 Error: WP at LBA = 0xb365c008 = 3009789960
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
61 00 10 00 78 00 00 00 00 0a 10 40 08 08:45:45.010 WRITE FPDMA QUEUED
60 00 10 00 88 00 02 ba a0 ae 10 40 08 08:45:45.010 READ FPDMA QUEUED
60 00 10 00 80 00 02 ba a0 ac 10 40 08 08:45:45.010 READ FPDMA QUEUED
60 00 10 00 78 00 00 00 00 0a 10 40 08 08:45:45.010 READ FPDMA QUEUED
60 07 e0 00 70 00 00 b3 65 bd e8 40 08 08:45:45.010 READ FPDMA QUEUED
Error 46 [21] occurred at disk power-on lifetime: 48808 hours (2033 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 b3 65 b6 40 40 00 Error: UNC at LBA = 0xb365b640 = 3009787456
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 07 e0 00 80 00 00 b3 65 b6 00 40 08 08:45:37.925 READ FPDMA QUEUED
61 00 10 00 78 00 00 00 00 0a 10 40 08 08:45:37.925 WRITE FPDMA QUEUED
ef 00 10 00 02 00 00 00 00 00 00 a0 08 08:45:37.906 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 00 00 00 00 00 e0 08 08:45:37.906 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 00 00 00 00 00 a0 08 08:45:37.906 IDENTIFY DEVICE
Error 45 [20] occurred at disk power-on lifetime: 48808 hours (2033 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 b3 65 ae 18 40 00 Error: WP at LBA = 0xb365ae18 = 3009785368
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
61 00 10 00 70 00 02 ba a0 ae 10 40 08 08:45:33.518 WRITE FPDMA QUEUED
61 00 10 00 68 00 02 ba a0 ac 10 40 08 08:45:33.518 WRITE FPDMA QUEUED
60 00 10 00 60 00 00 00 00 0a 10 40 08 08:45:33.518 READ FPDMA QUEUED
60 00 10 00 58 00 02 ba a0 ae 10 40 08 08:45:33.518 READ FPDMA QUEUED
60 00 10 00 50 00 02 ba a0 ac 10 40 08 08:45:33.518 READ FPDMA QUEUED
Error 44 [19] occurred at disk power-on lifetime: 48808 hours (2033 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 b3 65 a7 90 40 00 Error: UNC at LBA = 0xb365a790 = 3009783696
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 07 e0 00 f0 00 00 b3 65 a6 30 40 08 08:45:26.459 READ FPDMA QUEUED
60 00 50 00 e8 00 00 b3 65 a5 d8 40 08 08:45:24.943 READ FPDMA QUEUED
60 00 e0 00 e0 00 02 ba a0 ae 20 40 08 08:45:24.910 READ FPDMA QUEUED
60 00 e0 00 d8 00 02 ba a0 ac 20 40 08 08:45:24.910 READ FPDMA QUEUED
60 00 e0 00 d0 00 00 00 00 0a 20 40 08 08:45:24.910 READ FPDMA QUEUED
Error 43 [18] occurred at disk power-on lifetime: 48808 hours (2033 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 b3 65 a0 60 40 00 Error: WP at LBA = 0xb365a060 = 3009781856
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
61 00 10 00 70 00 00 00 00 0a 10 40 08 08:45:19.126 WRITE FPDMA QUEUED
61 00 50 00 20 00 00 b3 65 8f 80 40 08 08:45:19.125 WRITE FPDMA QUEUED
60 00 10 00 18 00 02 ba a0 ae 10 40 08 08:45:19.125 READ FPDMA QUEUED
60 00 10 00 10 00 02 ba a0 ac 10 40 08 08:45:19.125 READ FPDMA QUEUED
60 00 10 00 08 00 00 00 00 0a 10 40 08 08:45:19.125 READ FPDMA QUEUED
Error 42 [17] occurred at disk power-on lifetime: 48808 hours (2033 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 b3 65 9c 50 40 00 Error: WP at LBA = 0xb3659c50 = 3009780816
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
61 00 10 00 c8 00 02 ba a0 ae 10 40 08 08:45:12.038 WRITE FPDMA QUEUED
61 00 10 00 c0 00 02 ba a0 ac 10 40 08 08:45:12.038 WRITE FPDMA QUEUED
61 00 10 00 b8 00 00 00 00 0a 10 40 08 08:45:12.038 WRITE FPDMA QUEUED
60 07 e0 00 b0 00 00 b3 65 98 70 40 08 08:45:12.038 READ FPDMA QUEUED
60 00 10 00 f8 00 02 ba a0 ac 10 40 08 08:45:11.996 READ FPDMA QUEUED
Error 41 [16] occurred at disk power-on lifetime: 48808 hours (2033 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 b3 65 91 68 40 00 Error: UNC at LBA = 0xb3659168 = 3009778024
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 10 00 78 00 00 00 00 0a 10 40 08 08:45:04.964 READ FPDMA QUEUED
60 00 10 00 70 00 02 ba a0 ae 10 40 08 08:45:04.964 READ FPDMA QUEUED
60 00 10 00 68 00 02 ba a0 ac 10 40 08 08:45:04.964 READ FPDMA QUEUED
60 07 e0 00 60 00 00 b3 65 90 88 40 08 08:45:04.964 READ FPDMA QUEUED
ef 00 10 00 02 00 00 00 00 00 00 a0 08 08:45:04.947 SET FEATURES [Enable SATA feature]
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 48731 524409208
# 2 Conveyance offline Completed: read failure 90% 48730 524409208
# 3 Short offline Completed without error 00% 40357 -
# 4 Extended offline Completed without error 00% 21820 -
# 5 Extended offline Aborted by host 80% 21808 -
# 6 Conveyance offline Interrupted (host reset) 90% 21806 -
# 7 Short offline Completed without error 00% 21804 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
Device State: Active (0)
Current Temperature: 26 Celsius
Power Cycle Min/Max Temperature: 26/32 Celsius
Lifetime Min/Max Temperature: 14/42 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (97)
Index Estimated Time Temperature Celsius
98 2022-01-14 00:36 31 ************
... ..( 30 skipped). .. ************
129 2022-01-14 01:07 31 ************
130 2022-01-14 01:08 32 *************
... ..( 40 skipped). .. *************
171 2022-01-14 01:49 32 *************
172 2022-01-14 01:50 31 ************
... ..( 5 skipped). .. ************
178 2022-01-14 01:56 31 ************
179 2022-01-14 01:57 30 ***********
... ..( 5 skipped). .. ***********
185 2022-01-14 02:03 30 ***********
186 2022-01-14 02:04 29 **********
... ..( 8 skipped). .. **********
195 2022-01-14 02:13 29 **********
196 2022-01-14 02:14 28 *********
... ..( 16 skipped). .. *********
213 2022-01-14 02:31 28 *********
214 2022-01-14 02:32 27 ********
... ..( 53 skipped). .. ********
268 2022-01-14 03:26 27 ********
269 2022-01-14 03:27 26 *******
... ..(115 skipped). .. *******
385 2022-01-14 05:23 26 *******
386 2022-01-14 05:24 31 ************
... ..( 17 skipped). .. ************
404 2022-01-14 05:42 31 ************
405 2022-01-14 05:43 32 *************
... ..(135 skipped). .. *************
63 2022-01-14 07:59 32 *************
64 2022-01-14 08:00 31 ************
... ..( 32 skipped). .. ************
97 2022-01-14 08:33 31 ************
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
Device Statistics (GP/SMART Log 0x04) not supported
Pending Defects log (GP Log 0x0c) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 3 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 4 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 0 Non-CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 44642 Vendor specific
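In a report this long, the few attributes that matter are easy to miss. The Python sketch below reads a handful of attribute rows copied from the P9FX report above and flags the ones with a non-zero raw value. The set of watched IDs (5, 197, 198, 199) is a common rule of thumb and an assumption of this example, not something smartctl prescribes, and `parse_attributes` is a made-up helper.

```python
# A few rows copied from the P9FX attribute table above.
ATTRS = """\
  1 Raw_Read_Error_Rate POSR-K 200 199 051 - 60
  5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
  9 Power_On_Hours -O--CK 034 034 000 - 48812
197 Current_Pending_Sector -O--CK 200 200 000 - 16
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
"""

# Assumption: these IDs are the ones worth flagging when their raw value is non-zero.
WATCHED = {5, 197, 198, 199}


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) for rows in the 8-column -x layout."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        # ID, name, flags, value, worst, thresh, fail, raw
        if len(parts) != 8 or not parts[0].isdigit() or not parts[7].isdigit():
            continue
        table[int(parts[0])] = (parts[1], int(parts[7]))
    return table


TABLE = parse_attributes(ATTRS)
FLAGGED = {i: TABLE[i] for i in sorted(WATCHED) if i in TABLE and TABLE[i][1] > 0}
print(FLAGGED)
```

For the P9FX rows this leaves only Current_Pending_Sector with a raw value of 16, which matches the UNC read errors in the error log, while UDMA_CRC_Error_Count stays at 0.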
smartctl K80D
smartctl -x /dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N1_WD-WX42D611K80D
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.19-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red (SMR)
Device Model: WDC WD60EFAX-68JH4N1
Serial Number: WD-WX42D611K80D
LU WWN Device Id: 5 0014ee 2beff94cc
Firmware Version: 83.00A83
User Capacity: 6,001,175,126,016 bytes [6.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jan 14 08:36:19 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (49680) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 117) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3039) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 100 253 051 - 0
3 Spin_Up_Time POS--K 100 253 021 - 0
4 Start_Stop_Count -O--CK 100 100 000 - 3
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 100 100 000 - 60
10 Spin_Retry_Count -O--CK 100 253 000 - 0
11 Calibration_Retry_Count -O--CK 100 253 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 3
192 Power-Off_Retract_Count -O--CK 200 200 000 - 0
193 Load_Cycle_Count -O--CK 200 200 000 - 5
194 Temperature_Celsius -O---K 119 116 000 - 31
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 100 253 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x04 GPL R/O 256 Device Statistics log
0x04 SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x0c GPL R/O 2048 Pending Defects log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x24 GPL R/O 294 Current Device Internal Status Data log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb6 GPL,SL VS 1 Device vendor specific log
0xb7 GPL,SL VS 78 Device vendor specific log
0xb9 GPL,SL VS 4 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 99 (device log contains only the most recent 24 errors)
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 99 [2] occurred at disk power-on lifetime: 48 hours (2 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 00 00 00 b9 43 27 90 40 00 Error: IDNF at LBA = 0xb9432790 = 3108186000
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 10 00 10 00 02 ba a0 ae 10 40 08 00:53:31.985 READ FPDMA QUEUED
60 00 10 00 08 00 02 ba a0 ac 10 40 08 00:53:31.985 READ FPDMA QUEUED
60 00 10 00 00 00 00 00 00 0a 10 40 08 00:53:31.985 READ FPDMA QUEUED
61 07 38 00 f0 00 01 5d 42 7d b8 40 08 00:53:31.985 WRITE FPDMA QUEUED
61 00 58 00 e8 00 01 5d 42 84 f0 40 08 00:53:31.985 WRITE FPDMA QUEUED
Error 98 [1] occurred at disk power-on lifetime: 48 hours (2 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 00 00 01 5d 42 85 48 40 00 Error: IDNF at LBA = 0x15d428548 = 5859607880
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
61 00 58 00 58 00 00 a2 82 6f 50 40 08 00:53:25.186 WRITE FPDMA QUEUED
61 00 08 00 a8 00 00 b9 43 27 a0 40 08 00:53:25.157 WRITE FPDMA QUEUED
61 00 08 00 50 00 00 b9 43 27 98 40 08 00:53:25.157 WRITE FPDMA QUEUED
61 00 08 00 a0 00 00 b9 43 27 90 40 08 00:53:25.157 WRITE FPDMA QUEUED
61 00 08 00 98 00 00 b9 43 27 88 40 08 00:53:25.157 WRITE FPDMA QUEUED
Error 97 [0] occurred at disk power-on lifetime: 48 hours (2 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 00 00 02 ba a0 ac 10 40 00 Error: IDNF at LBA = 0x2baa0ac10 = 11721026576
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
61 00 50 00 40 00 01 7c c1 fb e8 40 08 00:53:08.126 WRITE FPDMA QUEUED
61 00 10 00 38 00 00 00 00 0a 10 40 08 00:53:08.126 WRITE FPDMA QUEUED
61 00 10 00 10 00 02 ba a0 ae 10 40 08 00:53:08.126 WRITE FPDMA QUEUED
61 00 10 00 08 00 02 ba a0 ac 10 40 08 00:53:08.126 WRITE FPDMA QUEUED
61 00 08 00 30 00 00 b9 43 26 98 40 08 00:53:07.514 WRITE FPDMA QUEUED
Error 96 [23] occurred at disk power-on lifetime: 48 hours (2 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 00 00 00 b9 42 c1 38 40 00 Error: IDNF at LBA = 0xb942c138 = 3108159800
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
61 00 08 00 28 00 00 b9 43 26 98 40 08 00:52:59.660 WRITE FPDMA QUEUED
61 00 10 00 20 00 00 b9 42 c1 38 40 08 00:52:59.660 WRITE FPDMA QUEUED
61 00 08 00 18 00 00 b9 43 26 a0 40 08 00:52:59.660 WRITE FPDMA QUEUED
ef 00 10 00 02 00 00 00 00 00 00 a0 08 00:52:59.642 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 00 00 00 00 00 e0 08 00:52:59.642 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
Error 95 [22] occurred at disk power-on lifetime: 48 hours (2 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 00 00 00 b9 42 c0 f0 40 00 Error: IDNF at LBA = 0xb942c0f0 = 3108159728
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
61 00 08 00 f0 00 00 b9 43 26 a0 40 08 00:52:51.980 WRITE FPDMA QUEUED
61 00 10 00 e8 00 00 b9 42 c1 38 40 08 00:52:51.980 WRITE FPDMA QUEUED
61 00 08 00 e0 00 00 b9 43 26 98 40 08 00:52:51.980 WRITE FPDMA QUEUED
61 00 10 00 d8 00 00 b9 42 c0 f0 40 08 00:52:51.980 WRITE FPDMA QUEUED
61 00 50 00 d0 00 01 7c c1 fb 38 40 08 00:52:51.980 WRITE FPDMA QUEUED
Error 94 [21] occurred at disk power-on lifetime: 48 hours (2 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 00 00 00 a2 82 6d 98 40 00 Error: IDNF at LBA = 0xa2826d98 = 2726456728
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
61 00 50 00 60 00 01 7c c1 fb 38 40 08 00:52:44.189 WRITE FPDMA QUEUED
61 00 10 00 58 00 00 b9 42 c0 f0 40 08 00:52:44.189 WRITE FPDMA QUEUED
61 00 08 00 50 00 00 b9 43 26 98 40 08 00:52:44.189 WRITE FPDMA QUEUED
61 00 10 00 48 00 00 b9 42 c1 38 40 08 00:52:44.189 WRITE FPDMA QUEUED
61 00 08 00 40 00 00 b9 43 26 a0 40 08 00:52:44.189 WRITE FPDMA QUEUED
Error 93 [20] occurred at disk power-on lifetime: 48 hours (2 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 00 00 01 7c c1 fa 88 40 00 Error: IDNF at LBA = 0x17cc1fa88 = 6388054664
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
61 00 58 00 e0 00 00 a2 82 6d 98 40 08 00:52:38.124 WRITE FPDMA QUEUED
61 00 08 00 d8 00 00 b9 43 26 a0 40 08 00:52:38.036 WRITE FPDMA QUEUED
61 00 10 00 d0 00 00 b9 42 c1 38 40 08 00:52:38.036 WRITE FPDMA QUEUED
61 00 08 00 b8 00 00 b9 43 26 98 40 08 00:52:38.035 WRITE FPDMA QUEUED
61 00 10 00 b0 00 00 b9 42 c0 f0 40 08 00:52:38.035 WRITE FPDMA QUEUED
Error 92 [19] occurred at disk power-on lifetime: 48 hours (2 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 00 00 02 2a c1 93 00 40 00 Error: IDNF at LBA = 0x22ac19300 = 9307263744
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
61 00 50 00 90 00 02 2a c1 93 58 40 08 00:51:52.493 WRITE FPDMA QUEUED
61 00 58 00 38 00 00 a2 82 5b b8 40 08 00:51:52.480 WRITE FPDMA QUEUED
61 00 50 00 88 00 02 2a c1 93 00 40 08 00:51:52.411 WRITE FPDMA QUEUED
61 00 50 00 80 00 02 2a c1 92 a8 40 08 00:51:52.411 WRITE FPDMA QUEUED
ea 00 00 00 00 00 00 00 00 00 00 a0 08 00:51:51.184 FLUSH CACHE EXT
SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
Device State: Active (0)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 28/33 Celsius
Lifetime Min/Max Temperature: 16/34 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/65 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (259)
Index Estimated Time Temperature Celsius
260 2022-01-14 00:39 32 *************
... ..( 42 skipped). .. *************
303 2022-01-14 01:22 32 *************
304 2022-01-14 01:23 33 **************
... ..( 89 skipped). .. **************
394 2022-01-14 02:53 33 **************
395 2022-01-14 02:54 32 *************
... ..( 8 skipped). .. *************
404 2022-01-14 03:03 32 *************
405 2022-01-14 03:04 33 **************
406 2022-01-14 03:05 33 **************
407 2022-01-14 03:06 32 *************
... ..(191 skipped). .. *************
121 2022-01-14 06:18 32 *************
122 2022-01-14 06:19 31 ************
123 2022-01-14 06:20 31 ************
124 2022-01-14 06:21 32 *************
125 2022-01-14 06:22 32 *************
126 2022-01-14 06:23 31 ************
... ..(132 skipped). .. ************
259 2022-01-14 08:36 31 ************
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 1) ==
0x01 0x008 4 3 --- Lifetime Power-On Resets
0x01 0x010 4 60 --- Power-on Hours
0x01 0x018 6 3287100580 --- Logical Sectors Written
0x01 0x020 6 19888806 --- Number of Write Commands
0x01 0x028 6 132272 --- Logical Sectors Read
0x01 0x030 6 3425 --- Number of Read Commands
0x01 0x038 6 216000000 --- Date and Time TimeStamp
0x03 ===== = = === == Rotating Media Statistics (rev 1) ==
0x03 0x008 4 60 --- Spindle Motor Power-on Hours
0x03 0x010 4 59 --- Head Flying Hours
0x03 0x018 4 5 --- Head Load Events
0x03 0x020 4 0 --- Number of Reallocated Logical Sectors
0x03 0x028 4 0 --- Read Recovery Attempts
0x03 0x030 4 0 --- Number of Mechanical Start Failures
0x03 0x038 4 0 --- Number of Realloc. Candidate Logical Sectors
0x03 0x040 4 0 --- Number of High Priority Unload Events
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 99 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 0 --- Resets Between Cmd Acceptance and Completion
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 31 --- Current Temperature
0x05 0x010 1 30 --- Average Short Term Temperature
0x05 0x018 1 - --- Average Long Term Temperature
0x05 0x020 1 33 --- Highest Temperature
0x05 0x028 1 22 --- Lowest Temperature
0x05 0x030 1 32 --- Highest Average Short Term Temperature
0x05 0x038 1 30 --- Lowest Average Short Term Temperature
0x05 0x040 1 - --- Highest Average Long Term Temperature
0x05 0x048 1 - --- Lowest Average Long Term Temperature
0x05 0x050 4 0 --- Time in Over-Temperature
0x05 0x058 1 65 --- Specified Maximum Operating Temperature
0x05 0x060 4 0 --- Time in Under-Temperature
0x05 0x068 1 0 --- Specified Minimum Operating Temperature
0x06 ===== = = === == Transport Statistics (rev 1) ==
0x06 0x008 4 12 --- Number of Hardware Resets
0x06 0x010 4 5 --- Number of ASR Events
0x06 0x018 4 0 --- Number of Interface CRC Errors
0xff ===== = = === == Vendor Specific Statistics (rev 1) ==
0xff 0x008 7 0 --- Vendor Specific
0xff 0x010 7 0 --- Vendor Specific
0xff 0x018 7 0 --- Vendor Specific
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
Pending Defects log (GP Log 0x0c)
No Defects Logged
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 2 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 3 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 0 Non-CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 44844 Vendor specific
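To see at a glance which sectors the K80D drive rejected, the IDNF entries in the log above can be pulled out with a few lines of Python. This is only a rough sketch, not part of the original report: the function name `idnf_lbas` and the regular expression are made up for illustration, the sample lines are copied from the error log above, and the offset calculation assumes 512-byte logical sectors.

```python
import re

# Matches lines such as:
#   "... Error: IDNF at LBA = 0xb9432790 = 3108186000"
IDNF_RE = re.compile(r"Error: IDNF at LBA = 0x([0-9a-fA-F]+) = (\d+)")

def idnf_lbas(smart_text):
    """Return the decimal LBAs of all IDNF errors found in smartctl output."""
    lbas = []
    for hex_lba, dec_lba in IDNF_RE.findall(smart_text):
        # Sanity check: the hex and decimal forms on the same line should agree.
        if int(hex_lba, 16) != int(dec_lba):
            raise ValueError(f"inconsistent LBA: 0x{hex_lba} vs {dec_lba}")
        lbas.append(int(dec_lba))
    return lbas

# Three entries copied from the error log above (errors 99, 98 and 97).
sample = """\
10 -- 51 00 00 00 00 b9 43 27 90 40 00 Error: IDNF at LBA = 0xb9432790 = 3108186000
10 -- 51 00 00 00 01 5d 42 85 48 40 00 Error: IDNF at LBA = 0x15d428548 = 5859607880
10 -- 51 00 00 00 02 ba a0 ac 10 40 00 Error: IDNF at LBA = 0x2baa0ac10 = 11721026576
"""

lbas = idnf_lbas(sample)
for lba in sorted(lbas):
    # Assumption: 512-byte logical sectors; the offset is for orientation only.
    print(f"LBA {lba:>12}  ~{lba * 512 / 2**40:.2f} TiB into the device")
```

To run it against a full report, the output of `smartctl -x /dev/sdX` (with `/dev/sdX` standing in for the actual device) could be saved to a file and its contents passed to `idnf_lbas` in place of `sample`.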
What to do?
Now I'm kind of stuck. My pool functions, but is degraded and I cannot
resilver it because the new device faults as well. I'm considering running
a destructive badblocks on the new device. Another idea that crossed my
mind: Should I try connecting the K80D device via USB and see if it has to
do with SATA?
Could you give me a hint on how to proceed?
Thanks in advance! Your help is really appreciated.
Best regards.
-
WD's SMR drives are, uh, "special". One day, I would like to know how they manufactured drives that appear to hang for long stretches very specifically when used with ZFS and not any other FS, but for now, I just know that you should run screaming from them.
-
Hello everyone, and thanks for your answers. Here's a quick update on my issue: I replaced the WD SMR drive with an HGST CMR drive, and it resilvered without any issues. I'll avoid SMR drives in the future. Best regards