Page Fault in zap_leaf_array_create #16730
First, please fill out the whole template when filing bugs. That saves me having to ask for things it already asks for, namely a basic problem description and info about reproducing. More specific questions:
There's not much to go on here. Best I can tell from the opcode and register dumps, it's trying to access a ZAP chunk well beyond the end of the leaf buffer. It's hard to see why, though. It could be an already-corrupted ZAP (broken chunk links), or more general memory corruption. I'll have more of a think about what information might be useful here.
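(For anyone hitting something similar: one way to gather more data on a suspect ZAP is to dump the object with zdb, ideally after a read-only import so nothing is written to a possibly damaged pool. The pool, dataset, and object number below are placeholders.)

```sh
# Import read-only so nothing is written to a possibly damaged pool
zpool import -o readonly=on tank

# Dump a single object with maximum verbosity; for a ZAP object this
# prints the leaf chunks and entries (tank/fs and 1234 are placeholders)
zdb -dddd tank/fs 1234
```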
Sorry. I will make sure to provide as much info as possible next time.
This was a production instance mainly running AI model training/testing and CI tasks. The CI jobs produce lots of small files due to the nature of CI compilation, while AI model training generates larger files, but under 20 GB per training session.
The zpool-reported dedup ratio was astronomically high, like 140x. Personally I don't understand how the dedup ratio can be that high: the CI files (Docker runs) are deleted after each run, and AI model files are highly variable, so they should be neither very compressible nor contain many duplicate blocks.
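(As an aside, assuming the pool is importable, the reported ratio and the DDT breakdown can be checked like this; `zdb -DD` prints a reference-count histogram showing which blocks drive an unexpectedly high ratio. The pool name is a placeholder.)

```sh
# Pool-wide dedup ratio as reported by ZFS
zpool get dedupratio tank

# Dedup table statistics plus a reference-count histogram
zdb -DD tank
```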
No errors were reported in the logs, and I did not run a scrub. This was a 4-NVMe striped ZFS pool with no RAIDZ protection.
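(For reference, if the pool were still importable, a scrub would be the quickest check for on-disk corruption; the pool name is a placeholder.)

```sh
zpool scrub tank        # start a scrub
zpool status -v tank    # watch progress and list any files with errors
```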
Not that I recall. I tried tip and 2.3.0-rc2, and they both crash on load, so I had to erase and rebuild the zpool.
There were 2-3 random crashes (the computer rebooted by itself) preceding the ZFS load error reported here. No power-loss event happened, though the NVMes do not have power-loss protection capacitors. Just 2-3 reboots that I also suspect ZFS caused. For now, I have rebuilt the zpool and am running the 2.3.0-rc2 release branch with dedup off to rule out dedup as the culprit.
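(For reference, a minimal sketch of that rebuild, with assumed device and pool names: a plain four-way NVMe stripe with dedup explicitly disabled at the pool root so every dataset inherits it.)

```sh
zpool create -O dedup=off tank \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
```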
Yeah, bit weird. I'd be interested to know what
I'm not sure what "zfs load" means here. Can you confirm whether this crash occurred during normal operations, or while importing the pool? Are there any crash logs from the "random crashes" + reboots?
Typo,
I checked dmesg and the syslogs, and there were no log entries preceding the reboots. I do flush the Linux file cache on an hourly basis, and I have run into previous ZFS issues that affected LXD. At first I thought LXD was at fault, but the LXD maintainer believes, based on all the previous data he has seen (which matches my report), that ZFS has some hidden memory-safety problems post kernel 6.6 which are more easily triggered by doing a level-3 flush of the Linux file cache.
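(For context, a "level 3" flush presumably refers to dropping the page cache plus dentries and inodes via drop_caches, e.g.:)

```sh
# Flush dirty pages first, then drop pagecache, dentries, and inodes
sync
echo 3 > /proc/sys/vm/drop_caches
```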
System information

Type | Version/Name
--- | ---
Distribution Name | Ubuntu
Distribution Version | 24.04
Kernel Version | 6.6.59-x64v4-xanmod1
Architecture | x86_64
OpenZFS Version | 2.3.99