-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
blockdev: query physical sector size, not logical #1056
Conversation
5cb4043
to
6c70704
Compare
When determining whether to install the 512b or 4k image, we should be comparing against the physical sector size of the drive, not its logical sector size. Some 4k drives report themselves as having a logical sector size of 512 bytes for backwards compatibility with OSes and software that can't do 4k blocks (these are also called "512e" drives). However, actually using the drive as a 512b one provides reduced performance because the drive needs to translate all requests. For comparison, `mkfs.xfs` will use a sector size of 4096 on 512e drives: https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/tree/mkfs/xfs_mkfs.c?id=3498b68025ce#n1932 One can also experimentally verify this: ``` $ qemu-img create -f qcow2 storage-512.qcow2 1G $ qemu-img create -f qcow2 storage-512e.qcow2 1G $ qemu-img create -f qcow2 storage-4kn.qcow2 1G $ cosa run -- \ -drive if=none,id=storage-512,format=qcow2,file=storage-512.qcow2 \ -device virtio-blk,drive=storage-512,physical_block_size=512,logical_block_size=512,serial=storage-512 \ -drive if=none,id=storage-512e,format=qcow2,file=storage-512e.qcow2 \ -device virtio-blk,drive=storage-512e,physical_block_size=4096,logical_block_size=512,serial=storage-512e \ -drive if=none,id=storage-4kn,format=qcow2,file=storage-4kn.qcow2 \ -device virtio-blk,drive=storage-4kn,physical_block_size=4096,logical_block_size=4096,serial=storage-4kn [core@cosa-devsh ~]$ sudo mkfs.xfs /dev/disk/by-id/virtio-storage-512 | grep sectsz = sectsz=512 attr=2, projid32bit=1 = sectsz=512 sunit=0 blks, lazy-count=1 [core@cosa-devsh ~]$ sudo mkfs.xfs /dev/disk/by-id/virtio-storage-512e | grep sectsz = sectsz=4096 attr=2, projid32bit=1 = sectsz=4096 sunit=1 blks, lazy-count=1 [core@cosa-devsh ~]$ sudo mkfs.xfs /dev/disk/by-id/virtio-storage-4kn | grep sectsz = sectsz=4096 attr=2, projid32bit=1 = sectsz=4096 sunit=1 blks, lazy-count=1 ``` To match this, we only need to change `get_sector_size_for_path` to use `BLKPBSZGET` instead of `BLKSSZGET`.
Nice work investigating this and writing a very detailed commit message here! |
6c70704
to
b889c59
Compare
Really good job! nice catch! |
OK, after learning more about GPT, I think this is incorrect actually. The GPT is located at LBA 1, which is based on the logical sector size of the disk, not physical. So even on a 512e disk, it needs to be at offset 512. This patch will make it so that we install the 4Kn image, where the GPT is located at offset 4096. So the kernel will be unable to read the partition table because it expects it at offset 512. So we're in this awkward situation where for optimal performance on 512e disks, we want the partition table of the 512b image, but the XFS filesystem of the 4k image. I guess we could create a third Anyway, the workaround short-term is also the same (manually reprovision the rootfs like this). |
I wonder if we need a separate tracker issue for this problem. We could still fix it the same way as coreos/fedora-coreos-tracker#1183, but a complete description of the problem with all surrounding context would be really nice for our future selves (and others). |
I think we can consider this issue as "fixed" also by coreos/fedora-coreos-config#2320. It would be too disruptive to always reprovision if we detect a 512e disk. I think a reasonable approach would be to reprovision only when the disk is over a certain size, as a proxy for "this is a long-running instance". The autosave-xfs work implements just that. |
Noting here that we bumped the "certain size" threshold in coreos/fedora-coreos-config#2565 |
When determining whether to install the 512b or 4k image, we should be comparing against the physical sector size of the drive, not its logical sector size. Some 4k drives report themselves as having a logical sector size of 512 bytes for backwards compatibility with OSes and software that can't do 4k blocks (these are also called "512e" drives). However, actually using the drive as a 512b one provides reduced performance because the drive needs to translate all requests.
For comparison,
mkfs.xfs
will use a sector size of 4096 on 512e drives:https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/tree/mkfs/xfs_mkfs.c?id=3498b68025ce#n1932
One can also experimentally verify this:
To match this, we only need to change
get_sector_size_for_path
to useBLKPBSZGET
instead ofBLKSSZGET
.